OMeanVar

class omoment.OMeanVar(mean: Number = nan, var: Number = nan, weight: Number = 0, handling_invalid: HandlingInvalid = HandlingInvalid.Drop)

Bases: OMean

Online estimator of weighted mean and variance.

Represents the mean, variance and weight of a part of the data. Two OMeanVar objects can be added together to produce correct estimates for the overall dataset. Mean, variance and weight are stored using __slots__, which keeps the objects lightweight enough to be used in large quantities, even in a pandas DataFrame (however, they are still Python objects, not numpy types).
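The effect of __slots__ mentioned above can be illustrated with a small sketch. SlimMoments and PlainMoments are hypothetical classes written only for this illustration; they are not part of the library.

```python
class SlimMoments:
    """Hypothetical class with fixed attribute slots (no per-instance dict)."""

    __slots__ = ("mean", "var", "weight")

    def __init__(self, mean=float("nan"), var=float("nan"), weight=0.0):
        self.mean = mean
        self.var = var
        self.weight = weight


class PlainMoments:
    """Same data, but stored in an ordinary per-instance __dict__."""

    def __init__(self, mean=float("nan"), var=float("nan"), weight=0.0):
        self.mean = mean
        self.var = var
        self.weight = weight


slim = SlimMoments(1.0, 0.5, 10.0)
plain = PlainMoments(1.0, 0.5, 10.0)

# A slotted instance carries no attribute dictionary ...
slim_has_dict = hasattr(slim, "__dict__")
plain_has_dict = hasattr(plain, "__dict__")

# ... and therefore rejects attributes that are not declared in __slots__.
try:
    slim.extra = 1
    extra_allowed = True
except AttributeError:
    extra_allowed = False
```

Dropping the per-instance dictionary is what makes it practical to keep many such objects in memory at once.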

Most methods are fairly permissive and work on numbers, numpy arrays or pandas DataFrames. By default, invalid values (NaNs, infinities and negative weights) are omitted from the data. Variance in OMeanVar is based on ddof = 0, in agreement with the default of the numpy std method.

Addition of \(\mathrm{OMeanVar}(m_1, v_1, w_1)\) and \(\mathrm{OMeanVar}(m_2, v_2, w_2)\) is calculated as:

\begin{gather*} \delta_m = m_2 - m_1\\ \delta_v = v_2 - v_1\\ w_N = w_1 + w_2\\ r = \frac{w_2}{w_N}\\ m_N = m_1 + \delta_m \frac{w_2}{w_N}\\ v_N = v_1 + \delta_v r + \delta_m^2 r (1 - r) \end{gather*}

Where subscript N denotes the new values produced by the addition.
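The addition formula can be checked numerically with a short sketch. Moments, combine and from_values are hypothetical helpers written for this illustration and are not the library's API; the handling of empty parts (zero weight) is an assumption.

```python
import math
from typing import NamedTuple


class Moments(NamedTuple):
    """Hypothetical stand-in for (mean, var, weight); not the library class."""

    mean: float
    var: float
    weight: float


def combine(a: Moments, b: Moments) -> Moments:
    """Merge two partial summaries using the addition formula above."""
    w_n = a.weight + b.weight
    if w_n == 0:
        # Assumption: adding two empty parts gives an empty result.
        return Moments(math.nan, math.nan, 0.0)
    if a.weight == 0:
        return b
    if b.weight == 0:
        return a
    delta_m = b.mean - a.mean
    delta_v = b.var - a.var
    r = b.weight / w_n
    mean = a.mean + delta_m * r
    var = a.var + delta_v * r + delta_m ** 2 * r * (1 - r)
    return Moments(mean, var, w_n)


def from_values(values, weights=None) -> Moments:
    """Direct weighted mean and ddof = 0 variance, used as a reference."""
    values = list(values)
    weights = [1.0] * len(values) if weights is None else list(weights)
    w = sum(weights)
    mean = sum(wi * xi for xi, wi in zip(values, weights)) / w
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(values, weights)) / w
    return Moments(mean, var, w)


left = from_values([1.0, 2.0, 3.0])
right = from_values([10.0, 20.0], [2.0, 1.0])
merged = combine(left, right)
whole = from_values([1.0, 2.0, 3.0, 10.0, 20.0], [1.0, 1.0, 1.0, 2.0, 1.0])
```

Here merged and whole agree up to floating-point rounding, which is the property that allows partial results to be computed separately and added afterwards.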

classmethod compute(x: Number | ndarray | Series, w: Number | ndarray | Series | None = None, handling_invalid: HandlingInvalid = HandlingInvalid.Drop) → OMeanVar: Shortcut for initialization of an empty object and its update based on data.

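static get_std_dev(om: OMeanVar): Convenience function that returns the std_dev of om; it can be passed wherever a callable is expected, in place of a lambda.

static get_unbiased_std_dev(om: OMeanVar): Convenience function that returns the unbiased_std_dev of om; it can be passed wherever a callable is expected, in place of a lambda.

static get_unbiased_var(om: OMeanVar): Convenience function that returns the unbiased_var of om; it can be passed wherever a callable is expected, in place of a lambda.

static get_var(om: OMeanVar): Convenience function that returns the var of om; it can be passed wherever a callable is expected, in place of a lambda.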

mean

static of_groupby(data: pd.DataFrame, g: str | List[str], x: str, w: str | None = None, handling_invalid: HandlingInvalid = HandlingInvalid.Drop) → pd.Series[OMean]

Optimized version for calculating means for a large number of groups in the data.

Avoids the slower groupby -> apply workflow and uses only optimized aggregation functions. The function is about 5x faster on a test dataset with 10,000,000 rows and 100,000 groups.

Parameters:

data – input DataFrame
g – name of column containing group keys; can be also a list of multiple column names
x – name of column with values to calculate the mean of
w – name of column with weights (optional)
handling_invalid – How to handle invalid values in calculation [‘drop’, ‘keep’, ‘raise’], default value ‘drop’. Provided either as enum or its string representation.

Returns:

pandas Series indexed by group values g and containing estimated OMeanVar objects
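The idea of replacing groupby -> apply with plain aggregations can be sketched as follows. This is not the library's implementation: grouped_moments is a hypothetical helper, it returns plain columns rather than OMeanVar objects, it does not handle invalid values, and it assumes the group columns are not named _w, _wx or _wxx. The sum-of-squares formula it uses can also be numerically less stable than the incremental update.

```python
import pandas as pd


def grouped_moments(data, g, x, w=None):
    """Per-group weighted mean, ddof = 0 variance and weight from sums only."""
    g_cols = [g] if isinstance(g, str) else list(g)
    work = data[g_cols].copy()
    # Unit weights when no weight column is given.
    work["_w"] = 1.0 if w is None else data[w]
    work["_wx"] = work["_w"] * data[x]
    work["_wxx"] = work["_w"] * data[x] ** 2
    # A single vectorized aggregation instead of a Python-level apply.
    sums = work.groupby(g_cols).sum()
    mean = sums["_wx"] / sums["_w"]
    var = sums["_wxx"] / sums["_w"] - mean ** 2
    return pd.DataFrame({"mean": mean, "var": var, "weight": sums["_w"]})


df = pd.DataFrame(
    {
        "g": ["a", "a", "b", "b", "b"],
        "x": [1.0, 3.0, 2.0, 4.0, 6.0],
    }
)
result = grouped_moments(df, "g", "x")

# Reference values from pandas' own group-wise estimators.
reference_mean = df.groupby("g")["x"].mean()
reference_var = df.groupby("g")["x"].var(ddof=0)
```

The result is indexed by the group keys, in the same way as the Series returned by of_groupby.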

property std_dev: float: Estimate of standard deviation, calculated as \(\sqrt{\mathrm{Var}}\). Based on ddof = 0, the same default as in numpy std method.

property unbiased_std_dev: float: Estimate of unbiased standard deviation based on ddof = 1 (suitable for unweighted data).

property unbiased_var: float: Estimate of unbiased variance based on ddof = 1 (suitable for unweighted data).
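The relationship between the ddof = 0 and ddof = 1 estimates can be illustrated with the standard library alone. The sketch assumes unweighted data, so that the weight equals the number of observations n, and rescales the ddof = 0 variance by n / (n - 1); whether the library uses exactly this expression is an assumption.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# ddof = 0: divide the sum of squared deviations by n.
var_ddof0 = statistics.pvariance(data)
# ddof = 1: divide by n - 1 instead.
var_ddof1 = statistics.variance(data)

# Rescaling the ddof = 0 estimate recovers the ddof = 1 estimate.
rescaled_var = var_ddof0 * n / (n - 1)
rescaled_std = math.sqrt(rescaled_var)
```

With non-unit weights the total weight no longer counts observations, which is why these properties are described as suitable for unweighted data.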

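update(x: Number | ndarray | Series, w: Number | ndarray | Series | None = None, handling_invalid: HandlingInvalid = HandlingInvalid.Drop) → OMeanVar

Update the moments by adding new values.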

Can be either single values or a batch of data in numpy arrays. In the latter case, moments are first estimated on the new data and then combined with the moments of the old data. Invalid values and negative weights are omitted by default. The calculated variance is based on ddof = 0; OMeanVar also has the properties unbiased_var and unbiased_std_dev, which are based on ddof = 1.

Parameters:

x – Values to add to the current estimate.
w – Weights for the values. If provided, has to have the same length as x.
handling_invalid – How to handle invalid values in calculation [‘drop’, ‘keep’, ‘raise’], default value ‘drop’. Provided either as enum or its string representation.

Returns:

The same OMeanVar object updated for the new data.

Raises:

ValueError – if handling_invalid is set to ‘raise’ and there are invalid values (NaNs, infinities or negative weights) in data.
TypeError – if values x or w have more than one dimension or if they are of different size.
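The batch behaviour described for this method (drop invalid entries, estimate moments on the new batch, then combine them with the old moments) can be sketched with NumPy. batch_moments and fold_batch are hypothetical helpers operating on plain (mean, var, weight) tuples; they approximate the documented default behaviour and are not the library's code.

```python
import numpy as np


def batch_moments(x, w=None):
    """Weighted mean, ddof = 0 variance and total weight of one batch.

    Assumption: NaNs, infinities and negative weights are dropped,
    mirroring the documented default handling of invalid values.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    w = np.ones_like(x) if w is None else np.atleast_1d(np.asarray(w, dtype=float))
    if x.ndim > 1 or w.ndim > 1 or x.size != w.size:
        raise TypeError("x and w must be one-dimensional and of equal size")
    valid = np.isfinite(x) & np.isfinite(w) & (w >= 0)
    x, w = x[valid], w[valid]
    weight = float(w.sum())
    if weight == 0:
        return float("nan"), float("nan"), 0.0
    mean = float(np.average(x, weights=w))
    var = float(np.average((x - mean) ** 2, weights=w))
    return mean, var, weight


def fold_batch(state, x, w=None):
    """Combine an existing (mean, var, weight) tuple with a new batch."""
    m1, v1, w1 = state
    m2, v2, w2 = batch_moments(x, w)
    if w2 == 0:
        return state
    if w1 == 0:
        return m2, v2, w2
    w_n = w1 + w2
    r = w2 / w_n
    delta_m = m2 - m1
    mean = m1 + delta_m * r
    var = v1 + (v2 - v1) * r + delta_m ** 2 * r * (1 - r)
    return mean, var, w_n


state = (float("nan"), float("nan"), 0.0)
state = fold_batch(state, [1.0, 2.0, np.nan])        # NaN is dropped
state = fold_batch(state, [3.0, 4.0], w=[1.0, -1.0])  # negative weight is dropped
```

After both calls, state summarizes the values 1, 2 and 3 with unit weights.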

var

weight