OMean

class omoment.OMean(mean: Number = nan, weight: Number = 0, handling_invalid: HandlingInvalid = HandlingInvalid.Drop)

Bases: OBase

Online estimator of weighted mean.

Represents the mean and weight of a part of the data. Two OMean objects can be added together to produce correct estimates for the larger, combined dataset. Mean and weight are stored using __slots__, which keeps the objects lightweight so they can be used in large quantities, even inside pandas DataFrames.

Most methods are fairly permissive and accept numbers, numpy arrays or pandas Series and DataFrames. By default, invalid values (NaNs, infinities and negative weights) are omitted from the data.

Addition of \(\mathrm{OMean}(m_1, w_1)\) and \(\mathrm{OMean}(m_2, w_2)\) is calculated as:

\begin{gather*} \delta = m_2 - m_1\\ w_N = w_1 + w_2\\ m_N = m_1 + \delta \frac{w_2}{w_N} \end{gather*}

where the subscript N denotes the new values produced by the addition.
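A minimal sketch of the addition rule above in plain Python, working on bare (mean, weight) pairs rather than OMean objects. The function name combine_means is hypothetical, and treating a zero-weight summary as empty (so that the default mean of nan does not propagate) is an assumption, not something taken from the library.

```python
from typing import Tuple


def combine_means(m1: float, w1: float, m2: float, w2: float) -> Tuple[float, float]:
    """Merge two (mean, weight) summaries using the addition rule."""
    # Assumption: a zero-weight summary is empty and contributes nothing.
    # This also avoids NaN propagation from a default mean of nan.
    if w2 == 0:
        return m1, w1
    if w1 == 0:
        return m2, w2
    delta = m2 - m1               # delta = m_2 - m_1
    w_n = w1 + w2                 # w_N = w_1 + w_2
    m_n = m1 + delta * w2 / w_n   # m_N = m_1 + delta * w_2 / w_N
    return m_n, w_n
```

For example, combine_means(2.0, 3.0, 5.0, 1.0) returns (2.75, 4.0), the same as the direct weighted mean (2·3 + 5·1)/4. Shifting m_1 by a fraction of δ, instead of summing products of means and weights, mirrors the incremental (Welford-style) form of the update.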

classmethod compute(x: Number | ndarray | Series, w: Number | ndarray | Series | None = None, handling_invalid: HandlingInvalid = HandlingInvalid.Drop) → OMean: Shortcut for initializing an empty object and updating it with the data.
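To illustrate what the shortcut amounts to, here is a sketch with a hypothetical MiniMean class (not part of omoment) whose compute creates an empty object and calls its update method. Each value is folded in with the addition rule as a one-element summary. Unit default weights and dropping NaNs, infinities and negative weights follow the documented default ‘drop’ behaviour; everything else is an assumption.

```python
import math
from typing import Iterable, Optional


class MiniMean:
    """Hypothetical stand-in for OMean, only to illustrate compute()."""

    __slots__ = ("mean", "weight")

    def __init__(self, mean: float = math.nan, weight: float = 0.0) -> None:
        self.mean = mean
        self.weight = weight

    def update(self, x: Iterable[float], w: Optional[Iterable[float]] = None) -> "MiniMean":
        xs = list(x)
        ws = [1.0] * len(xs) if w is None else list(w)  # assumed unit weights
        if len(xs) != len(ws):
            raise TypeError("x and w must have the same size")
        for xi, wi in zip(xs, ws):
            # Default 'drop': skip NaN/inf values and invalid or negative weights.
            if not (math.isfinite(xi) and math.isfinite(wi)) or wi < 0:
                continue
            if wi == 0:
                continue  # a zero weight does not change the estimate
            if self.weight == 0:
                self.mean, self.weight = xi, wi
            else:
                self.weight += wi
                self.mean += (xi - self.mean) * wi / self.weight
        return self

    @classmethod
    def compute(cls, x: Iterable[float], w: Optional[Iterable[float]] = None) -> "MiniMean":
        # Shortcut: an empty object updated with the data.
        return cls().update(x, w)
```

For instance, MiniMean.compute([1.0, 2.0, float("nan"), 6.0], [1, 1, 5, 2]) drops the NaN entry and yields a mean of 3.75 with weight 4.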

static get_mean(om: OMean): Convenience function to be used as a lambda.

static get_weight(om: OMean): Convenience function to be used as a lambda.

mean

classmethod of_frame(data: DataFrame, x: str, w: str | None = None, handling_invalid: HandlingInvalid = HandlingInvalid.Drop) → OMean

Convenience function for calculating the OMean of a column of a pandas DataFrame.

Parameters:

data – input DataFrame
x – name of the column with values to calculate the mean of
w – name of the column with weights (optional)
handling_invalid – How to handle invalid values in the calculation [‘drop’, ‘keep’, ‘raise’]; the default is ‘drop’. Can be provided either as the enum or as its string representation.

Returns:

OMean object
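Conceptually, the result corresponds to a weighted mean of column x with weights taken from column w. A sketch of that computation with numpy and pandas follows, assuming the default ‘drop’ handling. frame_mean is a hypothetical helper that returns a (mean, weight) tuple instead of an OMean object, and it does not reproduce the ‘keep’ or ‘raise’ modes.

```python
from typing import Optional, Tuple

import numpy as np
import pandas as pd


def frame_mean(data: pd.DataFrame, x: str, w: Optional[str] = None) -> Tuple[float, float]:
    """Hypothetical sketch: weighted mean and total weight of column x."""
    values = data[x].to_numpy(dtype=float)
    weights = np.ones_like(values) if w is None else data[w].to_numpy(dtype=float)
    # Default 'drop': ignore NaN/inf values and NaN/inf/negative weights.
    valid = np.isfinite(values) & np.isfinite(weights) & (weights >= 0)
    values, weights = values[valid], weights[valid]
    total = float(weights.sum())
    if total == 0:
        return float("nan"), 0.0  # nothing valid: empty estimate
    return float(np.average(values, weights=weights)), total
```

With df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0], "w": [1.0, 3.0, 2.0, -1.0]}), frame_mean(df, "x", "w") keeps only the first two rows and returns (1.75, 4.0).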

static of_groupby(data: pd.DataFrame, g: str | List[str], x: str, w: str | None = None, handling_invalid: HandlingInvalid = HandlingInvalid.Drop) → pd.Series[OMean]

Optimized calculation of means for a large number of groups in the data.

Avoids the slower “groupby -> apply” workflow and uses only optimized aggregation functions. On a test dataset with 10,000,000 rows and 100,000 groups, it is about 4x faster than the “groupby -> apply” approach.

Parameters:

data – input DataFrame
g – name of column containing group keys; can be also a list of multiple column names
x – name of the column with values to calculate the mean of
w – name of the column with weights (optional)
handling_invalid – How to handle invalid values in the calculation [‘drop’, ‘keep’, ‘raise’]; the default is ‘drop’. Can be provided either as the enum or as its string representation.

Returns:

pandas Series indexed by group values g and containing estimated OMean objects
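The “aggregation only” idea can be sketched as follows: per group, sum w·x and w with the built-in groupby sum, then divide. This avoids running Python code once per group, which is presumably where the speed-up over groupby -> apply comes from. group_means is a hypothetical helper that returns a DataFrame with mean and weight columns rather than a Series of OMean objects, and it implements only the default ‘drop’ handling.

```python
from typing import List, Optional, Union

import numpy as np
import pandas as pd


def group_means(
    data: pd.DataFrame,
    g: Union[str, List[str]],
    x: str,
    w: Optional[str] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: weighted mean and weight per group, no apply()."""
    values = data[x].astype(float)
    weights = pd.Series(1.0, index=data.index) if w is None else data[w].astype(float)
    # Default 'drop': keep rows with finite values and finite, non-negative weights.
    valid = np.isfinite(values) & np.isfinite(weights) & (weights >= 0)
    g_cols = [g] if isinstance(g, str) else list(g)
    tmp = data.loc[valid, g_cols].copy()
    tmp["wx"] = (values * weights)[valid]
    tmp["weight"] = weights[valid]
    # Vectorized aggregation: one groupby sum for both columns.
    sums = tmp.groupby(g_cols)[["wx", "weight"]].sum()
    sums["mean"] = sums["wx"] / sums["weight"]
    return sums[["mean", "weight"]]
```

Groups whose valid rows all have zero weight end up with a mean of NaN (0/0) and weight 0, which matches the default OMean(nan, 0).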

Update the moments by adding new values.

The input can be either single values or a batch of data in numpy arrays. In the latter case, moments are first estimated on the new data and then combined with the moments of the existing data. Invalid values and negative weights are omitted by default.

Parameters:

x – Values to add to the current estimate.
w – Weights for the values. If provided, must have the same length as x.
handling_invalid – How to handle invalid values in the calculation [‘drop’, ‘keep’, ‘raise’]; the default is ‘drop’. Can be provided either as the enum or as its string representation.

Returns:

The same OMean object updated for the new data.

Raises:

ValueError – if handling_invalid is ‘raise’ and there are invalid values (NaNs, infinities or negative weights) in the data.
TypeError – if x or w has more than one dimension, or if x and w differ in size.
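The two-step batch update described above (estimate the moments of the new batch, then combine them with the current ones) could look like this in numpy. update_batch is a hypothetical function operating on a bare (mean, weight) pair; it implements only the default ‘drop’ handling and the TypeError check listed under Raises, not the ‘keep’ or ‘raise’ modes.

```python
from typing import Tuple

import numpy as np


def update_batch(mean: float, weight: float, x, w=None) -> Tuple[float, float]:
    """Hypothetical sketch of the batch path of an OMean-style update."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    if x.ndim > 1 or w.ndim > 1 or x.shape != w.shape:
        raise TypeError("x and w must be one-dimensional and of the same size")
    # Default 'drop': omit NaN/inf values and invalid or negative weights.
    valid = np.isfinite(x) & np.isfinite(w) & (w >= 0)
    x, w = x[valid], w[valid]
    batch_weight = float(w.sum())
    if batch_weight == 0:
        return mean, weight  # nothing valid to add
    # Step 1: estimate the moments of the new batch alone.
    batch_mean = float(np.dot(x, w) / batch_weight)
    # Step 2: combine old and new moments with the pairwise addition rule.
    if weight == 0:
        return batch_mean, batch_weight
    total = weight + batch_weight
    return mean + (batch_mean - mean) * batch_weight / total, total
```

Starting from (2.0, 3.0) and adding the batch [5.0, nan, 5.0] with weights [0.5, 1.0, 0.5] drops the NaN entry and returns (2.75, 4.0).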

weight