# Transformers

Module of transformers.

A transformer transforms time series to extract useful information.

adtk.transformer.print_all_models()

Print description of every model in this module.

class adtk.transformer.RollingAggregate(agg='mean', agg_params=None, window=10, center=False, min_periods=None)

Transformer that rolls a sliding window along a time series and aggregates each window using a user-selected operation.

This is a univariate transformer. When it is applied to a multivariate time series (i.e. pandas DataFrame), it will be applied to every series independently. All parameters can be defined as a dict object where key-value pairs are series names (i.e. column names of DataFrame) and the model parameter for that series. If not, then the same parameter will be applied to all series.

Parameters:

- agg (str or function) – Aggregation method applied to series. Default: 'mean'. If str, must be one of the supported built-in methods:
  - 'mean': mean of all values in a rolling window.
  - 'median': median of all values in a rolling window.
  - 'sum': summation of all values in a rolling window.
  - 'min': minimum of all values in a rolling window.
  - 'max': maximum of all values in a rolling window.
  - 'std': sample standard deviation of all values in a rolling window.
  - 'var': sample variance of all values in a rolling window.
  - 'skew': skewness of all values in a rolling window.
  - 'kurt': kurtosis of all values in a rolling window.
  - 'count': number of non-nan values in a rolling window.
  - 'nnz': number of non-zero values in a rolling window.
  - 'nunique': number of unique values in a rolling window.
  - 'quantile': quantile of all values in a rolling window. Requires the parameter q in agg_params, which is a float or a list of floats between 0 and 1 inclusive.
  - 'iqr': interquartile range, i.e. difference between 75% and 25% quantiles.
  - 'idr': interdecile range, i.e. difference between 90% and 10% quantiles.
  - 'hist': histogram of all values in a rolling window. Requires the parameter bins in agg_params to define the bins. bins is either a list of floats, b1, …, bn, which defines n-1 bins [b1, b2), [b2, b3), …, [b{n-2}, b{n-1}), [b{n-1}, bn], or an integer that defines the number of equal-width bins in the range of the input series.

  If function, it should accept a rolling window in the form of a pandas Series and return either a scalar or a 1D numpy array. To name the outputs, pass a list of strings as the parameter names in agg_params.
- agg_params (dict, optional) – Parameters of aggregation function. Default: None.
- window (int, optional) – Width of rolling windows (number of data points). Default: 10.
- center (bool, optional) – Whether the calculation is at the center of time window or on the right edge. Default: False.
- min_periods (int, optional) – Minimum number of observations in window required to have a value. Default: None, i.e. all observations must have values.
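The computation described by these parameters can be sketched with plain pandas. This is an illustration of the rolling-window idea only, not ADTK's implementation; the toy series and the variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy daily series 0..9 (illustrative data only).
s = pd.Series(
    np.arange(10, dtype=float),
    index=pd.date_range("2017-01-01", periods=10, freq="D"),
)

window = 3

# Counterpart of agg='mean' with center=False: each value summarizes
# the window that ends at the current point.
rolling_mean = s.rolling(window=window, center=False).mean()

# Counterpart of agg='iqr': difference between the 75% and 25% quantiles.
rolling = s.rolling(window=window)
rolling_iqr = rolling.quantile(0.75) - rolling.quantile(0.25)

# A user-defined aggregation receives each window and returns a scalar.
# raw=False hands the function a pandas Series rather than a NumPy array.
rolling_range = s.rolling(window=window).apply(
    lambda w: w.max() - w.min(), raw=False
)
```

The first `window - 1` values of each result are NaN because those windows are incomplete, which matches the stated default of min_periods.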
fit(ts)

Train the transformer with given time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used to train the transformer. If a DataFrame with k columns, k univariate transformers will be trained independently.
fit_predict(ts, *args, **kwargs)

Alias of fit_transform.

fit_transform(ts)

Train the transformer, and transform the time series used for training.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used for training and to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(ts, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(ts)

Transform time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.DoubleRollingAggregate(agg='mean', agg_params=None, window=10, center=True, min_periods=None, diff='l1')

Transformer that rolls two sliding windows side-by-side along a time series, aggregates each window using a user-given operation, and calculates the difference of the aggregated metrics between the two sliding windows.

This is a univariate transformer. When it is applied to a multivariate time series (i.e. pandas DataFrame), it will be applied to every series independently. All parameters can be defined as a dict object where key-value pairs are series names (i.e. column names of DataFrame) and the model parameter for that series. If not, then the same parameter will be applied to all series.

Parameters:

- agg (str, function, or tuple) – Aggregation method applied to series. If tuple, elements correspond to the left and right windows respectively. Default: 'mean'. If str, must be one of the supported built-in methods:
  - 'mean': mean of all values in a rolling window.
  - 'median': median of all values in a rolling window.
  - 'sum': summation of all values in a rolling window.
  - 'min': minimum of all values in a rolling window.
  - 'max': maximum of all values in a rolling window.
  - 'std': sample standard deviation of all values in a rolling window.
  - 'var': sample variance of all values in a rolling window.
  - 'skew': skewness of all values in a rolling window.
  - 'kurt': kurtosis of all values in a rolling window.
  - 'count': number of non-nan values in a rolling window.
  - 'nnz': number of non-zero values in a rolling window.
  - 'nunique': number of unique values in a rolling window.
  - 'quantile': quantile of all values in a rolling window. Requires the parameter q in agg_params, which is a float or a list of floats between 0 and 1 inclusive.
  - 'iqr': interquartile range, i.e. difference between 75% and 25% quantiles.
  - 'idr': interdecile range, i.e. difference between 90% and 10% quantiles.
  - 'hist': histogram of all values in a rolling window. Requires the parameter bins in agg_params to define the bins. bins is either a list of floats, b1, …, bn, which defines n-1 bins [b1, b2), [b2, b3), …, [b{n-2}, b{n-1}), [b{n-1}, bn], or an integer that defines the number of equal-width bins in the range of the input series.

  If function, it should accept a rolling window in the form of a pandas Series and return either a scalar or a 1D numpy array. To name the outputs, pass a list of strings as the parameter names in agg_params.
- agg_params (dict or tuple, optional) – Parameters of aggregation function. If tuple, elements correspond to the left and right windows respectively. Default: None.
- window (int or tuple, optional) – Width of rolling windows (number of data points). If tuple, elements correspond to the left and right windows respectively. Default: 10.
- center (bool, optional) – If True, the current point is the right edge of right window; otherwise, it is the right edge of left window. Default: True.
- min_periods (int or tuple, optional) – Minimum number of observations in window required to have a value. Default: None, i.e. all observations must have values.
- diff (str or function, optional) – Difference method applied between aggregated metrics from the two sliding windows. Default: 'l1'. If str, choose from the supported built-in methods:
  - 'diff': Difference between values of aggregated metric (right minus left). Only applicable if the aggregated metric is scalar.
  - 'rel_diff': Relative difference between values of aggregated metric (right minus left, divided by left). Only applicable if the aggregated metric is scalar.
  - 'abs_rel_diff': Absolute relative difference between values of aggregated metric (right minus left, divided by left). Only applicable if the aggregated metric is scalar.
  - 'l1': Absolute difference if aggregated metric is scalar, or sum of elementwise absolute difference if it is a vector.
  - 'l2': Square root of sum of elementwise squared difference.

  If function, it accepts two input arguments that are the two outputs of applying the aggregation method to the two windows, and returns a float number measuring the difference.
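As a rough illustration of the two-window idea, the sketch below compares the mean of the points just before each time step with the mean of the points starting at it, using plain pandas. It is not ADTK's implementation, and the exact placement of the two windows relative to the current point is an assumption of this sketch rather than a statement about the center parameter.

```python
import numpy as np
import pandas as pd

# Step series: level 0 for ten points, then level 5 (illustrative data).
s = pd.Series(
    [0.0] * 10 + [5.0] * 10,
    index=pd.date_range("2017-01-01", periods=20, freq="D"),
)

w = 3  # width of both windows

# Left window (assumed placement): the w points strictly before t.
left = s.shift(1).rolling(window=w).mean()

# Right window (assumed placement): t and the w - 1 points after it.
# Rolling over the reversed series and reversing back looks forward in time.
right = s.iloc[::-1].rolling(window=w).mean().iloc[::-1]

# Counterpart of diff='l1' for a scalar metric: absolute difference.
l1 = (right - left).abs()

# Counterpart of diff='diff': signed difference, right minus left.
signed = right - left
```

On this series the difference peaks exactly at the level shift, which is why this kind of transform is useful ahead of a change-point style detector.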
fit(ts)

Train the transformer with given time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used to train the transformer. If a DataFrame with k columns, k univariate transformers will be trained independently.
fit_predict(ts, *args, **kwargs)

Alias of fit_transform.

fit_transform(ts)

Train the transformer, and transform the time series used for training.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used for training and to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(ts, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(ts)

Transform time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.ClassicSeasonalDecomposition(freq=None, trend=False)

Transformer that performs classic seasonal decomposition on the time series and returns the residual series.

Classic seasonal decomposition assumes a time series is the sum of trend, seasonal pattern, and noise (residual). This transformer calculates and removes the trend component with a moving average, extracts the seasonal pattern by averaging the detrended series over seasonal periods, and returns the residual series.

The fit method fits the seasonal frequency (if not specified) and the seasonal pattern with the training series. The transform method (or its alias predict) extracts the trend by moving average, but will NOT recalculate the seasonal pattern. Instead, it removes the trained seasonal pattern from the detrended series to obtain the residual series. This implicitly assumes the seasonal property does not change over time.

Parameters:

- freq (int, optional) – Length of a seasonal cycle. If None, the model will determine it based on autocorrelation of the training series. Default: None.
- trend (bool, optional) – Whether to extract and remove the trend of the series with a moving average. If False, the time series will be assumed to be the sum of seasonal pattern and residual. Default: False.
freq_

Length of seasonal cycle. Equal to parameter freq if it is given. Otherwise, calculated based on autocorrelation of the training series.

Type: int
seasonal_

Seasonal pattern extracted from training series.

Type: pandas.Series

This is a univariate transformer. When it is applied to a multivariate time series (i.e. pandas DataFrame), it will be applied to every series independently. All parameters can be defined as a dict object where key-value pairs are series names (i.e. column names of DataFrame) and the model parameter for that series. If not, then the same parameter will be applied to all series.
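The split between fitting a seasonal pattern and later removing it can be sketched as follows for the trend=False case. This is a minimal illustration in pandas, not ADTK's code: the cycle length is assumed known, both series are assumed to start at the same phase of the cycle, and the helper `phase_of` is a hypothetical name introduced here.

```python
import numpy as np
import pandas as pd

freq = 7  # assumed known cycle length (weekly pattern on daily data)
pattern = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])

# Training series: four clean weeks.
train = pd.Series(
    np.tile(pattern, 4),
    index=pd.date_range("2017-01-01", periods=28, freq="D"),
)

# New series: two more weeks with one injected spike.
new_values = np.tile(pattern, 2)
new_values[10] += 6.0
new = pd.Series(
    new_values,
    index=pd.date_range("2017-01-29", periods=14, freq="D"),
)


def phase_of(ts):
    """Position of each point within the cycle (assumes ts starts at phase 0)."""
    return np.arange(len(ts)) % freq


# "Fit": average the training values that share the same phase.
seasonal = train.groupby(phase_of(train)).mean()

# "Transform": subtract the learned pattern; it is not re-estimated here.
residual = new - seasonal.reindex(phase_of(new)).to_numpy()
```

Because the pattern comes from the training series only, the spike in the new series stays fully visible in the residual instead of being partly absorbed into the seasonal estimate.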

fit(ts)

Train the transformer with given time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used to train the transformer. If a DataFrame with k columns, k univariate transformers will be trained independently.
fit_predict(ts, *args, **kwargs)

Alias of fit_transform.

fit_transform(ts)

Train the transformer, and transform the time series used for training.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used for training and to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(ts, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(ts)

Transform time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.Retrospect(n_steps=1, step_size=1, till=0)

Transformer that returns a dataframe with retrospective values, i.e. the row at time t includes the values at times t-k, where the lags k are specified by the user.

This transformer may be useful for cases where a lagging effect should be taken into account. For example, a change of control u may not be reflected in outcome y within 2 minutes, and its effect may last for another 3 minutes. In this case, a dataframe where each row includes u_[t-3], u_[t-4], u_[t-5], and a series y_t are needed to learn the relationship between control and outcome.

This is a univariate transformer. When it is applied to a multivariate time series (i.e. pandas DataFrame), it will be applied to every series independently. All parameters can be defined as a dict object where key-value pairs are series names (i.e. column names of DataFrame) and the model parameter for that series. If not, then the same parameter will be applied to all series.

Parameters:

- n_steps (int, optional) – Number of retrospective steps to take. Default: 1.
- step_size (int, optional) – Length of a retrospective step. Default: 1.
- till (int, optional) – Nearest retrospective step. Default: 0.

Examples

>>> s = pd.Series(
...     np.arange(10),
...     index=pd.date_range(
...         start='2017-1-1',
...         periods=10,
...         freq='D'))
>>> s
2017-01-01    0
2017-01-02    1
2017-01-03    2
2017-01-04    3
2017-01-05    4
2017-01-06    5
2017-01-07    6
2017-01-08    7
2017-01-09    8
2017-01-10    9
>>> Retrospect(n_steps=3, step_size=2, till=1).transform(s)
            t-1  t-3  t-5
2017-01-01  NaN  NaN  NaN
2017-01-02  0.0  NaN  NaN
2017-01-03  1.0  NaN  NaN
2017-01-04  2.0  0.0  NaN
2017-01-05  3.0  1.0  NaN
2017-01-06  4.0  2.0  0.0
2017-01-07  5.0  3.0  1.0
2017-01-08  6.0  4.0  2.0
2017-01-09  7.0  5.0  3.0
2017-01-10  8.0  6.0  4.0
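The table in this example can be reproduced with pandas alone, which may help clarify how n_steps, step_size and till combine into lags. The following is a sketch of the idea using `Series.shift`, not ADTK's implementation; the variable names are chosen for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.arange(10),
    index=pd.date_range(start="2017-1-1", periods=10, freq="D"),
)

n_steps, step_size, till = 3, 2, 1

# One lag per retrospective step, starting at `till` and spaced by `step_size`.
lags = [till + i * step_size for i in range(n_steps)]  # [1, 3, 5]

# Each column holds the series delayed by one lag; early rows become NaN.
retro = pd.DataFrame({f"t-{k}": s.shift(k) for k in lags})
```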

fit(ts)

Train the transformer with given time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used to train the transformer. If a DataFrame with k columns, k univariate transformers will be trained independently.
fit_predict(ts, *args, **kwargs)

Alias of fit_transform.

fit_transform(ts)

Train the transformer, and transform the time series used for training.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used for training and to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(ts, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(ts)

Transform time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.StandardScale

Transformer that scales time series such that mean is equal to 0 and standard deviation is equal to 1.

This is a univariate transformer. When it is applied to a multivariate time series (i.e. pandas DataFrame), it will be applied to every series independently.
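The scaling amounts to a column-wise z-score, sketched below in pandas. This is an illustration rather than ADTK's code; in particular, the text above does not say whether the population or the sample standard deviation is used, so the pandas default (sample, ddof=1) is an assumption here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 30.0, 50.0, 70.0, 90.0]},
    index=pd.date_range("2017-01-01", periods=5, freq="D"),
)

# Subtract each column's mean and divide by its standard deviation.
# ddof=1 is the pandas default and an assumption of this sketch.
scaled = (df - df.mean()) / df.std()
```

Each column is scaled with its own statistics, so series measured in very different units end up directly comparable.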

fit(ts)

Train the transformer with given time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used to train the transformer. If a DataFrame with k columns, k univariate transformers will be trained independently.
fit_predict(ts, *args, **kwargs)

Alias of fit_transform.

fit_transform(ts)

Train the transformer, and transform the time series used for training.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used for training and to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(ts, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(ts)

Transform time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.CustomizedTransformer1D(transform_func=None, transform_func_params=None, fit_func=None, fit_func_params=None)

Transformer derived from a user-given function and parameters.

Parameters:

- transform_func (function) – A function transforming a given time series into a new one. The first input argument must be a pandas Series; additional optional arguments are allowed. The output must be a pandas Series or DataFrame with the same index as the input.
- transform_func_params (dict, optional) – Parameters of transform_func. Default: None.
- fit_func (function, optional) – A function that learns from a list of time series and returns a dict of parameters that transform_func can use for future transformation. Default: None.
- fit_func_params (dict, optional) – Parameters of fit_func. Default: None.

This is a univariate transformer. When it is applied to a multivariate time series (i.e. pandas DataFrame), it will be applied to every series independently. All parameters can be defined as a dict object where key-value pairs are series names (i.e. column names of DataFrame) and the model parameter for that series. If not, then the same parameter will be applied to all series.
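A pair of functions matching the contract described above might look like the sketch below. The functions and the hand-written calls at the end are illustrative only: how ADTK passes the training data to fit_func and merges the learned dict with transform_func_params is an assumption here, and the parameter names `center` and `scale` are hypothetical.

```python
import pandas as pd


def fit_func(s):
    """Learn parameters from a training series and return them as a dict."""
    return {"center": s.median()}


def transform_func(s, center=0.0, scale=1.0):
    """Return a new series that keeps the index of the input."""
    return (s - center) / scale


train = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
new = pd.Series([2.0, 3.0, 5.0])

# Hand-written stand-in for "fit, then transform" (assumed call pattern):
learned = fit_func(train)                        # parameters learned in fit
out = transform_func(new, scale=2.0, **learned)  # scale acts like transform_func_params
```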

fit(ts)

Train the transformer with given time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used to train the transformer. If a DataFrame with k columns, k univariate transformers will be trained independently.
fit_predict(ts, *args, **kwargs)

Alias of fit_transform.

fit_transform(ts)

Train the transformer, and transform the time series used for training.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be used for training and to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(ts, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(ts)

Transform time series.

Parameters: ts (pandas.Series or pandas.DataFrame) – Time series to be transformed. If a DataFrame with k columns, k univariate transformers will be applied to them independently.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.RegressionResidual(regressor=None, target=None)

Transformer that fits a regression of a target series on the rest of the series, and returns the regression residual series.

Parameters:

- regressor (object) – Regressor to be used. Same as a scikit-learn regressor, it should minimally have fit and predict methods.
- target (str, optional) – Name of the column to be regarded as target variable. If not specified, the first column in input DataFrame will be used.
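The idea of a regression residual can be sketched with NumPy's least-squares solver standing in for a regressor object. This is not ADTK's implementation; the column names are made up for the example, and the sign convention (observed minus predicted) is an assumption of the sketch.

```python
import numpy as np
import pandas as pd

n = 50
x = np.linspace(0.0, 10.0, n)
y = 2.0 * x + 1.0   # exact linear relationship
y[30] += 5.0        # one point that breaks the relationship
df = pd.DataFrame(
    {"y": y, "x": x},
    index=pd.date_range("2017-01-01", periods=n, freq="D"),
)

target = "y"
features = df.drop(columns=target)

# Ordinary least squares with an intercept column, standing in for any
# regressor that offers fit and predict.
X = np.column_stack([np.ones(n), features.to_numpy()])
coef, *_ = np.linalg.lstsq(X, df[target].to_numpy(), rcond=None)

# Residual: observed target minus predicted target (assumed sign convention).
predicted = X @ coef
residual = pd.Series(
    df[target].to_numpy() - predicted, index=df.index, name="residual"
)
```

Points where the usual relationship between the series is violated stand out in the residual even when each series looks unremarkable on its own.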
fit(df)

Train the transformer with given time series.

Parameters: df (pandas.DataFrame) – Time series to be used to train the transformer.
fit_predict(df, *args, **kwargs)

Alias of fit_transform.

fit_transform(df)

Train the transformer, and transform the time series used for training.

Parameters: df (pandas.DataFrame) – Time series to be used for training and to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(df, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(df)

Transform time series.

Parameters: df (pandas.DataFrame) – Time series to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.PcaProjection(k=1)

Transformer that performs principal component analysis (PCA) on the multivariate time series (every time point is treated as a point in high-dimensional space), and represents those points with their projection on the first k principal components.

Parameters: k (int, optional) – Number of principal components to use. Default: 1.
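Projection onto the first k principal components can be sketched with NumPy's SVD. The code below is an illustration of the technique, not ADTK's implementation; the output column names `pc0`, `pc1` are hypothetical labels chosen for this example.

```python
import numpy as np
import pandas as pd

t = np.arange(100, dtype=float)
df = pd.DataFrame(
    {"a": t, "b": 2.0 * t + 1.0, "c": np.sin(t)},
    index=pd.date_range("2017-01-01", periods=100, freq="D"),
)

k = 2

# Center the columns, then read the principal directions off the SVD.
centered = (df - df.mean()).to_numpy()
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:k]  # shape (k, number of columns)

# Coordinates of every time point on the first k principal components.
projection = pd.DataFrame(
    centered @ components.T,
    index=df.index,
    columns=[f"pc{i}" for i in range(k)],
)
```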
fit(df)

Train the transformer with given time series.

Parameters: df (pandas.DataFrame) – Time series to be used to train the transformer.
fit_predict(df, *args, **kwargs)

Alias of fit_transform.

fit_transform(df)

Train the transformer, and transform the time series used for training.

Parameters: df (pandas.DataFrame) – Time series to be used for training and to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(df, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(df)

Transform time series.

Parameters: df (pandas.DataFrame) – Time series to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.PcaReconstruction(k=1)

Transformer that performs principal component analysis (PCA) on the multivariate time series (every time point is treated as a point in high-dimensional space), and reconstructs those points with the first k principal components.

Parameters: k (int, optional) – Number of principal components to use. Default: 1.
fit(df)

Train the transformer with given time series.

Parameters: df (pandas.DataFrame) – Time series to be used to train the transformer.
fit_predict(df, *args, **kwargs)

Alias of fit_transform.

fit_transform(df)

Train the transformer, and transform the time series used for training.

Parameters: df (pandas.DataFrame) – Time series to be used for training and to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(df, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(df)

Transform time series.

Parameters: df (pandas.DataFrame) – Time series to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.PcaReconstructionError(k=1)

Transformer that performs principal component analysis (PCA) on the multivariate time series (every time point is treated as a point in high-dimensional space), reconstructs those points with the first k principal components, and returns the reconstruction error (i.e. squared distance between the reconstructed point and original point).

Parameters: k (int, optional) – Number of principal components to use. Default: 1.
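Reconstruction and its squared error can be sketched with NumPy's SVD as follows. This illustrates the technique under simple assumptions (centered data, toy two-column input) and is not ADTK's implementation.

```python
import numpy as np
import pandas as pd

t = np.arange(100, dtype=float)
b = 2.0 * t
b[60] += 40.0  # one point that leaves the line b = 2a
df = pd.DataFrame(
    {"a": t, "b": b},
    index=pd.date_range("2017-01-01", periods=100, freq="D"),
)

k = 1

# Principal directions of the centered data.
mean = df.mean().to_numpy()
centered = df.to_numpy() - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:k]

# Reconstruction: project onto k components, then map back to the
# original space and restore the mean.
reconstructed = centered @ components.T @ components + mean

# Error: squared Euclidean distance between original and reconstruction.
error = pd.Series(
    ((df.to_numpy() - reconstructed) ** 2).sum(axis=1), index=df.index
)
```

Time points that follow the dominant linear relationship reconstruct almost exactly, while the point that departs from it carries by far the largest error.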
fit(df)

Train the transformer with given time series.

Parameters: df (pandas.DataFrame) – Time series to be used to train the transformer.
fit_predict(df, *args, **kwargs)

Alias of fit_transform.

fit_transform(df)

Train the transformer, and transform the time series used for training.

Parameters: df (pandas.DataFrame) – Time series to be used for training and to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(df, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(df)

Transform time series.

Parameters: df (pandas.DataFrame) – Time series to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.SumAll

Transformer that returns the sum of all series as one series.

fit(df)

Train the transformer with given time series.

Parameters: df (pandas.DataFrame) – Time series to be used to train the transformer.
fit_predict(df, *args, **kwargs)

Alias of fit_transform.

fit_transform(df)

Train the transformer, and transform the time series used for training.

Parameters: df (pandas.DataFrame) – Time series to be used for training and to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(df, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(df)

Transform time series.

Parameters: df (pandas.DataFrame) – Time series to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
class adtk.transformer.CustomizedTransformerHD(transform_func=None, transform_func_params=None, fit_func=None, fit_func_params=None)

Transformer derived from a user-given function and parameters.

Parameters:

- transform_func (function) – A function transforming a given time series into a new one. The first input argument must be a pandas DataFrame; additional optional arguments are allowed. The output must be a pandas Series or DataFrame with the same index as the input.
- transform_func_params (dict, optional) – Parameters of transform_func. Default: None.
- fit_func (function, optional) – A function that learns from a list of time series and returns a dict of parameters that transform_func can use for future transformation. Default: None.
- fit_func_params (dict, optional) – Parameters of fit_func. Default: None.
fit(df)

Train the transformer with given time series.

Parameters: df (pandas.DataFrame) – Time series to be used to train the transformer.
fit_predict(df, *args, **kwargs)

Alias of fit_transform.

fit_transform(df)

Train the transformer, and transform the time series used for training.

Parameters: df (pandas.DataFrame) – Time series to be used for training and to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame
get_params()

Get parameters of this model.

Returns: Model parameters.
Return type: dict
predict(df, *args, **kwargs)

Alias of transform.

set_params(**kwargs)

Set parameters of this model.

Parameters: **kwargs – Model parameters to set. If empty, then all parameters will be reset to default values.
transform(df)

Transform time series.

Parameters: df (pandas.DataFrame) – Time series to be transformed.
Returns: Transformed time series.
Return type: pandas.Series or pandas.DataFrame