Data Preprocessing
Preprocessing is normally carried out before the time series is used in actual models or algorithms.
It is meant to enhance certain characteristics of the process being studied or to remove specific problems from the data.
An example is detrending, which removes a linear trend from a series of observations so that the subsequent analysis is independent of the trend.
Other examples are taking the logarithm of each sample because we suppose the process is exponential, reducing noise, and filling in
missing data.
Spline Smoothing
One way to complete missing data is to interpolate between known values and reconstruct the missing data points.
Spline Smoothing does just that: it uses a smooth spline to rebuild the data
that is missing from your dataset.
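As a rough illustration of spline-based gap filling, the following Python sketch fits a natural cubic spline through the known samples and evaluates it at the missing positions. The function names, the use of None to mark missing samples, and the choice of a natural (zero end-curvature) interpolating spline are all assumptions made for this example; the Spline Smoothing option itself may use a different spline type or a smoothing parameter.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    Hypothetical helper: xs must be strictly increasing, with at least two points.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the quadratic coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 * (ys[i + 1] - ys[i]) / h[i]
                    - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal solve (natural boundary conditions).
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution gives the per-segment coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the segment containing x; points outside the known range
        # are extrapolated from the first or last segment.
        j = 0
        while j < n - 1 and x > xs[j + 1]:
            j += 1
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def fill_missing_with_spline(samples):
    """Replace None entries with spline values; sample index is used as time."""
    known = [(i, v) for i, v in enumerate(samples) if v is not None]
    xs = [float(i) for i, _ in known]
    ys = [float(v) for _, v in known]
    spline = natural_cubic_spline(xs, ys)
    return [float(v) if v is not None else spline(float(i))
            for i, v in enumerate(samples)]


filled = fill_missing_with_spline([1.0, 3.0, None, 7.0, 9.0])
```

Known samples are returned unchanged; only the gaps are reconstructed. Gaps at the very start or end of the series are extrapolated, which is usually much less reliable than interpolation.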
Linear Detrend
Linear detrending is one of the most used and abused preprocessing methods. It is very useful when the data being analyzed is composed of an
underlying trend plus some other superimposed additive signal. Be aware, however, that a trend may appear to be present where there is none; the algorithm simply
removes whatever apparent linear trend it finds in the data.
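A minimal sketch of linear detrending, assuming equally spaced samples and an ordinary least-squares line fit (the function name is hypothetical, and the fitting method used by the Linear Detrend option is not stated here):

```python
def linear_detrend(samples):
    """Subtract the least-squares straight line from equally spaced samples.

    Hypothetical helper; needs at least two samples.
    """
    n = len(samples)
    t_mean = (n - 1) / 2.0            # mean of the time indices 0..n-1
    y_mean = sum(samples) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(samples))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # What remains is the signal with the fitted line removed.
    return [y - (intercept + slope * t) for t, y in enumerate(samples)]


residual = linear_detrend([1.0, 3.0, 5.0, 7.0])
```

The fit always finds some line, even in pure noise, so a residual that looks "cleaner" is not by itself evidence that a real trend was present.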
Rescale
Rescaling is often useful to "move" data into a better numeric range. For example, a signal whose values were originally spread
between -0.01 and +0.01 can be scaled to the new range 1.0 to 10.0. This makes it possible to apply further transformations, such as the square root or the logarithm, that require positive values.
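The example above can be sketched as a linear (min-max) mapping. The function name and the choice to map the observed minimum and maximum onto the new bounds are assumptions for illustration:

```python
def rescale(samples, new_min=1.0, new_max=10.0):
    """Linearly map samples so that min -> new_min and max -> new_max.

    Hypothetical helper; a constant signal cannot be rescaled this way.
    """
    old_min, old_max = min(samples), max(samples)
    if old_max == old_min:
        raise ValueError("cannot rescale a constant signal")
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in samples]


scaled = rescale([-0.01, 0.0, 0.01])
```

After this mapping every sample is at least 1.0, so a logarithm or square root can be applied without hitting zero or negative values.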
Normalize
Normalize is a classical preprocessing option. It removes the mean from the samples and changes their variance to one. In effect it is a shift by the mean
followed by a division by the standard deviation of the signal.
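A minimal sketch of this kind of normalization, assuming the common z-score convention: subtract the mean, then divide by the standard deviation so that the result has zero mean and unit variance. The function name and the use of the population (rather than sample) standard deviation are assumptions:

```python
import statistics


def normalize(samples):
    """Return z-scores: zero mean and unit (population) variance.

    Hypothetical helper; a constant signal has zero spread and is rejected.
    """
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        raise ValueError("cannot normalize a constant signal")
    return [(v - mean) / std for v in samples]


z = normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```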
Multiply
Multiply changes the signal in a very simple way: it multiplies each sample by a constant value. This is useful when the values in the signal are
too small or too large, or when an operation calls for multiplying or dividing by a constant such as π.
Difference
Difference calculates the first difference of a signal, which is roughly equivalent to taking its first derivative. Because the signal
is discrete rather than continuous, this is only an approximation, but it is useful in several algorithms.
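As a small sketch, the first difference subtracts each sample from the one that follows it. The function name is hypothetical, and a unit time step between samples is assumed:

```python
def first_difference(samples):
    """Return samples[i + 1] - samples[i] for each consecutive pair."""
    return [b - a for a, b in zip(samples, samples[1:])]


diffs = first_difference([1.0, 4.0, 9.0, 16.0])
```

The result has one sample fewer than the input, because the first sample has no predecessor to be compared with.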
Log Abs Difference
The transformation computes the logarithm of the absolute value of the first difference of the signal.
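A minimal sketch of this transformation follows. The function name, the use of the natural logarithm, and the handling of a zero difference (mapped to negative infinity here) are assumptions, since none of them is specified above:

```python
import math


def log_abs_difference(samples):
    """Return log(|samples[i + 1] - samples[i]|) for each consecutive pair."""
    out = []
    for a, b in zip(samples, samples[1:]):
        d = abs(b - a)
        # Assumption: log(0) is undefined, so a zero difference becomes -inf.
        out.append(math.log(d) if d > 0 else float("-inf"))
    return out


lad = log_abs_difference([0.0, 1.0, 1.0, 1.0 - math.e ** 2])
```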
Log and Log of Log
Log and Log of Log are very useful when the signal exhibits an exponential trend. Taking the logarithm turns such a trend into a linear one, which can remove the need to build complex or nonlinear models.
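The effect on an exponential trend can be seen in a short sketch. The function names and the use of the natural logarithm are assumptions, and the inputs are assumed to be strictly positive (greater than 1 for Log of Log):

```python
import math


def log_transform(samples):
    """Natural logarithm of each sample; samples must be > 0."""
    return [math.log(v) for v in samples]


def log_of_log(samples):
    """Logarithm applied twice; samples must be > 1."""
    return [math.log(math.log(v)) for v in samples]


# An exponential trend 2 * exp(0.5 * t) becomes a straight line after Log.
signal = [2.0 * math.exp(0.5 * t) for t in range(5)]
logged = log_transform(signal)
steps = [b - a for a, b in zip(logged, logged[1:])]
```

Every entry of steps equals the growth rate 0.5 up to rounding, which is what a straight line with slope 0.5 looks like in discrete form.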
Exp and Exp of Exp
Exp and Exp of Exp are the functional counterparts of Log and Log of Log. These transformations do not exactly invert the previous ones because of
the rescaling that takes place when there are negative numbers.
Root
Root takes the n-th root of each sample of the signal.
Power
Power is the counterpart of Root: it raises each sample of the signal to the n-th power (Root is simply Power with an exponent of 1/n).
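The relationship between the two can be sketched directly. The function names are hypothetical, and the samples are assumed to be non-negative, since a fractional power of a negative number is not a real value:

```python
def power(samples, exponent):
    """Raise each sample to the given exponent."""
    return [v ** exponent for v in samples]


def root(samples, n):
    """n-th root of each sample, expressed as Power with exponent 1/n."""
    return power(samples, 1.0 / n)


squares = power([2.0, 3.0], 2)
roots = root([4.0, 9.0], 2)
```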
Tanh
Tanh computes the hyperbolic tangent of each sample of the signal.
Logistic
Computes the logistic transformation of the signal using the following formula, which requires each sample to lie strictly between 0 and 1:
Transformed_i = Log(sample_i / (1.0 - sample_i))
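The formula can be sketched as follows. The function name is hypothetical, Log is assumed to be the natural logarithm, and the explicit range check is an addition for this example:

```python
import math


def logistic_transform(samples):
    """Apply log(s / (1 - s)) to each sample; each s must satisfy 0 < s < 1."""
    out = []
    for s in samples:
        if not 0.0 < s < 1.0:
            raise ValueError("sample must lie strictly between 0 and 1")
        out.append(math.log(s / (1.0 - s)))
    return out


transformed = logistic_transform([0.25, 0.5, 0.75])
```

A sample of 0.5 maps to zero, and samples placed symmetrically around 0.5 map to values of equal size and opposite sign.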
Absolute Value
Absolute value replaces each negative sample by the corresponding positive value.
Box-Cox Transform
Computes the Box-Cox transformation of the signal using the following formula:
Transformed_i = (Power(sample_i, λ) - 1.0) / λ
The formula applies for λ ≠ 0; in the standard Box-Cox definition, the case λ = 0 is the logarithm of the sample.
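A minimal sketch of the formula is given below. The function name and parameter name lam are hypothetical, the samples are assumed to be strictly positive, and the λ = 0 branch follows the standard Box-Cox definition rather than anything stated above:

```python
import math


def box_cox(samples, lam):
    """Box-Cox transform of strictly positive samples for parameter lam."""
    if lam == 0:
        # Assumption: standard limiting case of the formula as lam -> 0.
        return [math.log(v) for v in samples]
    return [(v ** lam - 1.0) / lam for v in samples]


bc = box_cox([1.0, 4.0], 0.5)
```

With λ = 1 the transform only shifts the signal by one, and smaller values of λ compress large samples more strongly.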
Log of Ratio
This transformation is commonly used in finance. It is the logarithm of the ratio between consecutive elements of the signal, which for small changes is roughly equivalent to
using the percentage change from sample to sample.
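The approximation can be checked with a short sketch that computes both quantities side by side. The function names are hypothetical, the natural logarithm is assumed, and the samples are assumed to be strictly positive:

```python
import math


def log_ratio(samples):
    """Return log(samples[i + 1] / samples[i]) for each consecutive pair."""
    return [math.log(b / a) for a, b in zip(samples, samples[1:])]


def fractional_change(samples):
    """Return (samples[i + 1] - samples[i]) / samples[i]; times 100 gives percent."""
    return [(b - a) / a for a, b in zip(samples, samples[1:])]


prices = [100.0, 101.0, 99.99]
lr = log_ratio(prices)
fc = fractional_change(prices)
```

For changes of about one percent the two results agree closely; the gap widens as the change from one sample to the next grows.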