Preprocessing is an activity normally carried out before feeding a time series to the actual models or algorithms.
It is meant to enhance certain characteristics of the process being studied or to remove specific problems from the data.
An example is detrending, which removes a linear trend from a series of observations so that the subsequent analysis can be trend independent.
Or we might take the logarithm of each data sample because we suppose that the underlying process is exponential. Or we might try to reduce noise or complete missing data.
One way to complete missing data is to interpolate between known values and reconstruct the missing data points.
Spline Smoothing does just that: it rebuilds the samples that are missing from your dataset using a smooth spline.
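A minimal sketch of this idea, assuming the missing samples are marked as NaN and using SciPy's CubicSpline (the toy data and the choice of library are illustrative, not necessarily what the tool does internally):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Toy series with two samples "lost" (marked as NaN) -- illustrative data
t = np.arange(10, dtype=float)
y = np.sin(t)
y[[3, 7]] = np.nan

# Fit a smooth spline through the known points only,
# then evaluate it at the missing positions
known = ~np.isnan(y)
spline = CubicSpline(t[known], y[known])
y_filled = np.where(known, y, spline(t))
```

The known samples are left untouched; only the gaps are filled with the spline's estimate.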
Linear detrending is one of the most used, and abused, preprocessing methods. It is very useful when the data we are analyzing is composed of an
underlying linear trend plus some superimposed additive signal. Of course, we may see a trend where there is none; the algorithm simply
removes whatever the apparent trend is from the data.
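A minimal NumPy sketch of linear detrending (the example signal is invented for illustration): fit a straight line by least squares and subtract it.

```python
import numpy as np

# Illustrative signal: a linear trend plus a superimposed oscillation
t = np.arange(100, dtype=float)
signal = 0.5 * t + 3.0 + np.sin(0.3 * t)

# Fit a straight line by least squares and subtract it
slope, intercept = np.polyfit(t, signal, 1)
detrended = signal - (slope * t + intercept)
```

After subtraction, refitting a line to the residual gives a slope of essentially zero.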
Rescaling is often useful to "move" data into a better numeric range. For example, a signal whose values were originally spread
between -0.01 and +0.01 can be rescaled to the range 1.0 to 10.0. This makes it possible to apply further transformations, such as taking the square root or the logarithm.
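One way to sketch such a rescaling in NumPy (the `rescale` helper and the sample values are hypothetical, chosen to mirror the example above):

```python
import numpy as np

def rescale(x, lo, hi):
    # Linear map sending min(x) -> lo and max(x) -> hi
    # (assumes x is not constant, otherwise the denominator is zero)
    x = np.asarray(x, dtype=float)
    return lo + (x - x.min()) * (hi - lo) / (x.max() - x.min())

data = np.array([-0.01, -0.004, 0.0, 0.007, 0.01])
scaled = rescale(data, 1.0, 10.0)  # now strictly positive, log is safe
```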
Normalize is a classical preprocessing option. It removes the mean from the samples and scales their variance to one. In effect it is a shift of the signal
followed by a division by its standard deviation.
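A one-line NumPy sketch of this normalization (the sample values are invented; they were chosen to have mean 5 and standard deviation 2):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5, std 2
normalized = (x - x.mean()) / x.std()  # zero mean, unit variance
```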
Multiply changes the signal in a very simple way: it multiplies each sample by a constant value. This is useful when the values in the signal are
too small or too big, or when applying operations such as multiplying or dividing by π.
Difference calculates the first difference of a signal, which is roughly equivalent to taking its first derivative. Of course the signal
is discrete, not continuous, so the two are not the same, but the approximation is useful in several algorithms.
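In NumPy the first difference is a single call (the input values are illustrative); note the result is one sample shorter than the input:

```python
import numpy as np

x = np.array([1.0, 3.0, 6.0, 10.0])
dx = np.diff(x)  # first difference; one sample shorter than the input
```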
Log Abs Difference
The transformation computes the logarithm of the absolute value of the first difference of the signal.
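A sketch of this composition in NumPy (example values invented). One practical caveat: a zero difference would produce -inf, so a real implementation may need to guard against repeated values.

```python
import numpy as np

x = np.array([5.0, 3.0, 3.5, 7.5])
# diff = [-2.0, 0.5, 4.0]; a zero difference would produce -inf,
# so implementations may need to guard against repeated values
lad = np.log(np.abs(np.diff(x)))
```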
Log and Log of Log
Log and Log of Log are very useful when a signal exhibits an exponential trend. This operation removes the need to build complex or nonlinear models.
Exp and Exp of Exp
Exp and Exp of Exp are the functional counterparts of Log and Log of Log. These transformations do not compute the exact inverses of the previous ones,
because of the rescaling that happens when negative numbers are present.
Root takes the n-th root of each sample of the signal.
Power is the complement of Root: it raises each sample of the signal to the n-th power (Root is simply Power with exponent 1/n).
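The relationship between the two can be sketched in NumPy (sample values invented): raising to 1/n takes the n-th root, and raising the result back to the n-th power recovers the original samples.

```python
import numpy as np

x = np.array([1.0, 8.0, 27.0])
n = 3
root = np.power(x, 1.0 / n)  # n-th root of each sample
back = np.power(root, n)     # Power with exponent n undoes Root
```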
Tanh computes the hyperbolic tangent of each sample of the signal.
Logistic
Computes the logistic transformation of the signal. The logistic transformation is computed using the following formula:
Transformed_i = Log(sample_i / (1.0 - sample_i))
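A direct NumPy sketch of this formula (the helper name and sample values are hypothetical); it assumes every sample lies strictly between 0 and 1, since values outside that range make the logarithm undefined:

```python
import numpy as np

def logistic_transform(s):
    # Log(s / (1 - s)); assumes every sample is strictly between 0 and 1
    s = np.asarray(s, dtype=float)
    return np.log(s / (1.0 - s))

out = logistic_transform(np.array([0.25, 0.5, 0.75]))
```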
Absolute value replaces each negative sample by the corresponding positive value.
Box-Cox
Computes the Box-Cox transformation of the signal. The transform is computed using the following formula:
Transformed_i = (Power(sample_i, λ) - 1.0) / λ
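A sketch of the formula in NumPy (helper name and data invented). It assumes positive samples; note that as λ approaches 0 the transform tends to the plain logarithm, which is the usual convention for that limit:

```python
import numpy as np

def box_cox(x, lam):
    # (x**lam - 1) / lam; as lam -> 0 this tends to log(x). Assumes x > 0.
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)
    return (np.power(x, lam) - 1.0) / lam

x = np.array([1.0, 2.0, 4.0])
```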
Log of Ratio
This transformation is normally used in finance and is the logarithm of the ratio between consecutive elements of the signal. It is roughly
equivalent to using the percentage change from sample to sample.
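A short NumPy sketch with an invented price series, showing both the log of the ratio and the plain percentage change; for small moves the two are nearly equal because log(1 + r) ≈ r:

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])
log_returns = np.log(prices[1:] / prices[:-1])
pct_change = prices[1:] / prices[:-1] - 1.0
# For small moves log(1 + r) is approximately r, so the two are close
```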