Getting to Grips with Time Series Predictive Analysis Using 'R' (Theory)
Time Series - The Theory
Importance of Time Series predictions
The ability to observe the development of an area of interest (over time) and make predictions based upon historical observations creates a competitive edge for any modern business.
Predicting sales quantities, for example, will have knock-on benefits such as cost savings for the company.
Management too can better formulate planning, research and development, head count, budgets and so on.
A fundamental concept of Time Series is ‘Decomposition’. When we look at the initial time series chart (assuming the data has a time component to it), we need to think about decomposing the chart into its basic building blocks and then determine what type of ‘Time Series’ we are trying to model.
We need to ensure that we use the most appropriate analytical model that best suits the data.
R and other statistical packages will usually break the chart down into the observed data, the trend line (over time), often a seasonal component and, finally, the random fluctuations in the data, or irregular component.
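As a minimal sketch of this in R, the base decompose() function splits a series into exactly those building blocks. The built-in AirPassengers dataset is used purely for illustration; any monthly ts object would do:

```r
# Decompose a monthly time series into trend, seasonal and
# irregular components. AirPassengers ships with base R, so
# the example is self-contained.
data(AirPassengers)

# decompose() estimates the trend with a moving average, then
# averages the detrended values per month for the seasonal part
parts <- decompose(AirPassengers, type = "multiplicative")

str(parts$trend)     # long-term movement
str(parts$seasonal)  # repeating calendar effect
str(parts$random)    # irregular component

# plot(parts) draws the observed data and all three components
```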
Seasonal effects are processes that have a natural, calendar-related influence, like the increase in ice-cream consumption in summer due to warmer weather. Natural phenomena like weather, business phenomena like the start and end of the academic year, social behaviours like Christmas, Halloween and Easter, and public holidays which change from year to year all contribute.
Seasonal adjustment is the process of removing these systematic, calendar-related influences from a time series. We need to seasonally adjust the observed data to ensure we can uncover the true underlying movement in the series.
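A sketch of that adjustment in R, again on the built-in AirPassengers series: since that series is multiplicative we divide out the estimated seasonal component; for an additive series we would subtract it instead.

```r
# Seasonally adjust a multiplicative series by dividing out the
# seasonal component estimated by decompose()
data(AirPassengers)
parts    <- decompose(AirPassengers, type = "multiplicative")
adjusted <- AirPassengers / parts$seasonal

# The adjusted series keeps the trend and irregular movements,
# but the repeating within-year pattern is removed
plot(adjusted)
```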
Spotting Seasonality in Data
Seasonality can usually be spotted as there’ll be peaks and troughs in the data, usually in a consistent direction and of the same magnitude every period, relative to the trend.
A trend is the ‘long term’ movement in a time series without calendar related and random effects.
The irregular or residual component is what’s left after the seasonal and trend components of a ‘time series’ have been estimated and removed. It normally results from short-term fluctuations in a ‘series’ which are neither systematic nor predictable. In a highly irregular ‘series’ these random fluctuations can dominate movements, which will mask the trend and seasonality.
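One quick way to spot seasonality in R is monthplot(), sketched below on the built-in AirPassengers data; a stable within-year pattern shows up as consistently different levels per calendar month, while a flat plot suggests little seasonality.

```r
# Visual check: one mini-series per calendar month
data(AirPassengers)
monthplot(AirPassengers)

# Numeric check: average each month across all years.
# cycle() labels each observation with its month number (1-12)
monthly_means <- tapply(AirPassengers, cycle(AirPassengers), mean)
round(monthly_means)  # summer months stand out for this series
```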
We can combine the three components that match the observed data in two different ways. That is to say:
Additively – Data = Seasonal + Trend + Random
Multiplicatively – Data = Seasonal * Trend * Random (easy to fit, by the way, if we take the logarithms of both sides of the model):
Log (Data) [Natural or Base 10] = Log (Seasonal) + Log (Trend) + Log (Random)
In many ‘time series’ it is the percentage change, rather than the absolute difference in value, that matters: the amplitude of both the seasonal and irregular variations increases as the level of the trend rises, and the size of the seasonal fluctuations varies with it. Such series are multiplicative.
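The log trick above can be seen directly in R. The sketch below measures the within-year swing of the built-in AirPassengers series before and after taking logs; the raw swing grows with the trend, while the log-scale swing stays roughly constant, which is the usual sign that the multiplicative model fits.

```r
# Compare within-year swings on the raw and log scales.
# floor(time(x)) labels each observation with its year.
data(AirPassengers)
yr <- floor(time(AirPassengers))

raw_swing <- tapply(AirPassengers,      yr, function(x) diff(range(x)))
log_swing <- tapply(log(AirPassengers), yr, function(x) diff(range(x)))

round(raw_swing)      # grows as the trend rises
round(log_swing, 2)   # roughly constant after the log transform
```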
General Approach for Analysis
Identify through visual inspection whether the data has seasonality or trends.
Identify whether the decomposition technique required is additive or multiplicative. Log transform the multiplicative case if needed.
Test appropriate additive algorithms.
Simple moving average.
Simple exponential smoothing: Seasonal = No, Trend = No, Correlation = No
Seasonal adjustments: Seasonal = Yes, Trend = Yes, Correlation = No
Holt’s Exponential Smoothing: Seasonal = No, Trend = Yes, Correlation = No
Holt Winters Exponential Smoothing: Seasonal = Yes, Trend = Yes, Correlation = No
ARIMA: Seasonal = Yes, Trend = Yes, Correlation = Yes.
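The models listed above can all be fitted with base R's stats package. The sketch below runs them on the built-in AirPassengers series; the seasonal ARIMA order used here, (0,1,1)(0,1,1)[12], is an assumption for illustration (the well-known "airline model"), not a prescription.

```r
data(AirPassengers)

# Simple exponential smoothing: no trend, no seasonal term
ses <- HoltWinters(AirPassengers, beta = FALSE, gamma = FALSE)

# Holt's exponential smoothing: adds a trend term
holt <- HoltWinters(AirPassengers, gamma = FALSE)

# Holt-Winters: trend plus a (here multiplicative) seasonal term
hw <- HoltWinters(AirPassengers, seasonal = "multiplicative")

# Seasonal ARIMA on the log scale: handles trend, seasonality
# and autocorrelation in one model
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months from the Holt-Winters fit
predict(hw, n.ahead = 12)
```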
Perform statistical tests to verify correct model selected:
Ljung-Box Test
Residual mean = 0
Autocorrelation Function (ACF)
Partial Autocorrelation Function (PACF).
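Each of those checks has a one-line counterpart in base R. The sketch below runs them on the residuals of a seasonal ARIMA fit to the built-in AirPassengers data (the model order is the same illustrative "airline model" assumption as above):

```r
# Fit an illustrative seasonal ARIMA and extract its residuals
data(AirPassengers)
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

# Ljung-Box test: a high p-value means no evidence of
# leftover autocorrelation in the residuals
bt <- Box.test(res, lag = 20, type = "Ljung-Box")
print(bt)

# The residual mean should be close to zero
mean(res)

# ACF / PACF of the residuals should show no significant spikes
acf(res)
pacf(res)
```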
In a future Blog I will go into the analysis and testing theory in more detail, with examples. The next Blog provides a simple worked example of ‘Time Series’.