In [1]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
In [2]:
from IPython.core.display import display, HTML
display(HTML("<style>.rendered_html { font-size: 15px; }</style>"))

import logging
logging.basicConfig(level='INFO')

Timeseria quickstart

Timeseria is a time series processing library which aims at making it easy to handle time series data and to build statistical and machine learning models on top of it.

It comes with a built-in set of common operations (resampling, slotting, differencing etc.) as well as models (reconstruction, forecasting and anomaly detection), and both custom operations and models can be easily plugged in.

Timeseria also tries to address by design all those annoying things which are often left as an implementation detail but that actually cause massive amounts of wasted time - such as handling data losses, non-uniform sampling rates, differences between time-slotted data and punctual observations, variable time units, timezones, DST changes and so on.

This is a (super) quickstart; if you are looking for a more structured introduction, have a look at the welcome notebook. The reference documentation might also be useful.

Load some data

Let's load some data: an indoor temperature winter dataset.

In [3]:
from timeseria import storages
DATASET_PATH = '/'.join(storages.__file__.split('/')[0:-1]) + '/tests/test_data/csv/'

temperature_timeseries = storages.CSVFileStorage(DATASET_PATH + 'temperature_winter.csv').get()

Let's have a look at the time series we just loaded:

In [4]:
temperature_timeseries
Out[4]:
Time series of #14403 points at variable resolution (~600s), from point @ 1546477200.0 (2019-01-03 01:00:00+00:00) to point @ 1555550400.0 (2019-04-18 01:20:00+00:00)

Now plot the data, using Timeseria built-in plotting engine:

In [5]:
temperature_timeseries.plot(aggregate=False)

If you zoom in, you will see that the data has been aggregated and plotted as a line chart plus an area chart (which represents the minimum and maximum value boundaries before aggregating the data). In this way you can plot even millions of data points without slowing down the plot or crashing your browser, and without losing much information about peaks and spikes.
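The idea behind this aggregation can be sketched in plain Python (this is a conceptual illustration, not Timeseria's actual plotting code): group consecutive points into buckets and keep the average for the line chart plus the min/max for the area chart.

```python
# Conceptual sketch: reduce many points to a few buckets, keeping enough
# information (avg, min, max) to still show peaks and spikes in the plot.
def aggregate_for_plot(values, bucket_size):
    buckets = []
    for i in range(0, len(values), bucket_size):
        chunk = values[i:i + bucket_size]
        buckets.append({'avg': sum(chunk) / len(chunk),
                        'min': min(chunk),
                        'max': max(chunk)})
    return buckets

values = [20.0, 21.0, 19.0, 25.0, 20.5, 20.0]
aggregated = aggregate_for_plot(values, bucket_size=3)
# The line chart follows 'avg'; the area chart spans 'min'..'max', so the
# 25.0 spike stays visible even after aggregation.
```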


Resample (and make data uniform)

Now resample the time series at a one-hour sampling interval, also making the data uniform and equally spaced over time. Gaps are filled by linear interpolation (the default interpolation method for the resampling) and a "data loss" index is added as a data point attribute.

In [6]:
temperature_timeseries = temperature_timeseries.resample('1h')
INFO:timeseria.transformations:Using auto-detected sampling interval: 600.0s
INFO:timeseria.transformations:Resampled 14403 DataTimePoints in 2521 DataTimePoints
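The linear interpolation used by default to fill gaps works like this (a minimal sketch in plain Python, with illustrative names rather than Timeseria's internals):

```python
# Linear interpolation: estimate the value at time t from the two nearest
# known samples, proportionally to how far t sits between them.
def interpolate(t, t_prev, v_prev, t_next, v_next):
    ratio = (t - t_prev) / (t_next - t_prev)
    return v_prev + ratio * (v_next - v_prev)

# A missing sample at 01:30 between 21.0 degrees at 01:00 (t=3600) and
# 22.0 degrees at 02:00 (t=7200) gets filled with the midpoint value:
value = interpolate(t=5400, t_prev=3600, v_prev=21.0, t_next=7200, v_next=22.0)
```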
In [7]:
temperature_timeseries
Out[7]:
Time series of #2521 points at 1h resolution, from point @ 1546477200.0 (2019-01-03 01:00:00+00:00) to point @ 1555549200.0 (2019-04-18 01:00:00+00:00)
In [8]:
temperature_timeseries[58].data_loss
Out[8]:
0.25
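Conceptually, the data loss index measures the fraction of a resampled interval that was not covered by original samples. A minimal illustration (not the exact internal formula):

```python
# Fraction of an output interval not covered by original samples.
def data_loss(expected_samples, actual_samples):
    return 1 - actual_samples / expected_samples

# A 1h output interval at a 600s original sampling rate expects 6 samples;
# if only 4.5 samples' worth of coverage is present, a quarter is lost:
loss = data_loss(expected_samples=6, actual_samples=4.5)
```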
In [9]:
temperature_timeseries.plot()


Reconstruct missing data

Use a simple, periodic average-based model for reconstructing missing data. Fit on about 80% of the data and test on the rest, with 6-, 12- and 24-hour gaps. Limit the evaluation to 100 samples per step to speed up the process.

In [10]:
from timeseria.models import PeriodicAverageReconstructor
In [11]:
paverage_reconstructor = PeriodicAverageReconstructor()
paverage_reconstructor.fit(temperature_timeseries[0:2000])
paverage_reconstructor.evaluate(temperature_timeseries[2000:], steps=[6,12,24], limit=100)
INFO:timeseria.models.reconstructors:Detected periodicity: 24x 1h
INFO:timeseria.models.reconstructors:Will evaluate model for [6, 12, 24] steps with metrics ['RMSE', 'MAE']
Out[11]:
{'RMSE': 0.7125619042757325, 'MAE': 0.5803696784300851}
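The core idea of a periodic average model can be sketched in a few lines of plain Python (illustrative only, not Timeseria's implementation): fit one average per phase of the period (e.g. per hour of the day), then fill missing values from those averages.

```python
from collections import defaultdict

# Fit: average the observed values per phase within the period.
def fit_periodic_averages(series, period):
    """series: list of (index, value) pairs; returns avg value per phase."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, value in series:
        sums[i % period] += value
        counts[i % period] += 1
    return {phase: sums[phase] / counts[phase] for phase in sums}

# Apply: replace missing values (None) with the average for their phase.
def reconstruct(series_with_gaps, averages, period):
    return [value if value is not None else averages[i % period]
            for i, value in enumerate(series_with_gaps)]

averages = fit_periodic_averages([(0, 20.0), (1, 21.0), (2, 20.0), (3, 23.0)], period=2)
filled = reconstruct([20.0, None, 20.0, 23.0], averages, period=2)
```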

Apply the reconstruction model. By default only full (100%) data losses are reconstructed, as Timeseria considers data points with lower data losses (e.g. points that, during the interpolation, were sitting in between existing and missing data points) as still representative. A "data_reconstructed" index is also added.

In [12]:
temperature_timeseries = paverage_reconstructor.apply(temperature_timeseries)

...and plot:

In [13]:
temperature_timeseries.plot()
In [14]:
temperature_timeseries[0].data_reconstructed
Out[14]:
0


Three days hourly temperature forecast

Use an LSTM neural network model to forecast three days of temperatures, but before fitting, run a cross validation to get an idea of the accuracy you can expect:

In [15]:
from timeseria.models import LSTMForecaster
In [16]:
LSTM_forecaster = LSTMForecaster(window=12, neurons=64, features=['values', 'diffs', 'hours'])
LSTM_forecaster.cross_validate(temperature_timeseries, rounds=3)
INFO:timeseria.models.base:Cross validation round #1 of 3: validate from 1546477200.0 (2019-01-03 01:00:00+00:00) to 1549501200.0 (2019-02-07 01:00:00+00:00), fit on the rest.
INFO:timeseria.models.forecasters:Will evaluate model for [1, 2, 3] steps with metrics ['RMSE', 'MAE']
INFO:timeseria.models.base:Cross validation round #2 of 3: validate from 1549501200.0 (2019-02-07 01:00:00+00:00) to 1552525200.0 (2019-03-14 01:00:00+00:00), fit on the rest.
INFO:timeseria.models.forecasters:Will evaluate model for [1, 2, 3] steps with metrics ['RMSE', 'MAE']
INFO:timeseria.models.base:Cross validation round #3 of 3: validate from 1552525200.0 (2019-03-14 01:00:00+00:00) to 1555549200.0 (2019-04-18 01:00:00+00:00), fit on the rest.
INFO:timeseria.models.forecasters:Will evaluate model for [1, 2, 3] steps with metrics ['RMSE', 'MAE']
Out[16]:
{'RMSE_avg': 0.3536184668984925,
 'RMSE_stdev': 0.06471130506319013,
 'MAE_avg': 0.25550124428478493,
 'MAE_stdev': 0.04366401802252287}
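The `_avg` and `_stdev` values above are just the mean and standard deviation of the per-round metrics. A quick sketch of that aggregation (the round RMSEs below are made-up illustrative numbers, not taken from the run above):

```python
from statistics import mean, stdev

# One RMSE per cross validation round (illustrative values):
round_rmse = [0.30, 0.35, 0.41]

summary = {'RMSE_avg': mean(round_rmse),
           'RMSE_stdev': stdev(round_rmse)}
# A small stdev across rounds suggests the accuracy estimate is stable.
```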

Now fit, apply and plot:

In [17]:
LSTM_forecaster.fit(temperature_timeseries)
LSTM_forecaster.apply(temperature_timeseries, n=72).plot(indexes=['forecast'])


Run an anomaly detection model

Use the periodic average anomaly detection model, which considers a data point value anomalous if it is too far from its periodic average. This will add an "anomaly" index as a data point attribute.

In [18]:
from timeseria.models import PeriodicAverageAnomalyDetector
paverage_anomaly_detector = PeriodicAverageAnomalyDetector()
paverage_anomaly_detector.fit(temperature_timeseries, stdevs=5)
INFO:timeseria.models.forecasters:Detected periodicity: 24x 1h
INFO:timeseria.models.forecasters:Using a window of "24"
INFO:timeseria.models.anomaly_detectors:Using 5 standard deviations as anomaly threshold: 1.8218057469668172
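The threshold logic logged above can be illustrated with a small sketch (conceptual only, with made-up numbers): the detector derives a threshold from the standard deviation of the prediction errors seen during fit, and flags points whose deviation exceeds it.

```python
from statistics import stdev

# Threshold = N standard deviations of the prediction errors seen during fit.
def fit_threshold(errors, stdevs):
    return stdevs * stdev(errors)

# A point is anomalous when it deviates from the prediction beyond the threshold.
def is_anomaly(predicted, observed, threshold):
    return abs(observed - predicted) > threshold

errors = [0.1, -0.2, 0.15, -0.05, 0.0]   # illustrative fit errors
threshold = fit_threshold(errors, stdevs=5)
flag = is_anomaly(predicted=20.0, observed=22.0, threshold=threshold)
```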

Apply and plot, but show only the "anomaly" index to improve plot readability:

In [19]:
paverage_anomaly_detector.apply(temperature_timeseries).plot()

Altogether

Apply again both models, but this time on the same time series, and plot:

In [20]:
anomaly_temperature_timeseries = paverage_anomaly_detector.apply(temperature_timeseries)
forecast_and_anomaly_temperature_timeseries = LSTM_forecaster.apply(anomaly_temperature_timeseries, n=72)
forecast_and_anomaly_temperature_timeseries.plot()

Move to daily data

Slot the time series into 1-day slots, also computing the min and max operations besides the default average one, but change the timezone first to get the right daily aggregates and to properly take the DST change into account. Slotting in days is indeed different from slotting in 24 hours: they are two different time units, the first variable and the second fixed.

In [21]:
forecast_and_anomaly_temperature_timeseries.change_timezone('Europe/Rome')
from timeseria.operations import min, max
forecast_and_anomaly_temperature_timeseries.slot(unit='1D', extra_operations=[min,max]).plot()
INFO:timeseria.transformations:Using auto-detected sampling interval: 3600.0s
INFO:timeseria.transformations:Slotted 2567 DataTimePoints in 106 DataTimeSlots
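The grouping behind the slotting can be sketched in plain Python (a timezone-naive, conceptual illustration: the real slotting handles timezones, DST and variable-length days):

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Illustrative hourly data: 48 hours of temperature-like values.
start = datetime(2019, 1, 3, tzinfo=timezone.utc)
hourly = [(start + timedelta(hours=h), 20.0 + h % 3) for h in range(48)]

# Group the hourly points by calendar day...
slots = defaultdict(list)
for dt, value in hourly:
    slots[dt.date()].append(value)

# ...and compute the avg (default) plus the extra min and max aggregates.
daily = {day: {'avg': sum(vals) / len(vals), 'min': min(vals), 'max': max(vals)}
         for day, vals in slots.items()}
```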


Next steps

You can have a look at the welcome tutorial, which provides a more structured introduction to Timeseria data structures and their philosophy, as well as practical examples of both built-in and custom models and operations.

You can also have a look at the example repository (Timeseria-notebooks), which is ready to play with in Binder, together with this quickstart and the welcome tutorial.


Or, you can give it a try in your own projects:

pip install git+https://github.com/sarusso/Timeseria.git

...or you can contribute! :)