MIT has developed a powerful forecasting tool for non-experts that uses a novel time-series prediction algorithm.
A system that enables a non-expert to predict future stock prices with high accuracy in just a few minutes, even if the time-series dataset contains missing values, is now possible thanks to the latest research by an MIT team. Time-series data is one of the most commonly used kinds of data for making predictions, from weather forecasting to the likelihood of someone developing a particular kind of disease.
However, making predictions using time-series data requires several data-processing steps and the use of complex machine-learning algorithms, which have such a steep learning curve that they aren’t readily accessible to non-experts. Three MIT researchers have developed a simple interface layer on top of such an algorithm that enables even a layperson to generate a prediction.
A powerful forecasting tool for non-experts
Once a user installs tspDB, the researchers' system, on top of an existing database, they can run a prediction query with just a few keystrokes in about 0.9 milliseconds, compared with 0.5 milliseconds for a standard search query. The system also returns confidence intervals, which are designed to help non-experts make more informed decisions by factoring the uncertainty of each prediction into their decision making.
To make these powerful tools more user-friendly, MIT researchers (research paper: “On Multivariate Singular Spectrum Analysis and its Variants” by Anish Agarwal, Abdullah Alomar and Devavrat Shah) developed a system that directly integrates prediction functionality on top of an existing time-series database.
More accurate than other tools
Their simplified interface, which they call tspDB (time series predict database), does all the complex modelling behind the scenes so a non-expert can easily generate a prediction in only a few seconds. The new system is more accurate and more efficient than state-of-the-art deep learning methods when performing two tasks: predicting future values and filling in missing data points.
The reason behind the success of tspDB is that it incorporates a novel time-series prediction algorithm. This algorithm is especially effective at making predictions on multivariate time-series data, which are data that have more than one time-dependent variable. In a weather database, for instance, temperature, dew point, and cloud cover each depend on their past values.
The right lens to look at time-series
“Even as the time-series data becomes more and more complex, this algorithm can effectively capture any time-series structure out there. It feels like we have found the right lens to look at the model complexity of time-series data,” according to senior author Devavrat Shah, the Andrew and Erna Viterbi Professor in EECS and a member of the Institute for Data, Systems, and Society and of the Laboratory for Information and Decision Systems.
Shah and his collaborators have been working on the problem of interpreting time-series data for years, adapting different algorithms and integrating them into tspDB as they built the interface. About four years ago, they learned about a particularly powerful classical algorithm, called singular spectrum analysis (SSA), that imputes and forecasts a single time series. SSA is a nonparametric spectral estimation method.
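To give a feel for what classical SSA does to a single series, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the researchers' code: the function name `ssa_reconstruct` and the `window` and `rank` parameters are hypothetical, and both parameters are picked by hand. The series is unfolded into a matrix of lagged copies, only the strongest singular components are kept, and the result is folded back into a series.

```python
import numpy as np

def ssa_reconstruct(series, window, rank):
    """Low-rank SSA reconstruction of a 1-D series (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    k = n - window + 1  # number of lagged columns
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # Keep only the `rank` strongest singular components.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Diagonal averaging: entry (i, j) of the matrix is an estimate of x[i + j].
    total = np.zeros(n)
    count = np.zeros(n)
    for j in range(k):
        total[j:j + window] += low_rank[:, j]
        count[j:j + window] += 1
    return total / count

# Synthetic demo data (made up for this example): a noisy sine wave.
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + 0.3 * rng.standard_normal(t.size)
smooth = ssa_reconstruct(noisy, window=40, rank=2)

print("error before:", np.mean((noisy - clean) ** 2))
print("error after: ", np.mean((smooth - clean) ** 2))
```

A rank of 2 is used because a single sine wave produces a trajectory matrix of rank 2. For real data, choosing the window and rank is exactly the kind of manual tuning that keeps a method like this out of reach for non-experts.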
Automating forecasting
SSA combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems, and signal processing. Imputation is the process of replacing missing values or correcting past values. While this algorithm required manual parameter selection, the researchers suspected it could enable their interface to make effective predictions using time series data. In earlier work, they removed the need for this manual intervention.
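Imputation with a low-rank model can be sketched in a few lines. The version below is one simple approach, assumed here for illustration and not necessarily the researchers' exact procedure: the series is cut into consecutive chunks that form the columns of a matrix, the gaps are zero-filled and rescaled, and a truncated SVD supplies estimates for the missing entries. The function name `impute_low_rank` and its `rows` and `rank` parameters are hypothetical and still set by hand.

```python
import numpy as np

def impute_low_rank(series, rows, rank):
    """Fill NaN gaps in a 1-D series with a low-rank matrix estimate (sketch)."""
    x = np.asarray(series, dtype=float)
    cols = x.size // rows
    used = rows * cols  # trailing points that do not fill a column are left as-is
    # Consecutive, non-overlapping chunks of the series become matrix columns.
    page = x[:used].reshape(cols, rows).T
    observed = ~np.isnan(page)
    frac = observed.mean()  # fraction of entries observed (assumed to be > 0)
    # Zero-fill the gaps, then rescale so the matrix is unbiased on average.
    filled = np.where(observed, page, 0.0) / frac
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    estimate = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Keep observed values; use the estimate only where data is missing.
    completed = np.where(observed, page, estimate)
    out = x.copy()
    out[:used] = completed.T.reshape(-1)
    return out

# Synthetic demo data (made up for this example): a sine wave with 20% gaps.
rng = np.random.default_rng(1)
t = np.arange(900)
truth = np.sin(2 * np.pi * t / 25)
missing = rng.random(t.size) < 0.2
gappy = truth.copy()
gappy[missing] = np.nan
filled = impute_low_rank(gappy, rows=30, rank=2)

print("mean squared error on the gaps:", np.mean((filled[missing] - truth[missing]) ** 2))
```

Observed values pass through unchanged, so the sketch only ever fills in what was missing.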
They tested the adapted mSSA (multivariate SSA, the variant of SSA they developed) against other state-of-the-art algorithms, including deep-learning methods, on real-world time-series datasets with inputs drawn from the electricity grid, traffic patterns, and financial markets. Their algorithm outperformed all the others on imputation, and it outperformed all but one of the other algorithms when it came to forecasting future values. The researchers also demonstrated that their tweaked version of mSSA can be applied to any kind of time-series data.
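A common way to extend SSA to several related series is to build one matrix per series, place the matrices side by side, and factor the combined matrix once, so that structure shared across the series is learned jointly. The sketch below illustrates that idea on synthetic data. It is an assumption-laden illustration, not the tspDB implementation: `page_matrix`, `mssa_denoise`, `rows`, and `rank` are hypothetical names, and the series are assumed to have equal length.

```python
import numpy as np

def page_matrix(series, rows):
    """Cut a 1-D series into consecutive chunks and use them as columns."""
    x = np.asarray(series, dtype=float)
    cols = x.size // rows
    return x[:rows * cols].reshape(cols, rows).T

def mssa_denoise(series_list, rows, rank):
    """Jointly denoise equal-length series with one truncated SVD (sketch)."""
    pages = [page_matrix(s, rows) for s in series_list]
    stacked = np.hstack(pages)  # shape: rows x (cols * number of series)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Split the combined estimate back into one block per series.
    blocks = np.hsplit(low_rank, len(pages))
    return [b.T.reshape(-1) for b in blocks]

# Synthetic demo data (made up): three series sharing one seasonal pattern.
rng = np.random.default_rng(2)
t = np.arange(900)
clean_list = [
    1.0 * np.sin(2 * np.pi * t / 25),
    0.5 * np.sin(2 * np.pi * t / 25 + 1.0),
    2.0 * np.sin(2 * np.pi * t / 25 + 2.0),
]
noisy_list = [c + 0.3 * rng.standard_normal(t.size) for c in clean_list]
denoised_list = mssa_denoise(noisy_list, rows=30, rank=2)

for c, n, d in zip(clean_list, noisy_list, denoised_list):
    print("before:", np.mean((n - c) ** 2), "after:", np.mean((d - c) ** 2))
```

Because the three series share the same period, a single rank-2 factorization describes all of them, which is the intuition behind pooling related series rather than modelling each one alone.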
“One reason I think this works so well is that the model captures a lot of time series dynamics, but at the end of the day, it is still a simple model. When you are working with something simple like this, instead of a neural network that can easily overfit the data, you can actually perform better,” Alomar says. The impressive performance of mSSA is what makes tspDB so effective, Shah explains. Now, their goal is to make this algorithm accessible to everyone.
(Abhijit Roy is a technology explainer and business journalist. He has worked with The Straits Times of Singapore, Business Today, Economic Times and The Telegraph. He has also worked with PwC, IBM, Wipro and Ericsson.)
(Disclaimer: The views expressed in the article above are those of the author and do not necessarily represent or reflect the views of Autofintechs.com. Unless otherwise noted, the author is writing in his/her personal capacity. The views are not intended and should not be thought to represent official ideas, attitudes, or policies of any agency or institution.)