Snowflake, one of the most popular cloud-based data platforms, now embeds advanced modeling features, and I gave the forecasting one a shot.
A few months ago (November 2023), Snowflake announced the release of multiple new features in the modeling/LLM space, under a framework called “Cortex”.
Since mid-December, the first two features (Forecasting and Anomaly Detection) have been generally available (see the Snowflake 7.44 release notes).
With this, Snowflake continues its mission to offer a fully managed “one-stop-shop” analytics platform that helps Data Citizens unlock value from their data assets, on top of the regular Data Warehouse features aimed at Data Engineering teams.
These features will remind some of you of “Google BigQuery ML”, first released in August 2020 (yes, four years ago!); let’s dive in!
Forecasting local city swimming pool visits
Beyond the exciting talks and tailor-made demonstrations at Snowday ❄️, I was eager to load a real-life dataset into Snowflake and see how Cortex performs compared to what a regular Data Citizen could achieve with the simple combination of Pandas and Scikit-Learn.
I decided to use the attendance statistics of a local swimming pool close to my home, both because the operator was kind enough to release the data in an “open data” spirit and because I am a regular swimmer there 🏊‍♂️.
This is a truly interesting dataset because we can all intuitively imagine the reasons why attendance at a public swimming pool fluctuates:
- regular swimmers vs. kids & families coming for fun once in a while,
- seasons & temperature,
- different opening hours during the week,
- holiday periods,
- rain or wind (or both!),
- etc.
So how would a Machine Learning model capture all these phenomena?
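To make the question more concrete, here is a minimal sketch of how some of the drivers listed above could be turned into explicit calendar features with Pandas. Everything in it is an assumption made for illustration: the date range, the `school_holidays` list and the column names are hypothetical and do not come from the actual swimming pool dataset. Weather and opening hours would need external data and are left out.

```python
import pandas as pd

# Hypothetical daily index; the real dataset's date range and granularity may differ.
days = pd.date_range("2023-01-01", "2023-01-14", freq="D")

# Hypothetical holiday dates, for illustration only.
school_holidays = pd.to_datetime(["2023-01-01", "2023-01-02"])

features = pd.DataFrame(index=days)
features["day_of_week"] = days.dayofweek        # 0 = Monday ... 6 = Sunday
features["is_weekend"] = days.dayofweek >= 5    # Saturday or Sunday
features["month"] = days.month                  # rough proxy for the season
features["is_school_holiday"] = days.isin(school_holidays)

print(features.head())
```

A model built by hand with Scikit-Learn would typically need features like these spelled out, whereas a managed forecasting function is expected to pick up at least part of the seasonality on its own.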