TinyBird Materialized Views

Notes on Tinybird platform.


##TL:DR; The idea behind it (rollups) is simple: Queries built over long periods of raw time series data may degrade as the amount of data grows. If you need aggregated time series data over long timespans (e.g. transaction counts over the last 6 months), you shouldn’t calculate the entire aggregation at query time.

Rather, you should use aggregations that incrementally calculate what you want to see. Storing these calculations as new data arrives - and only one time - minimizes the amount of data you scan making subsequent queries much faster and cheaper.

What are they ?

Materialized Views give you a way to pre-aggregate and pre-filter large Data Sources incrementally, adding simple logic using SQL to produce a more relevant Data Source with significantly fewer rows.

In the simplest terms, a Materialized View is the continuous, streaming result of a Pipe saved as a new Data Source.

As new data is ingested into the origin Data Source, the transformed results from the Pipe will continually be inserted in the new Materialized View Data Source, and you can query it just like you would query any other Data Source in Tinybird.

ex: Display period-of-time aggregated data: hourly, monthly, yearly without waiting for computation processing

Materialized views can create new data sources containing pre-aggreagated time series metrics that we can later on query.

How is data inserted?

Data is ingested in blocks from the origin Data Source, then a materialization process is triggered and transformations are applied over data (Aggregation, etc), returning a partial result in the Materialized view. As well ingestion of data can be partiall or complete. If DS are too big, the ingestion is done in blocks

Rows ingested and be sent to another MV though a Populate call

  • MVs in tinybird are incremental and TRIGGERED UPON INGESTION. Restrict to use SQL operations like ORDER BY as functions on MV are ONLY applied to NEW DATA.

Aggregations

Bear in mind there is a background process in Tinybird that accounts for merging the partial results in intermediate states suring the ingestion process, thus compacting results and reducing the number of rows.

Evry 8-10 minutes this process will merge the states, thus completing the aggregation. Because we are working in real-time, we can’t always wait for this background process to take place. Fortunately, this is easily accounted for: When querying the Materialized View, you should use the proper merge combinator and GROUP BY in the query itself.

In order to do this, we use the -State suffix for storing intermediate states and the -Merge suffix when querying the data so that we can make sure the merge process is completed

NODE Avg calculation
SQL >
SELECT
  day,
  city,
  avgState() avg
FROM table
GROUP BY day, city
NODE endpoint
SQL >
%
SELECT
  day,
  city,
  avgMerge(avg) as avg
FROM avg_table
WHERE day > {{Date(start_date)}}
GROUP BY day, city

Materialized Views parameters

  • Sorting key: Used for defining how data is sorted and critical for filtering. For MVs containig a ggreagtions, all keys on GROUP BY should be a sorting key. The order of the keys in the ENGINE_SORTING_KEY matters.

  • Partition By: Logical combination of records based on a criterion. ex: dates, partition by month

  • TTL: Sets the TTL of the MV. Might be useful when querying for fresh data. Ex: only data within the last week --> TTL of 7

Refrence

https://www.tinybird.co/blog-posts/roll-up-data-with-materialized-views

https://www.tinybird.co/docs/guides/publish/master-materialized-views