Materialized views: the double-edged sword
This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights, you can browse by specific problem. Too many parts bogging your database down? Check out the Too Many Parts community insights guide, and learn more about Materialized Views.
The 10x storage anti-pattern
Real production problem: "We had a materialized view. The raw log table was around 20 gig but the view from that log table exploded to 190 gig, so almost 10x the size of the raw table. This happened because we were creating one row per attribute and each log can have 10 attributes."
Rule: If your GROUP BY creates more rows than it eliminates, you're building an expensive index, not a materialized view.
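As a sketch of the anti-pattern from the case study above, using hypothetical names (a `logs` table whose `attributes` column is an `Array(Tuple(String, String))` of key/value pairs): the ARRAY JOIN fans each log row out into one row per attribute, so a log with 10 attributes becomes 10 stored rows and the "view" grows to roughly 10x the source.

```sql
-- Anti-pattern sketch (hypothetical schema): the MV emits one row per
-- (log, attribute) pair instead of aggregating across logs, so storage
-- multiplies by the attribute count -- an expensive index, not a rollup.
CREATE MATERIALIZED VIEW logs_by_attribute
ENGINE = MergeTree
ORDER BY (attr_key, attr_value)
AS SELECT
    log_id,
    timestamp,
    attr.1 AS attr_key,   -- tuple element access: the attribute key
    attr.2 AS attr_value  -- the attribute value
FROM logs
ARRAY JOIN attributes AS attr;
```

A healthy materialized view would instead GROUP BY a small set of keys so that many source rows collapse into one aggregated row.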
Production materialized view health validation
This query helps you predict whether a materialized view will compress or explode your data before you create it. Run it against your actual table and columns to avoid the "190GB explosion" scenario.
What it shows:
- Low aggregation ratio (<10%) = Good MV, significant compression
- High aggregation ratio (>70%) = Bad MV, storage explosion risk
- Storage multiplier = How much bigger/smaller your MV will be
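A minimal validation sketch, assuming a hypothetical source table `logs` and candidate grouping keys (`service`, hour of `timestamp`); substitute your own table and the exact GROUP BY columns of the MV you plan to create. It estimates the ratio of distinct groups to raw rows before you build anything:

```sql
-- Estimate whether a candidate GROUP BY compresses or explodes the data.
-- Hypothetical table/columns: logs(service, timestamp).
SELECT
    count() AS source_rows,
    uniqExact(service, toStartOfHour(timestamp)) AS mv_rows,
    round(mv_rows / source_rows * 100, 2) AS aggregation_ratio_pct
FROM logs;
```

An `aggregation_ratio_pct` under 10 suggests strong compression and a good MV candidate; above 70 signals the storage-explosion risk described above.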
When materialized views become a problem
Multiple materialized views on the same table slow down inserts, since each insert must process and write data to all MVs synchronously. The performance impact depends on your insert volume and MV complexity. A table receiving millions of inserts per hour will feel the impact sooner than low-traffic tables.
Warning signs to monitor:
- Insert latency increases (queries that took 10ms now take 100ms+)
- "Too many parts" errors appearing more frequently
- CPU spikes during insert operations
- Insert timeouts that didn't happen before
You can compare insert performance before and after adding MVs using system.query_log to track query duration trends.
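One way to track that trend is a sketch like the following against system.query_log; the table name is hypothetical, and column availability (e.g. `query_kind`) varies by ClickHouse version:

```sql
-- Daily p95 insert latency for one table (hypothetical name 'default.events').
SELECT
    toStartOfDay(event_time) AS day,
    quantile(0.95)(query_duration_ms) AS p95_duration_ms,
    count() AS insert_count
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
  AND has(tables, 'default.events')
GROUP BY day
ORDER BY day;
```

A step change in `p95_duration_ms` that coincides with deploying a new MV is a strong signal that the MV is the cause.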
Alternative approach: For non-critical aggregations, consider regular tables populated by scheduled jobs instead of real-time materialized views. This gives you zero impact on insert performance and more control over processing timing, but data freshness depends on job frequency rather than real-time updates.
This approach works well if you need aggregated data but can tolerate 15-60 minute delays, and your aggregated data is queried less frequently than source data is inserted. Use scheduling tools like cron or systemd timers to run INSERT INTO summary_table SELECT ... FROM source_table periodically.
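As a sketch with hypothetical names (`logs` as the source, `logs_hourly` as the summary table), a rollup job run every 15 minutes via cron might look like:

```sql
-- Scheduled rollup instead of a real-time MV. Example crontab entry:
-- */15 * * * * clickhouse-client --queries-file rollup.sql
INSERT INTO logs_hourly
SELECT
    toStartOfHour(timestamp) AS hour,
    service,
    count() AS events
FROM logs
WHERE timestamp >= now() - INTERVAL 15 MINUTE  -- only the newest window
GROUP BY hour, service;
```

Because the job runs out-of-band, inserts into `logs` pay no synchronous MV cost; the trade-off is that `logs_hourly` lags the source by up to one scheduling interval.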
Video sources
- ClickHouse at CommonRoom - Kirill Sapchuk - Source of the "over enthusiastic about materialized views" and "20GB→190GB explosion" case study