The too many parts problem
This guide is part of a collection of findings gained from community meetups. For more real world solutions and insights you can browse by specific problem. Need more performance optimization tips? Check out the Performance Optimization community insights guide.
Understanding the problem
ClickHouse will throw a "Too many parts" error to prevent severe performance degradation. Small parts cause multiple issues: poor query performance from reading and merging more files during queries, increased memory usage since each part requires metadata in memory, reduced compression efficiency as smaller data blocks compress less effectively, higher I/O overhead from more file handles and seek operations, and slower background merges giving the merge scheduler more work.
Related Docs
Recognize the problem early
This query monitors table fragmentation by analyzing part counts and sizes across all active tables. It identifies tables with excessive or undersized parts that may need merge optimization. Use this regularly to catch fragmentation issues before they impact query performance.
Proper insert batching
Community-proven batching strategy from production deployments:
This batching strategy prevents the "too many parts" problem by accumulating inserts before writing to ClickHouse. The dual threshold approach (size + time) ensures consistent part sizes while preventing data delays, based on patterns from high-volume production systems.
Async inserts
"We developed a function called async insert... this mechanism is straightforward similar to buffer table we insert to the server side and use some buffer to collect these inserts by default we have 16 threads to collect this buffer and if the buffer is large enough or reach timeout we will flush the buffer to the storage so a part will contain multiple inserts" - ClickHouse team explaining built-in solution
Video Sources
- Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse - ClickHouse team member explains async inserts and the too many parts problem
- Production ClickHouse at Scale - Real-world batching strategies from observability platforms