
The too many parts problem

This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights, you can browse by specific problem. Need more performance optimization tips? Check out the Performance Optimization community insights guide.

Understanding the problem

ClickHouse throws a "Too many parts" error to prevent severe performance degradation. Small parts cause multiple issues:

- Poor query performance, because queries must read and merge more files
- Increased memory usage, since each part requires metadata in memory
- Reduced compression efficiency, as smaller data blocks compress less effectively
- Higher I/O overhead, from more file handles and seek operations
- Slower background merges, because the merge scheduler has more work to do
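
The error itself is governed by per-table MergeTree settings. As a quick orientation, the thresholds that delay and then reject inserts can be inspected on a running server (values vary by version and configuration); this is a small illustrative query, not part of the original guide:

```sql
-- Thresholds that delay or reject inserts when a partition
-- accumulates too many active parts.
SELECT name, value, description
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert');
```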


Recognize the problem early

This query monitors table fragmentation by analyzing part counts and sizes across all active tables. It identifies tables with excessive or undersized parts that may need merge optimization. Use this regularly to catch fragmentation issues before they impact query performance.
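
The original query is not reproduced here; a representative sketch that reads `system.parts` and flags tables with many or undersized active parts might look like the following (the part-count and size thresholds are illustrative, not recommendations):

```sql
-- Per-table view of active part counts and average part size.
SELECT
    database,
    table,
    count() AS active_parts,
    formatReadableSize(sum(bytes_on_disk)) AS total_size,
    formatReadableSize(avg(bytes_on_disk)) AS avg_part_size
FROM system.parts
WHERE active
GROUP BY database, table
-- Flag tables with many parts or with parts averaging under ~10 MiB.
HAVING active_parts > 100 OR avg(bytes_on_disk) < 10 * 1024 * 1024
ORDER BY active_parts DESC;
```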

Proper insert batching

A community-proven batching strategy from production deployments is to accumulate inserts client-side before writing to ClickHouse. The dual-threshold approach (flush on batch size or on elapsed time, whichever comes first) keeps part sizes consistent while bounding ingestion latency, and is based on patterns from high-volume production systems; a minimal sketch follows.
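
Below is a minimal client-side sketch of this pattern in Python, assuming the clickhouse-connect driver; the host, table name, column names, and thresholds are placeholders, not values from the original guide:

```python
import time
import threading

import clickhouse_connect  # assumption: the clickhouse-connect driver is installed


class BatchInserter:
    """Accumulates rows and flushes when either threshold is reached."""

    def __init__(self, table, column_names, max_rows=100_000, max_seconds=5.0):
        self.client = clickhouse_connect.get_client(host="localhost")
        self.table = table
        self.column_names = column_names
        self.max_rows = max_rows          # size threshold
        self.max_seconds = max_seconds    # time threshold
        self.buffer = []
        self.last_flush = time.monotonic()
        self.lock = threading.Lock()

    def add(self, row):
        with self.lock:
            self.buffer.append(row)
            if (len(self.buffer) >= self.max_rows
                    or time.monotonic() - self.last_flush >= self.max_seconds):
                self._flush()

    def _flush(self):
        if not self.buffer:
            return
        # One INSERT per flush -> one part (per partition) instead of many tiny parts.
        self.client.insert(self.table, self.buffer, column_names=self.column_names)
        self.buffer = []
        self.last_flush = time.monotonic()
```

Note that this sketch only checks the time threshold when a new row arrives; a real deployment would also run a background timer so a quiet buffer still gets flushed.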

Async inserts

"We developed a function called async insert... this mechanism is straightforward similar to buffer table we insert to the server side and use some buffer to collect these inserts by default we have 16 threads to collect this buffer and if the buffer is large enough or reach timeout we will flush the buffer to the storage so a part will contain multiple inserts" - ClickHouse team explaining built-in solution
