Download version 1.04 at Chia.net, official release notes
The Chia team is delivering a significant improvement to the plotter in version 1.04. The notable changes include a 26% reduction in memory required per plotting process and a 28% reduction in temporary storage space. For many people, the net result will be a significant boost in plotting output in TB/day on the same system! For new plotting system builders, it will mean different ratios of compute, memory, and storage.
Here is a quick example of some of the gains I saw in the version 1.04 beta…I will be testing a lot more systems in the coming weeks.
Example of real plotting improvements
| System | Change | Version 1.03 | Version 1.04 |
| --- | --- | --- | --- |
| NUC | 5 → 6 processes | 1.31TiB (1.44TB) per day | 1.62TiB (1.78TB) per day |
| Intel Server System R2308WFTZSR | 27 → 46 processes | 6.77TiB (7.44TB) per day | 9.16TiB (10TB) per day |
| Budget build, Z490 | 3TB → 2TB temp space ($100-200 savings) | 3TB per day | 3TB per day |
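The TiB and TB figures in the table are the same measurement in binary and decimal units. A quick sketch of the conversion, using the NUC row above as the example:

```python
# Convert plotting throughput between TiB/day (binary) and TB/day (decimal).
TIB_TO_TB = 1024**4 / 1000**4  # 1 TiB ≈ 1.0995 TB

def tib_to_tb(tib):
    return tib * TIB_TO_TB

nuc_103, nuc_104 = 1.31, 1.62  # TiB/day from the NUC row above
print(round(tib_to_tb(nuc_103), 2))  # 1.44 TB/day
print(round(tib_to_tb(nuc_104), 2))  # 1.78 TB/day
print(f"{(nuc_104 / nuc_103 - 1) * 100:.0f}% more output on the same hardware")
```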
New Temporary Space and Memory Requirements in 1.04
| K-value | DRAM (MiB) | Temp Space (GiB) | Temp Space (GB) |
| --- | --- | --- | --- |
| 32 | 3389 | 239 | 256.6 |
| 33 | 7400 | 512 | 550 |
| 34 | 14800 | 1041 | 1118 |
| 35 | 29600 | 2175 | 2335 |
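The GB column follows directly from the GiB column; a sketch of the conversion (values round to the table's GB figures):

```python
# Convert the temp space requirements from GiB (binary) to GB (decimal),
# as shown in the 1.04 requirements table above.
GIB_TO_GB = 1024**3 / 1000**3  # ≈ 1.0737

for k, temp_gib in [(32, 239), (33, 512), (34, 1041), (35, 2175)]:
    print(f"k={k}: {temp_gib} GiB ≈ {temp_gib * GIB_TO_GB:.1f} GB")
```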
Old Requirements in 1.03 and previous
| K-value | DRAM (MiB) | Temp Space (GiB) | Temp Space (GB) |
| --- | --- | --- | --- |
| 32 | 4608 | 332 | 356 |
| 33 | 9216 | 589 | 632 |
| 34 | 18432 | 1177 | 1264 |
| 35 | 36864 | 2355 | 2529 |
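The headline savings quoted at the top of the post fall straight out of the two k=32 rows:

```python
# Derive the 1.04 headline savings from the k=32 rows of the two tables above.
dram_old, dram_new = 4608, 3389  # MiB per process, 1.03 vs 1.04
temp_old, temp_new = 332, 239    # GiB temp space, 1.03 vs 1.04

dram_saving = (1 - dram_new / dram_old) * 100
temp_saving = (1 - temp_new / temp_old) * 100
print(f"DRAM: {dram_saving:.0f}% less, temp space: {temp_saving:.0f}% less")
```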
Example systems and new build options
| System | Cost | CPU cores | DRAM (GiB) | Temp (GB) | SSD | Expected TB/day |
| --- | --- | --- | --- | --- | --- | --- |
| Value desktop | $400 | 4 | 13.2 | 1024 | 1TB | 1-2 |
| Desktop | $700 | 6 | 20 | 1536 | 2x 800GB or 1x 1.6TB | 2-3 |
| Desktop | $900 | 8 | 26.5 | 2048 | 2x 1TB, 1x 2TB (stagger) | 3-4 |
| High-end desktop | $2000 | 12 | 39.7 | 3072 | 2x 1.6TB | 4.5 |
| High-end desktop | $3000 | 16 | 53 | 4096 | 5x 960GB | 5-6 |
| Server | $4000 | 32 | 106 | 8192 | 3x 3.2TB | 8-10 |
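The DRAM and temp columns in the build table are essentially the per-process k=32 requirements multiplied by the number of parallel processes (one per core). A sketch of that math; note the table rounds temp space to power-of-two GB figures, so the results here land slightly above some rows:

```python
# Reproduce the DRAM (GiB) and Temp (GB) columns from the per-process
# k=32 requirements in the 1.04 table: 3389 MiB DRAM, ~256.6 GB temp.
def resources(processes, dram_per_proc_mib=3389, temp_per_proc_gb=256.6):
    dram_gib = processes * dram_per_proc_mib / 1024
    temp_gb = processes * temp_per_proc_gb
    return round(dram_gib, 1), round(temp_gb)

print(resources(4))  # value desktop row: 13.2 GiB DRAM
print(resources(8))  # 8-core desktop row: 26.5 GiB DRAM
```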
What about staggering?
Staggering, or delaying the start of each subsequent plotting process, can help maximize compute (CPU), memory (DRAM), and disk (temp space) utilization, because different phases of the plotting process require different amounts of each resource. On mid-range desktop systems with eight cores, the main advantage of staggering was squeezing eight processes into 32GB of DDR4 without processes swapping (slowdown) or getting killed (out of memory), which speeds up phase 1 completion times. The memory savings came mostly from the fact that phase 2 and phase 3 were the most memory-intensive parts of the plotting process; ensuring that not all processes were in those phases simultaneously meant better sharing of memory resources over time. A secondary benefit of staggering was not thrashing the destination disk with multiple simultaneous file copies from the second temporary directory (-2) to the final destination directory (-d). If a user only uses a single destination drive, this is a big deal: hard disk drive bandwidth is ~100-275MB/s depending on how full the disk is, and an in-progress copy will delay the start of the next process when using an -n value higher than 1. We will explore how the new 1.04 plotter changes the benefits of staggering in the weeks after the 1.04 release.
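The idea above can be sketched as a simple start-time schedule. This is a minimal illustration, not a tested recommendation; the 30-minute delay is an assumed placeholder value, and the right delay for a given system has to be found by benchmarking:

```python
# Sketch: compute staggered start offsets for N parallel plotting processes
# so their memory-heavy phases (2 and 3) are less likely to overlap.
# The 30-minute default is an illustrative assumption.
def stagger_schedule(processes, delay_minutes=30):
    """Start offset in minutes for each plotting process."""
    return [i * delay_minutes for i in range(processes)]

# Each process would then be launched at its offset (e.g. via the
# `chia plots create` command with your own -t/-2/-d paths).
print(stagger_schedule(8))  # [0, 30, 60, 90, 120, 150, 180, 210]
```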
What about hyperthreading?
Another area to explore in 1.04 is running more processes than physical CPU cores; most CPUs expose 2x as many threads as physical cores via a feature called hyperthreading. There is now likely a memory imbalance, because DRAM comes in powers of two: 16GB, 32GB, 64GB, and so on, and using an uneven number of DIMMs decreases memory bandwidth. In an eight-core system with eight processes, only 26.5GiB of memory is required, leaving the system underutilized; the operating system uses the free DRAM for caching, and OS overhead is higher if the user is running a GUI or desktop version. Additional testing with ten processes is needed to see whether a specific system can deliver higher total output at a fixed cost, whether there is any real benefit from oversubscribing plotting processes to physical CPU cores, and how much extra DRAM and temp space increase a system's output (versus building a second system). Ten processes also need slightly more than 32GB, so testing with staggering is required to see if it is even possible.
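The 8-versus-10-process memory math above comes straight from the per-process figure in the 1.04 table:

```python
# Sketch: DRAM needed for 8 vs 10 parallel k=32 plots on a 32 GiB system,
# using the 3389 MiB per-process figure from the 1.04 requirements table.
DRAM_PER_PLOT_MIB = 3389

def dram_needed_gib(processes):
    return processes * DRAM_PER_PLOT_MIB / 1024

print(round(dram_needed_gib(8), 1))   # 26.5 GiB -> fits in 32 GiB with headroom
print(round(dram_needed_gib(10), 1))  # 33.1 GiB -> slightly over 32 GiB
```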
Bitfield vs. no bitfield?
More testing is required, but the improvements in bitfield may make the default plotter faster in the majority of scenarios. The lower amount of data written and improved sorting speeds will help in many parallel processes. The community will discover some exciting data in the coming weeks, but for now, most of the testing done on 1.04 beta was with bitfield-enabled (default) plotter settings.
SSD endurance impact
The code improvements in the 1.04 plotter reduce the amount of data written per K=32 plot from 1.6TiB (bitfield) or 1.8TiB (-e, no bitfield) to an estimated 1.4TiB (measurements coming soon!).
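A rough sketch of what that means for SSD wear, using the estimated 1.4TiB writes per plot from above. The drive specs (capacity, DWPD rating, warranty period) are illustrative assumptions, not recommendations; check your own drive's rated endurance:

```python
# Estimate how many k=32 plots a temp SSD can absorb before its rated
# endurance (TBW = capacity x DWPD x warranty days) is exhausted.
# 1.4 TiB writes/plot is the 1.04 estimate quoted above.
WRITES_PER_PLOT_TB = 1.4 * 1024**4 / 1000**4  # ≈ 1.54 TB written per plot

def plots_until_worn(capacity_tb, dwpd, warranty_years=5):
    tbw = capacity_tb * dwpd * 365 * warranty_years  # rated total writes, TB
    return int(tbw / WRITES_PER_PLOT_TB)

print(plots_until_worn(1.6, 3))    # hypothetical 1.6TB mixed-use drive (3 DWPD)
print(plots_until_worn(1.0, 0.3))  # hypothetical 1TB consumer drive (0.3 DWPD)
```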
Impact to temporary SSDs required
SSDs come in many shapes and sizes, and the optimal SSD for plotting is a data center SSD. You will have to do the math yourself on a different number of drives for each system based on the hardware you can obtain. More SSDs of a smaller capacity in a RAID 0 generally outperform the larger models due to the smaller SSDs having higher IOPS/TB & bandwidth/TB than the larger models. There is a tradeoff of physical connection and price per SSD as well to consider. When calculating temporary storage space, you can use the label capacity (below) with the GB column in the temp space tables (above) or convert to GiB from what the operating system shows.
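As a sketch of that sizing math, dividing label capacity by the ~256.6GB per-plot temp requirement from the 1.04 table gives the number of parallel k=32 plots a drive can host:

```python
# Sketch: how many parallel k=32 plots fit on a temp SSD, comparing the
# drive's label capacity (GB) against the ~256.6 GB per-plot temp space
# from the 1.04 requirements table above.
TEMP_PER_PLOT_GB = 256.6

def max_parallel_plots(label_capacity_gb):
    return int(label_capacity_gb // TEMP_PER_PLOT_GB)

for cap in (960, 1920, 3840):  # common data center capacity points
    print(f"{cap}GB SSD -> {max_parallel_plots(cap)} parallel plots")
```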
| SSD Type | Capacities |
| --- | --- |
| Consumer | 500GB, 1TB, 2TB, 4TB |
| Data center NVMe (hyperscale) | 960GB, 1920GB, 3840GB |
| Enterprise SATA (recent) | 480GB, 960GB, 1.92TB, 3.84TB, 7.68TB |
| Old enterprise (2014-2016) | 200GB, 400GB, 800GB, 1600GB |
| Old Intel (2014-2017) | 1TB, 2TB, 4TB, 8TB |
| Enterprise NVMe read-intensive (1 DWPD) | 960GB, 1.92TB, 3.84TB, 7.68TB |
| Enterprise NVMe mixed use (3 DWPD) | 800GB, 1.6TB, 3.2TB, 6.4TB, 12.8TB |
There is no industry standard for SSD capacities. Different vendors ship different-sized SSDs because NAND dies from different vendors have different physical sizes, and SSD controllers have different numbers of channels. A nominal 32GB die (per JEDEC naming) is neither exactly 32GB nor 32GiB but somewhat more, because it contains a certain number of planes, erase blocks per plane, and NAND pages per erase block, plus many redundant blocks. Hyperscale data center customers and the OEMs (original equipment manufacturers) that consume SSDs have largely driven the consolidation to the capacity points in the table above, despite the lack of an industry standard.
Summary
1.04 is an inspiring release, especially for those who remember the plotter from the Chia alpha. The team has come a long way in making the plotting process require fewer resources and become much more accessible to hardware of all shapes and sizes. I believe a new value plotting system will emerge in the next few weeks as a clear winner in TB/day/$. In the meantime, watch the Chia Reference Hardware wiki for updated plotting benchmarks!
What happened under the hood – 1.04 Chia Proof of Space code changes
For those brave enough to want to understand what the plotting code is actually doing, here are the 1.04 changes that had the biggest impact on plotting performance, along with comments from the Chia developers. I had the pleasure of chatting directly with Rostislav (the Chia developer behind the recent changes, @cryptoslava on Keybase) about the new 1.04 plotting improvements. The explanations below are in his own words, for those who want to dive deep into the plotting code.
Some of you may remember the time last year when plotting a k=32 required about 600 GiB of temp space. Around September, Mariano implemented a much better sorting algorithm that reduced the total IO and improved performance. Around the same time, there was a change in phase 1 where we started to drop entry metadata (f and C values) immediately after writing the next table instead of postponing it to phase 2. This reduced temp space to about 332 GiB. Another change made back in September was that some data previously rewritten in place is now written to a new location. However, we continued to use the maximum entry size (which was necessary for the in-place updates), even though it was no longer needed. In the current set of changes, I went through all the cases where an entry size was excessive and reduced it to the necessary minimum. This reduces both RAM usage and temp space, but that's not the whole story yet!
Rostislav
- Reduce size of entries written in the second pass of phase 3
- Reduce size of entries written in phases 1 and 2
Another significant change was a reduction in sizes of sort_key and new_pos fields in phases 2 and 3 from k + 1 to k bits. The story is related to the old backpropagation (phase 2) algorithm, which was not using bitfield. In that case, it is possible to have more than 2^k entries in a table, so it was necessary to have an extra bit available for indexing them. Now it’s not required anymore so that extra bit is removed. Due to entry sizes being rounded up to whole bytes, this reduction is significant for k=32 plots, as, e.g., entries sorted in phase 3 can be reduced from 9 to 8 bytes! These changes have also made memory usage in phase 3 more efficient as now we use more buckets for sorting than before. Earlier, this extra bit was always 0, and since this was the most significant bit, which determines the bucket in sorting, we were only using half of the available buckets, and each bucket was 2x bigger, leading to a higher minimal memory requirement.
Fun fact: phase 3 used to require the most memory for sorting (and thus determined the minimum memory requirement); now it's phase 1.
Rostislav
- Reduce size of sort_key from k + 1 to k bits in phases 2 and 3
- Reduce size of new_pos from k + 1 to k bits in phase 3
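The 9-to-8-byte example above follows from byte rounding. A simplified model (the real entry layout varies by phase; this just illustrates the sort_key + new_pos case Rostislav describes):

```python
# Sketch: why dropping the extra bit per field shrinks phase 3 entries
# from 9 to 8 bytes at k=32. An entry holds a sort_key and a new_pos,
# and entry sizes are rounded up to whole bytes.
import math

def entry_bytes(k, extra_bit):
    bits = 2 * (k + (1 if extra_bit else 0))  # sort_key + new_pos
    return math.ceil(bits / 8)

print(entry_bytes(32, extra_bit=True))   # old: 2 x 33 = 66 bits -> 9 bytes
print(entry_bytes(32, extra_bit=False))  # new: 2 x 32 = 64 bits -> 8 bytes
```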
Finally, another change, already included in the 1.0.3 release but especially relevant in combination with the recent changes: there were a few cases where we were only using half of the available memory buffer instead of all of it.
Rostislav
Featured image: Photo by Florian Krumm on Unsplash
Why do you not estimate any increase in TB / day plotted with the Budget build?
You could, but then you hit a DRAM bottleneck very fast and have to go to 64GB (no longer a budget build). The best approach is to keep 8 processes and 32GB of DRAM, and reduce SSD temp space to 2x 1TB or 1x 2TB NVMe!
Very important thing you haven’t mentioned here – how much internet traffic does it use?