Plotman – Automate and Manage Chia Plotting

Guest post by Eric Altendorf, author of plotman

What is Plotman?

Plotman is a tool for managing Chia plotting operations.  It allows you to configure parameters for scheduling plot jobs, typically in a staggered, never-ending sequence.  It will load-balance plot jobs over multiple temp (-t) and destination (-d) drives.  It offers facilities for transferring completed plots from a plotting machine to a farming machine (an operation called “archiving”).  It also offers some tools for observing plot job progress and for manipulating plotting job processes (e.g. suspend / resume / kill).

Plotman is a power user’s tool that may make certain kinds of plotting automation easier.  It is not an automatic tool to make plotting easier for beginners.  It is strongly recommended that before using Plotman, you first get experience running plot jobs manually to understand how plotting works and how your hardware responds.

Plotman is not part of the core Chia code base or officially associated with the Chia Network.  It is very much alpha software and a work in progress: some aspects of how it works are mere historical accidents, and what is written here now may not apply in three months.  It is developed entirely by volunteers in their spare time (they all have day jobs).  It is developed on Linux but appears to work well on macOS, and some folks have run it under WSL on Windows.

Plotman: Motivation & Mode of Operation

Plotman was designed based on a few assumptions.  These assumptions may or may not be optimal, but they drove the initial design.

  • Different stages of plotting use different mixes of resources (disk IO, space, memory, CPU), but the amount of each resource on a machine is fixed.  Regular, even staggering of jobs should smooth out the resource demands to be roughly constant over time.  This should allow the resources to be fully utilized.
  • Plotting should be scheduled based on when the hardware is ready to accept another plot job.  Although this might simply be a fixed time delay, in many cases it’s desirable to condition it on other criteria, such as the total number of jobs running, the number of jobs running on a particular tmp dir, or how long it’s been since the last job was started, globally, or on that tmp dir.
  • Scaled plotting operations will typically utilize one or more plotting machines, separate from the one or more farmers/harvesters.  Decoupling the plotting operation from the storage of plots on farming machine(s) provides flexibility and robustness.  Therefore, it is helpful to plot to a local drive and then use an asynchronous and customizable process to transfer or “archive” these plots to farming locations.
  • For robustness, one should minimize dependency on maintaining state or deep coupling to core Chia code.  Plotman should be stateless, able to operate given only the knowledge the operating system has of the currently running plotting jobs.  Plotman inspects OS process tables and open file tables in order to find running plot jobs, and to find their open files.  Plotman-initiated plot jobs redirect output to logfiles, which allows Plotman to also inspect the logfiles to determine job progress.

The primary use of Plotman is plotman interactive, which performs the following functions:

  • Monitor system status, current plotting jobs and their progress, and spawn new plot jobs according to parameters configured
  • Monitor completed plots on the local plotter machine, and archive them to remote farmers (optional)
  • Display system state — status of current plot jobs, and temp, destination, and archive directories

You can also use plotman plot to run only plot job spawning, or plotman archive to run only archiving.  There are also additional command line tools for inspecting and manipulating running plot jobs.

For complete and up-to-date information on plotman command capabilities, use the built-in command line help option, plotman --help.

Installing Plotman

Installation for Linux:

  1. Plotman assumes that a functioning Chia installation is present on the system. Activate your chia environment by typing source /path/to/your/chia/install/activate.
  2. Then, install Plotman using the following command:
    > pip install --force-reinstall git+https://github.com/ericaltendorf/plotman@main
  3. Plotman will look for plotman.yaml within your computer at an OS-based default location. To create a default plotman.yaml and display its location, run the following command:
    > plotman config generate
    The default configuration file used as a starting point is available in the Plotman repository at https://github.com/ericaltendorf/plotman .
  4. That’s it!  Run plotman version to verify the installation and see which version you have.  Run plotman --help to learn about the available commands.

Configuring & Running Plotman Interactive

Dirs & Drives

Plotman assumes you configure one or more “tmp” dirs (the fast directories used for -t) and one or more “dst” dirs (where the Chia plotter will emit finished plots, i.e., the -d dir).  Note that if you use the archiving functionality, the “destination” dirs are not final destinations, but rather just a buffer where the plots sit until the archiving job can move them to the farmer.

Plotman schedules jobs depending both on global system status (e.g., how many jobs are already running, when the last-started job was started) as well as the status of the tmp drive being considered (e.g., how many jobs are already running on that tmp dir, how much progress they’ve made).

The normal use case is to set your tmp dirs as the mount points of the block devices (i.e., physical drives, or RAID devices) you’re using.  Most Plotman documentation assumes this use case.  However, this is not required; in some advanced situations there may be reason to treat multiple directories on one block device as separate logical tmp dirs.

In many situations, having a single dst drive (and dir) is sufficient.  If you have multiple drives, not only will you have a larger buffer for plotting, but Plotman will distribute plotting and archiving jobs in an attempt to avoid concurrent IO (an alternative to RAID’ing the drives).  If you have a collection of old 1TB or 2TB HDDs, that makes for a good set of dst dirs.

Scheduling

Plotman can be configured with a number of conditions for starting new plot jobs.  A job is started when all conditions are met.

Globally, you can configure the maximum number of plot jobs to be running at once, as well as a stagger parameter, which limits how quickly new jobs can be started.  Total max jobs is a useful way to limit the total memory used for Chia plotting.  Global staggering should be set to avoid jobs bunching up and starting all at once.  A good way to set an initial value is to decide how many jobs you wish to run in parallel, estimate how long each will take (when running in parallel, which is likely longer than when one runs by itself), and divide the job duration by the number of jobs to get the interval between job starts.  For example, if you expect to run 12 jobs in parallel, and you expect them to take 8 hours each, then a global stagger of 40 minutes (8 hours / 12 jobs), or a bit less, would be reasonable.
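
The arithmetic above is simple enough to sketch in a few lines of Python.  This is only an illustration of the rule of thumb, not Plotman code; the function name global_stagger_minutes is made up for this example.

```python
def global_stagger_minutes(parallel_jobs: int, hours_per_job: float) -> float:
    """Rule-of-thumb interval (in minutes) between job starts.

    Hypothetical helper for illustration only: it divides the expected
    duration of one job by the number of jobs you want running at once.
    """
    if parallel_jobs <= 0:
        raise ValueError("parallel_jobs must be positive")
    return hours_per_job * 60 / parallel_jobs


# 12 parallel jobs at 8 hours each, as in the example in the text
print(global_stagger_minutes(12, 8))  # 40.0
```

You would then round this down a little when choosing the actual stagger setting, as suggested above.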

Per tmp dir, there is also a max job limit.  This should be set based on your drive size and IO throughput.  E.g., a 1TB tmp drive can easily fit 3 plot jobs, and probably 4 if staggered.  However, depending on your drive speed, 3 may be too many; for example, you might wish to run a SATA SSD with a max of two jobs.

Staggering on tmp dirs is done by job progress rather than time.  This is intended to be more robust to variability in system performance than a fixed clock.  Progress is measured by Chia plotting phase (1, 2, 3, and 4) as well as “subphases”.  Subphases are Plotman nomenclature for Chia plotter progress, defined per phase as follows:

  • In phases 1-3, subphase 0 is a (typically brief) initialization
  • In phase 1, subphases 1-7 correspond to computing tables 1 through 7
  • In phase 2, subphases 1-6 correspond to backpropagating on tables 7 down to 2
  • In phase 3, subphases 1-6 correspond to compression of tables {1,2} through {6,7}
  • In phase 4, the entire operation is considered “subphase 0”

In Plotman, progress is described by phase:subphase indicators (also sometimes described as phase major:minor).  For example, a job might be shown to be in phase 3:4, which would correspond to phase 3, compression of tables 4 and 5.
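
To make the phase:subphase notation concrete, here is a small Python sketch that turns an indicator such as 3:4 into a description, following the list above.  The function names (parse_phase, describe_phase) are hypothetical and are not part of Plotman.

```python
from typing import Tuple


def parse_phase(indicator: str) -> Tuple[int, int]:
    """Split a 'major:minor' indicator such as '3:4' into (3, 4)."""
    major, minor = indicator.split(":")
    return int(major), int(minor)


def describe_phase(indicator: str) -> str:
    """Describe a phase:subphase indicator using the definitions above."""
    major, minor = parse_phase(indicator)
    if major == 4:
        return "phase 4"  # the whole of phase 4 is treated as subphase 0
    if major not in (1, 2, 3):
        raise ValueError(f"unknown phase: {major}")
    if minor == 0:
        return f"phase {major}, initialization"
    last = 7 if major == 1 else 6
    if not 1 <= minor <= last:
        raise ValueError(f"unknown subphase {minor} for phase {major}")
    if major == 1:
        return f"phase 1, computing table {minor}"
    if major == 2:
        # subphases 1-6 walk backwards over tables 7 down to 2
        return f"phase 2, backpropagating on table {8 - minor}"
    return f"phase 3, compression of tables {minor} and {minor + 1}"


print(describe_phase("3:4"))  # phase 3, compression of tables 4 and 5
```

Because the indicators are plain (major, minor) pairs, they also compare naturally as tuples: (2, 1) sorts before (3, 4), which is handy when deciding whether a job has passed a given point.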

Staggering within tmp dirs is done by job progression as measured by phase.  The basic idea is that the next job should not be started on the tmp dir until the previous job has reached a certain point in its progression.
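
As a rough sketch of that idea (not Plotman’s actual implementation), the check for a single tmp dir might look like the Python below.  The function tmpdir_ready and its parameters are hypothetical; the semantics assumed here are that a tmp dir accepts a new job only if it is under its max job count and fewer than a configured number of its jobs are still earlier than the stagger phase.

```python
from typing import List, Tuple

Phase = Tuple[int, int]  # (major, minor), e.g. (2, 1) for phase 2:1


def tmpdir_ready(job_phases: List[Phase],
                 stagger_phase: Phase,
                 stagger_limit: int,
                 max_jobs: int) -> bool:
    """Decide whether one tmp dir may accept another plot job.

    Assumed semantics, for illustration only:
      * never exceed max_jobs on the tmp dir;
      * allow at most stagger_limit - 1 existing jobs that have not yet
        reached stagger_phase.
    """
    if len(job_phases) >= max_jobs:
        return False
    early_jobs = sum(1 for phase in job_phases if phase < stagger_phase)
    return early_jobs < stagger_limit


# One job still in phase 1:5, stagger point 2:1 -> wait
print(tmpdir_ready([(1, 5)], (2, 1), stagger_limit=1, max_jobs=2))  # False
# That job has now reached 2:3 -> ready for another job
print(tmpdir_ready([(2, 3)], (2, 1), stagger_limit=1, max_jobs=2))  # True
```

Keep in mind that this per-tmp-dir check is only one of the conditions; the global limits (total max jobs and the global stagger time) must also be satisfied before a job starts.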

It is reasonable to ask whether within-tmp-dir staggering could be done by time, and whether global staggering could be done by phase progress.  These are things the Plotman devs are considering.

Archiving

Many users choose to ignore the archiving operation and plot directly to their farming drives.  This is fine; to do this simply comment out the lines in the config for the archiving section.

Currently, configuring archiving correctly is tedious and error-prone.  We are working on improving that.  There is a guide on the Plotman wiki describing how to configure archiving here: https://github.com/ericaltendorf/plotman/wiki/Archiving .

Running

The first time you run Plotman, you will need to create a config file.  You can do this with plotman config generate.  You can then edit the config in the location described.

After configuring, you’re ready to start plotting.  Run plotman interactive, and you should see an overview screen.  Assuming that no plotting jobs are already running, plotman should detect the machine is ready to plot and kick off a job.  As long as you leave this running, plotman will continue to kick off new plot jobs when the machine is ready (according to your configuration).  These plot jobs, after being initiated, are independent of plotman and should run to completion.  If you want to pause or stop the creation of new plots, you can hit the ‘p’ key or simply quit plotman (‘q’ or ^C).

The `plotman interactive` screen

When running plotman interactive, the screen shows the following information:

The first line shows the status. The plotting status shows whether we just started a plot, or, if not, why not (e.g., stagger time, tmp directories being ready, etc.). Archival status says whether we are currently archiving (and provides the rsync pid) or whether there are no plots available in the dst drives to archive.

The second line shows a snapshot graphical view of current plot job progress.  Each job is rendered on a progress bar with milestones for phases 1, 2, 3 and 4.  Phase/subphases with a single job in them show a ‘.’ character, those with two ‘:’, those with three ‘;’, and those with four or more ‘!’.  This is an easy way to quickly see the current state of plotting: how many jobs are running, at what progress, and whether they’re distributed evenly or are bunched up.
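
The rendering rule can be sketched in Python as follows.  This is an approximation for illustration: the exact layout (one character per subphase, with 8, 7, 7 and 1 slots for phases 1 through 4) and the names count_char and render_progress are assumptions, not taken from Plotman’s source.

```python
from collections import Counter
from typing import List, Tuple

# Assumed number of subphase slots per phase: 0-7, 0-6, 0-6, and 0
SUBPHASE_SLOTS = {1: 8, 2: 7, 3: 7, 4: 1}


def count_char(n: int) -> str:
    """Map the number of jobs in a slot to the character described above."""
    if n <= 0:
        return " "
    return {1: ".", 2: ":", 3: ";"}.get(n, "!")


def render_progress(job_phases: List[Tuple[int, int]]) -> str:
    """Render one line: a phase milestone digit followed by its slots."""
    counts = Counter(job_phases)
    parts = []
    for major in sorted(SUBPHASE_SLOTS):
        parts.append(str(major))  # milestone marker for the phase
        for minor in range(SUBPHASE_SLOTS[major]):
            parts.append(count_char(counts[(major, minor)]))
    return "".join(parts)


# Two jobs in 1:2 and one job in 3:4
print(repr(render_progress([(1, 2), (1, 2), (3, 4)])))
```

With this layout, two jobs bunched in the same subphase show up as a single ‘:’ rather than two separate dots, which is what makes bunching easy to spot.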

The third line provides a key to some directory abbreviations used throughout. For tmp and dst directories, we assume they have a common prefix, which is computed and indicated here, after which they can be referred to (in context) by their unique suffix. For example, if we have tmp dirs /mnt/tmp/00, /mnt/tmp/01, /mnt/tmp/02, etc., we show /mnt/tmp as the prefix here and can then talk about tmp dirs 00 or 01, etc. The archive directories are the same except that these are paths on a remote host and accessed via an rsyncd module.
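
The prefix/suffix abbreviation can be illustrated with Python’s standard library.  This is a minimal sketch of the idea rather than Plotman’s own code, and the helper name abbreviate_dirs is hypothetical.

```python
import posixpath
from typing import List, Tuple


def abbreviate_dirs(dirs: List[str]) -> Tuple[str, List[str]]:
    """Return the common path prefix and each dir's suffix relative to it."""
    prefix = posixpath.commonpath(dirs)
    suffixes = [posixpath.relpath(d, prefix) for d in dirs]
    return prefix, suffixes


prefix, suffixes = abbreviate_dirs(["/mnt/tmp/00", "/mnt/tmp/01", "/mnt/tmp/02"])
print(prefix)    # /mnt/tmp
print(suffixes)  # ['00', '01', '02']
```

Note that commonpath works on whole path components, so /mnt/tmp1 and /mnt/tmp2 share the prefix /mnt rather than /mnt/tmp.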

The next table shows information about the active plotting jobs. If you have many jobs, it may be abbreviated to show the most and least recently started jobs (the full list is available via the plotman status command). It shows various information about the plot jobs, including the (first 8 characters of the) plot ID, the directories used, walltime, the current plot phase and subphase, space used on the tmp drive, pid, etc.

The next tables show the usage of the tmp and dst dirs.  The tmp tables show the phases of the plotting jobs using them, and whether or not they’re ready to take a new plot job. The dst table shows how many plots have accumulated, how much free space is left, and the phases of jobs that are destined to write to them, and finally, the priority computed for the archive job to move the plots away.

The last table simply shows free space of drives on the remote harvester/farmer configured as the archive destination.  This information is obtained via df over ssh, so for this to work you need passwordless ssh configured to the remote harvester/farmer.

Finally, the last section shows a log of actions performed — namely, plot and archive jobs initiated. This is the one part of the interactive tool which is stateful. There is no permanent record of these executed command lines, so if you start a new interactive plotman session, this log is empty.

Using Plotman Command Line

Plotman offers a few command line tools:

  • status – show a list of currently active plotting jobs
  • details – show details of a currently active plotting job, such as the arguments and the log file location
  • suspend – suspend a plotting job
  • resume – resume a suspended plotting job
  • kill – kill a plotting job and clean up (delete) its temp files

Commands that manipulate jobs take as an argument a prefix to the plot ID — the hex string that uniquely identifies the plot being created.  The 8-character prefix to the plot ID is displayed in Plotman tools, but when issuing a command you can use any uniquely identifying prefix.
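
The “any uniquely identifying prefix” rule can be sketched as below.  This is an illustrative Python helper, not Plotman’s implementation; resolve_plot_id and the shortened plot IDs are made up for the example.

```python
from typing import Iterable


def resolve_plot_id(prefix: str, plot_ids: Iterable[str]) -> str:
    """Return the single plot ID starting with prefix, or raise an error."""
    matches = [plot_id for plot_id in plot_ids if plot_id.startswith(prefix)]
    if not matches:
        raise KeyError(f"no running job matches prefix {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous: {len(matches)} jobs match")
    return matches[0]


# Hypothetical (shortened) plot IDs of three running jobs
running = ["3f9a1c07d2", "3f9b88e410", "a07c55d9e1"]
print(resolve_plot_id("a0", running))   # a07c55d9e1
print(resolve_plot_id("3f9a", running)) # 3f9a1c07d2
```

Here the prefix 3f would be rejected as ambiguous, because two of the running jobs share it.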

The command line tools facilitate scripting operations which may be useful in unusual circumstances.  For example, if you have a temp drive /tmp/03 which is getting dangerously full, you might want to pause all jobs on it:

for id in $(plotman status | tail -n +2 | grep /tmp/03 | cut -c1-8); do plotman suspend "$id"; done

After which you could resume a job that is about to complete, or kill a job just getting started.

Plotman Analyze

Running plotman analyze on a set of log files will compute and show statistics about the time taken in each phase.  This analysis is fairly rudimentary, but it is a quick and convenient way to check on the performance of a certain set of jobs.
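
To give a feel for the kind of statistic involved, here is a rough Python sketch that averages per-phase times across several logs.  It is not how plotman analyze is implemented.  It assumes the plotter log contains lines of the form “Time for phase N = X seconds”, and the sample log text and function names are invented for the example.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable

# Assumed log line format; adjust the pattern if your logs differ.
PHASE_TIME_RE = re.compile(r"Time for phase (\d) = ([0-9.]+) seconds")


def phase_times(log_text: str) -> Dict[int, float]:
    """Extract {phase: seconds} from the text of one plot log."""
    return {int(p): float(s) for p, s in PHASE_TIME_RE.findall(log_text)}


def mean_phase_times(logs: Iterable[str]) -> Dict[int, float]:
    """Average the time spent in each phase over several plot logs."""
    by_phase = defaultdict(list)
    for text in logs:
        for phase, seconds in phase_times(text).items():
            by_phase[phase].append(seconds)
    return {phase: mean(values) for phase, values in sorted(by_phase.items())}


# Invented sample data standing in for two log files
log_a = "Time for phase 1 = 100.0 seconds. CPU (150%)\nTime for phase 2 = 50.0 seconds."
log_b = "Time for phase 1 = 200.0 seconds. CPU (140%)\nTime for phase 2 = 70.0 seconds."
print(mean_phase_times([log_a, log_b]))  # {1: 150.0, 2: 60.0}
```

In practice you would read the text from the log files in your configured log directory instead of using inline strings.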

Plotman on Mac and Windows

Plotman is primarily developed on and for Linux, but should work out of the box on a Mac, and several folks have had success running it under WSL (Windows Subsystem for Linux) on Windows.  If you run into issues, check the discussion forums; others have likely seen and solved them before.

Problems?

Plotman is developed entirely by volunteers who have day jobs, so we make no guarantees of support.  We do try our best to help, but we’re also trying to set up the mechanisms for people in the community to help each other.

We are still working out the best system, but currently the most active spot for discussing Plotman use (including trouble getting started) is the Keybase discussion channel #plotman on the chia_network Keybase team.  We are also experimenting with the GitHub “Discussions” section at https://github.com/ericaltendorf/plotman/discussions .

If you find issues with Plotman, please report them on GitHub at https://github.com/ericaltendorf/plotman/issues .  Please try to report bugs related specifically to Plotman behavior and responsibilities, as opposed to issues with the core Chia plotter.

Contributing to Plotman

We welcome contributions; Plotman wouldn’t exist if it weren’t for the work of volunteers.  One challenge is that every person’s plotting situation is unique, so there are often suggestions for specific features to support specific use cases.  To maintain simplicity, ease of use, and reliability, the maintainers often attempt to recast specific features into more general forms.

It will be helpful for contributors to:

  • Discuss significant changes with the devs first, so we can coordinate on other changes that may already be planned.  You can find the devs on the #dev channel on the chia_plotman team on Keybase.  (Please do not use that channel for reporting problems or asking for support, however.)
  • Separate multiple changes into separate PRs, so disagreements on one don’t hold up submission of others
  • Base changes against the development branch

Tipping Plotman Devs

If you appreciate Plotman, you are welcome to send tips our way.

This address is controlled by Eric Altendorf and tips sent here will be shared with established, core Plotman devs:

xch1a94rytww4alzgaue3demjn7msjegagjtwgk058ck2lgzukr0gr6qvenh54

I also suspect that Plotman could be useful for commercial scale plotting operations; if you are running a scaled operation and have custom requirements, you are welcome to come discuss with the dev team.

Plotman’s Future

Plotman started as a hacked-together set of scripts that barely worked to get one person’s (my) machines plotting.  We are still slowly working through issues, from the superficial to the fundamental, that limit Plotman’s robustness and ease of use, and in some cases, are simply historical accidents of early development which deserve reconsideration.

In the near term, we aim to simplify scheduling configuration, as we believe there are more options than are really useful.  We also aim to generalize archiving functionality to use any transport protocol you like (rsync, rclone, scp, mv, etc.).  There’s also a lot of basic internal cleanup to do: code hygiene, tests, and refactoring.  There is some “basic” functionality that one might expect that is not yet written; for example, Plotman has no clean shutdown when drives fill up (what, you’re not just buying more drives??).

Longer term, it would be nice to make configuration a bit more automatic, so Plotman can dial itself to maximize plotting throughput.  We’d like to have better monitoring and reporting of plotting operations.  There is also a possibility of a deeper integration with the official Chia tools.

I never thought Plotman would be so popular I’d be writing an article on it.  But here we are.  The past year has been crazy, and I cannot guess where or what Plotman will be a year from now.

Thanks for reading, and keep farming!

Note From JM – Special thanks to Eric for this post! Eric has been involved with Chia as long as I have, and he has spent countless hours helping community members in the Keybase. Here is a link to the plotman video I did on YouTube with the NUC!

14 thoughts on “Plotman – Automate and Manage Chia Plotting”

  1. When I try to install plotman using the command listed above, I get:
    ERROR: Invalid requirement: ‘–force-reinstall’
    And When I enter command “plotman config generate”, I get:
    ModuleNotFoundError: No module named ‘readline’

    1. There’s just a missing ‘-‘ for the pip command line option.
      Try “--force-reinstall” instead.

  2. Did you use cpufreq-set on this NUC? In the previous blog post you said to set 4.7GHz on all cores but this causes my NUC to crash, probably because of overheating. Do you or does anyone else here know the highest safe (non-crashing) GHz value to set?

  3. Will Plotman support pooling-plots? If so, how long after the chia-update will it take for plotman to support the new plots?

  4. I have an issue where plotman has every job doubled up. For instance, to actually get 7 jobs going I have to set plotman to 14… Doesn’t seem right. Any idea what could be going on? The doubled jobs have the same plot id as their actual counterpart.

    1. I just noticed there is a reply option. Please have a look at my other post for some technical considerations.

  5. I am using JBODs with 36 HDD bays under Windows 10 Enterprise to store plots on remote machines. Will archiving work with a Windows machine if rsync runs on the Plotman machine only? I really prefer Ubuntu for plotting but set up my JBODs using Windows, so any help would be appreciated.

  6. First of all, thank you for the awesome video, and thank you for sharing your knowledge and experience with us. I faced an issue with Plotman last night, and I don’t know whether it is common or whether something is wrong with my setup. When I don’t keep Plotman in interactive mode, it does not start another plot 20 seconds after a plot finishes or 60 minutes after the previously queued plot.

  7. @Barry
    Afaik, the first phase is laying out some starting tables on the buffer drive, which is usually an SSD if you go for high performance.
    So that means your CPU is doing calculations in RAM which then get written to your SSD.
    Usually your SSD is way slower than your RAM, so the whole system has to wait for reading and writing process on the SSD to finish, before it continues.
    While the data is flowing from RAM to SSD and vice versa, your thread is idle. And so are all the other threads which use the same resource (the same SSD).

    On top of that, your RAM is also slower than your CPU caches, so that also creates delays.

    If you want to have the theoretical case where a CPU is working at 100%, you would need an algorithm that uses so little data that you can do without RAM and only calculate in the CPU cache.
    This is not possible for Chia plotting.

  8. Any chance you or Eric could explain with examples the correlation between phase_major:phase_minor and the phase limit? I am having a hard time making sense of it. I left my plotman.yaml file stock as regards to those entries and from what it says in the hashed out info above, I shouldn’t see another plot start until the previous one reaches phase 2:1, which is supposed to take precedence over the global stagger minutes, but I saw one kick off every hour until my max of 4 was reached. So I’m not sure what needs to be done to get the default action. I’m currently using a quad core CPU (soon to be 8 core) and have 4 drives of varying sizes.

    Scheduling:
    tmpdir_stagger_phase_major: 2
    tmpdir_stagger_phase_minor: 1
    tmpdir_stagger_phase_limit: 1
    tmpdir_max_jobs: 2
    global_max_job: 2
    global_stagger_m: 60
    polling_time_s: 20

  9. I can’t make plotman start plotting. It just keeps spamming: …sleeping 20 s: (True, ‘Starting plot job: chia plots create -k 32 -r 2 -u 128 -b 4500 -t /mnt/ssd -d /mnt/farm -f 8def26e7477fcabdfac1611f9bde5bb2ecd1d43f2cbb36080d290bbf07b3a89c1f24812ce9c32ae2f54857656b969296 -p 82e55a122dbd557733b7de186ecf3aa0d2effadc24b7be7ff55394901dba210c27654995eabc934be780b186c774f976 ; logging to /mnt/log/2021-05-24T21_59_38.346401+00_00.log’)

    I already changed the policies… I don’t know what’s wrong. Can someone help me?
