To keep me off the streets, and hopefully help others understand my ramblings, I did more work on this.
Python 3.6.9 running on Red Hat 6
Netmiko “show running-config”
Either 100 threads or 100 processes
1369 devices (mix of IOS, IOS-XE, ASA, Nexus, ASR)
The code retries a host on certain known exceptions
Four scripts (gevent, multiprocessing, concurrent.futures, Nornir)
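A minimal sketch of the thread-pool-plus-retry pattern described above, not the actual scripts from this post. `fetch_config` is a hypothetical stand-in for the Netmiko session (it only sleeps and returns fake text), `TransientError` is a hypothetical placeholder for the "known exceptions", and the `FLAKY` hosts are made up so that the retry path actually runs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Hypothetical placeholder for the 'known' exceptions that are worth a retry."""


FLAKY = {"asa1", "asa2"}  # hypothetical hosts that fail on their first attempt
_seen = set()
_seen_lock = threading.Lock()


def fetch_config(host):
    """Hypothetical stand-in for a Netmiko session sending 'show running-config'."""
    time.sleep(0.01)  # pretend to wait on the network
    with _seen_lock:
        first_try = host not in _seen
        _seen.add(host)
    if host in FLAKY and first_try:
        raise TransientError(f"{host}: simulated failure")
    return f"hostname {host}\n!"


def fetch_with_retry(host, attempts=3, delay=0.05):
    """Return (host, config, retries_used), retrying only on TransientError."""
    for attempt in range(attempts):
        try:
            return host, fetch_config(host), attempt
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


def run_all(hosts, workers=100):
    """Fetch every host on a pool of `workers` threads; return configs and total retries."""
    configs, retries = {}, 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for host, config, used in pool.map(fetch_with_retry, hosts):
            configs[host] = config
            retries += used
    return configs, retries
```

For example, `run_all([f"sw{i}" for i in range(50)] + ["asa1", "asa2"])` fetches all 52 fake hosts with 100 worker threads. In a real script, `fetch_config` would open the Netmiko connection and send the command, and `TransientError` would be replaced by whichever specific exceptions have proven recoverable; any other exception propagates out of `pool.map` instead of being retried.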
multiprocessing Pool, fast_cli login - 1m 12s
multiprocessing Pool - 2m 13s
gevent Pool, gevent monkey patch, fast_cli login - 2m 40s
gevent Pool, gevent monkey patch - 3m 16s
concurrent.futures ThreadPoolExecutor, gevent monkey patch, fast_cli login - 2m 43s
concurrent.futures ThreadPoolExecutor, gevent monkey patch - 3m 23s
Nornir, gevent monkey patch - 3m 19s
All good up to here: no exceptions, full configs, fast, and no need to close the connection within the task or to reduce the number of threads. Now let’s remove the gevent monkey patch and watch things slow down.
- Connections closed within the task
concurrent.futures ThreadPoolExecutor - 6m 45s - 6 exception retries (from ASAs)
Nornir - 6m 50s - 9 exception retries (from ASAs)
- Connections not closed within the task, so they can be reused in later tasks
concurrent.futures ThreadPoolExecutor - 26m - 176 exception retries
Nornir - 26m - 165 exception retries
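The two sets of results above differ in whether each task closes its connection before returning. A minimal sketch of both connection lifetimes follows, assuming a hypothetical `FakeConnection` class in place of a real Netmiko connection (which does provide `send_command()` and `disconnect()`). It only counts open sessions and makes no attempt to reproduce the slowdown.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a Netmiko connection; tracks how many are open."""

    open_count = 0
    _lock = threading.Lock()

    def __init__(self, host):
        self.host = host
        with FakeConnection._lock:
            FakeConnection.open_count += 1

    def send_command(self, command):
        return f"{self.host}: output of {command!r}"

    def disconnect(self):
        with FakeConnection._lock:
            FakeConnection.open_count -= 1


def get_config_and_close(host):
    """'Closed within the task': open, fetch, and always disconnect before returning."""
    conn = FakeConnection(host)
    try:
        return conn.send_command("show running-config")
    finally:
        conn.disconnect()  # runs even if send_command raises


def get_config_keep_open(host, sessions):
    """'Not closed within the task': reuse or create a session and leave it open.

    Each host is handled by exactly one task here, so no two threads write the same key.
    """
    conn = sessions.get(host)
    if conn is None:
        conn = FakeConnection(host)
        sessions[host] = conn
    return conn.send_command("show running-config")


def close_all(sessions):
    """Disconnect every session kept open by get_config_keep_open."""
    for conn in sessions.values():
        conn.disconnect()
    sessions.clear()
```

With the close-within-task variant, the number of open sessions never exceeds the number of workers; the keep-open variant holds one session per device until `close_all()` runs.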
Just for the fun of it, how fast could I pull the config from all 1369 devices? 39s with 300 processes - multiprocessing Pool, fast_cli login.
I am hoping someone smarter than me can work out the best way to deal with the thread I/O blocking that appears to happen with SSH. Until then, I will just use the monkey patch in any Nornir scripts.
(Edited to fix some times)