zzz.i2p

Development discussions
 
Sun, 13 Jun 2021, 07:24pm #1
jogger
I2P Legend

My monitoring for backlogged traffic as discussed earlier just struck again after months of silence. The problem is best reproduced after longer uptime on a low-end CPU with multiple I2CP connections and lots of sessions.

A ratchet session begins after creation and expires when it times out. While this sounds intuitive, the problem is the long time in between, spent at 100% CPU mainly in encryptNewSession(). During this time the thread cannot feed other threads, so the average load from concurrent threads goes down a bit.

Factor in context switching and it is likely that encryptNewSession() makes faster progress towards the end. Add a concurrent encryptNewSession() starting some time later and this effect amplifies. Result: both calls complete closer together in time than they started, so the timeouts of the new sessions will be closer together than those of the previous ones.

With multiple I2CP connections this escalates over time to the point where the CPU is clogged up, the key factory runs dry, and encryptNewSession() runs a lot longer because fresh keys additionally have to be generated on the spot.

The key factory is too slow anyway, and it is designed to slow down progressively under higher load. I took out all the slowdown and it still runs dry on my systems; it just took 11 seconds at 100% CPU to fill up after 2 hours of uptime.
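
To make the "key factory" idea concrete, here is a minimal sketch of a precalculated key pair pool, assuming a background filler thread and a bounded queue of X25519 key pairs; the class and method names are made up for illustration and are not the router's actual code. The point to notice is the inline fallback when the pool is dry, which is the extra work that lands inside encryptNewSession().

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical precalc pool, for illustration only.
public class PrecalcKeyPool implements Runnable {
    private final LinkedBlockingQueue<KeyPair> pool;
    private final long delayMs;   // pause between precalculations; 0 = fill as fast as possible

    public PrecalcKeyPool(int min, long delayMs) {
        this.pool = new LinkedBlockingQueue<>(min);
        this.delayMs = delayMs;
    }

    // Background filler: top up the pool, optionally throttled by delayMs.
    public void run() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("X25519");
            while (true) {
                if (!pool.offer(kpg.generateKeyPair()))
                    Thread.sleep(100);       // pool is full, check again later
                else if (delayMs > 0)
                    Thread.sleep(delayMs);   // this is where a "progressive slowdown" would live
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Take a precomputed pair if available, otherwise pay the full generation cost inline.
    public KeyPair getKeyPair() throws Exception {
        KeyPair kp = pool.poll();
        return (kp != null) ? kp : KeyPairGenerator.getInstance("X25519").generateKeyPair();
    }
}

A filler running with delay 0 and a deeper minimum keeps a reserve for bursts instead of throttling itself exactly when demand spikes, which is the effect the settings below aim for.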

Even on a high-end CPU the theoretical limit of 40 key pairs per second is too low for medium traffic. Anybody can mitigate this with the following advanced settings:
crypto.edh.precalc.delay=0
crypto.edh.precalc.min=25
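
For context on where the 40/s figure comes from: if that theoretical limit is set by a fixed delay between precalculations, it works out to 1000 ms / 40 = 25 ms per key pair regardless of CPU speed. Setting the delay to 0 removes that cap and a minimum of 25 keeps a deeper reserve; like other advanced settings, these can be added to router.config or via the console's advanced configuration page.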

Or try the upcoming I2Speed version with a bunch of crypto speedups, including Curve25519, which is the CPU eater in this post.

Sun, 13 Jun 2021, 11:20pm #2
zzz
Administrator
Zzz

good point, may need to tweak the precalc settings, as the network rekeys

Mon, 14 Jun 2021, 12:16pm #3
zzz
Administrator
Zzz

I did a quick check of the stats on two routers and don't see much cause for concern, well under 1% empty for both XDH and EDH. I guess that's not really the best metric though; it's bad on slow boxes like ARM if it ever runs out, as you state. Perhaps we need to bump up the size on slow boxes.
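
In case "bump up the size on slow boxes" gets picked up, here is a rough sketch of how a default could be scaled at startup; the heuristic and thresholds are illustrative only, not the router's actual logic.

// Hypothetical sizing heuristic, not actual I2P code: use a deeper precalc pool
// minimum on boxes that look slow (ARM, few cores), where a refill hurts most.
public class PrecalcSizing {
    static int defaultPoolMin() {
        String arch = System.getProperty("os.arch", "").toLowerCase();
        int cores = Runtime.getRuntime().availableProcessors();
        boolean slowBox = arch.contains("arm") || arch.contains("aarch64") || cores <= 2;
        return slowBox ? 25 : 10;   // 25 matches the workaround suggested above
    }

    public static void main(String[] args) {
        System.out.println("precalc.min = " + defaultPoolMin());
    }
}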