Neuron unstable when accessing /sys



  • @knebb

    I'm running a larger test script on 2 devices with 4 threads in parallel, so this may take a little while. Unfortunately, it's difficult to know what is going wrong without the device kernel log. I mentioned the electrical side because we did have customers who had issues with it, but that was usually in settings such as factories.

    If you can wait a little while longer, I'll see where my testing takes me and post here again in a few hours.



  • @TomasKnot
    I found some kernel-related issues; see the kernel.log I attached.
    Unfortunately, it appears I have permission issues with uploading. You can download the file here.



  • Another test:

    When I add a "sleep 1s" after every iteration (in the while loop), the system rebooted this time after run 200 (instead of 20100 or so). So it seems to be related to timing and not to the number of accesses.
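    The kind of test described above can be sketched as follows. This is only a minimal illustration, not the script that was actually run: the function name `stress_read`, its parameters, and the path in the usage comment are assumptions made for the example.

```python
import time
from pathlib import Path


def stress_read(paths, iterations, delay_s=0.0):
    """Read every path once per iteration and return the number of reads done.

    Hypothetical helper: a delay_s of 1.0 mimics the "sleep 1s" variant,
    while 0.0 reads back to back as fast as possible.
    """
    reads = 0
    for run in range(1, iterations + 1):
        for path in paths:
            Path(path).read_text()  # any I/O error propagates and stops the test
            reads += 1
        if run % 100 == 0:
            # Flush so the last completed run is visible even if the box reboots.
            print(f"run {run}", flush=True)
        if delay_s > 0:
            time.sleep(delay_s)
    return reads


# Usage (placeholder path, substitute a real attribute file under /sys):
# stress_read(["/sys/path/to/attribute"], iterations=50000, delay_s=1.0)
```

    Running it once with and once without the delay, and noting the last printed run number before a reboot, gives the same comparison as above.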



  • @knebb

    I'll have a look at it. Based on the kernel log as well, it does look like there might be a timing or resource starvation issue somewhere (not a kernel panic, but a scheduled thread failing to run in its allotted time).

    I have gone over all resource allocations again, so at the very least we can rule out a memory leak.



  • It looks like the issue is the invalidation thread stalling when consecutive reads arrive before it gets a chance to run. I've switched it to use mutexes instead of spinlocks, which seems to solve the issue.

    I seem to recall I have already sent you a modified binary - would you be willing to accept one again? I would send it via a private message as before.



  • @TomasKnot

    Yes, you already sent one. It is fine.

    Looking forward to having a stable system soon. Luckily it is not a hardware fault.

    Thanks for the great support!



  • I ran the script and it is now at run 43400, so far nearly 50% more than before. No crash or reboot up to now.

    I will start my monitoring system and see if it will stay stable.

    THANKS a lot!



  • Apologies for the trouble; we had not encountered this particular issue before.

    I hope your project goes well!



  • @TomasKnot

    Thanks again! Polling is currently set to once per minute and uptime is at 15 hours.

    It looks like it is really stable now.

    Thanks again for the great support!



  • @knebb
    If you need faster response times from sysfs, I can make that change specifically for you, but the limiting factor will be SNMP anyhow. Currently sysfs is set to refresh at a rate of 50 Hz. Rates up to 1000 Hz are possible in theory, at the cost of higher CPU use.
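    To put those rates in perspective, here is a small arithmetic sketch. The helper names are made up for this example; only the 50 Hz and 1000 Hz figures and the one-minute polling interval come from this thread.

```python
def refresh_period_ms(rate_hz):
    """Time between two sysfs refreshes, in milliseconds."""
    return 1000.0 / rate_hz


def refreshes_per_poll(rate_hz, poll_interval_s):
    """How many refreshes happen between two consecutive polls."""
    return rate_hz * poll_interval_s


print(refresh_period_ms(50))       # 20.0 ms at the current 50 Hz setting
print(refresh_period_ms(1000))     # 1.0 ms at the theoretical 1000 Hz maximum
print(refreshes_per_poll(50, 60))  # 3000 refreshes between once-per-minute polls
```

    With a monitoring system that polls once per minute, the value it reads is at most about 20 ms old at the current setting, so a higher refresh rate would mostly cost CPU time.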



  • Ah, well. No, I am absolutely fine with this.

    My Cacti monitors the system every 5 minutes, so there is no need for anything faster. I am fine with one minute.

    Thanks again!

    Oh, and it is running stably. It has now been up for nearly 2 days without a reboot.