    Neuron unstable when accessing /sys

    UniPi Neuron Series
      knebb (reply to @TomasKnot)

      @TomasKnot
      Hi,

      First: yes, it definitely reboots! I cannot log in for a minute or so, and when it is back, "uptime" shows only about a minute.

      dmesg only shows me the messages from the current boot, not what happened before it.

      I am connected through ssh, so there are no screen messages and no kernel panic to see. The script runs, then suddenly stops printing any output, until the ssh connection eventually drops.

      There is nothing to see in kernel.log either; the entries simply stop and the next boot begins:

      May 16 10:44:32 zentrale kernel: [    8.611983] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
      May 16 10:44:34 zentrale kernel: [   10.039003] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
      May 16 10:44:39 zentrale kernel: [   15.037087] random: crng init done
      May 16 11:01:34 zentrale kernel: [    0.000000] Booting Linux on physical CPU 0x0
      May 16 11:01:34 zentrale kernel: [    0.000000] Linux version 4.9.41-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #1023 SMP Tue Aug 8 16:00:15 BST 2017
      May 16 11:01:34 zentrale kernel: [    0.000000] CPU: ARMv7 Processor [410fd034] revision 4 (ARMv7), cr=10c5383d
      May 16 11:01:34 zentrale kernel: [    0.000000] CPU: div instructions available: patching division code
      May 16 11:01:34 zentrale kernel: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
      

      Regarding a possible circuit or electrical issue: that could indeed be the case. There is just one device attached (which I would like to rule out for the moment). But then it should also happen when the /sys filesystem is not being accessed, shouldn't it? When /sys is not accessed at all, the reboots do not happen; when it is accessed less frequently, they happen less often.

      Any further ideas?

        TomasKnot (reply to @knebb)

        @knebb

        I'm running a larger testing script on 2 devices with 4 threads in parallel, so this may take a little while. Unfortunately, it's difficult to know what goes wrong without the device's kernel log. I mentioned the electrical side because we did have customers who had issues with it, but usually in environments such as factories.

        If you can wait a little longer, I'll see where my testing takes me and post here again in a few hours.

          knebb (reply to @TomasKnot)

          @TomasKnot
          I found some kernel-related issues; see the kernel.log I attached.
          Unfortunately, it appears I do not have permission to upload files. You can download the file here.

            knebb (reply to @knebb)

            Another test:

            When I add a "sleep 1s" after every iteration of the while loop, the system rebooted this time after run 200 (instead of after about 20100). So it seems to be related to timing rather than to the number of accesses.
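The test script itself is not shown in the thread. As a minimal sketch of a comparable stress loop, assuming Python is available on the board, the code below reads one sysfs attribute repeatedly and can pause between reads like the "sleep 1s" variant. The path is a placeholder (the thread does not name the attribute that was polled) and `stress_read` is a hypothetical name.

```python
import time
from pathlib import Path

# Placeholder: the thread does not say which sysfs attribute the script polled.
SYSFS_PATH = Path("/sys/path/to/neuron/attribute")


def stress_read(path, iterations, delay_s=0.0, log_every=100):
    """Read a sysfs attribute `iterations` times, optionally sleeping in between.

    Returns the number of completed reads. Progress is printed (and flushed)
    every `log_every` reads, so the last line seen over ssh shows roughly
    which run the board reached before it went down.
    """
    completed = 0
    for run in range(1, iterations + 1):
        path.read_text()  # open, read and close the attribute once per run
        completed = run
        if log_every and run % log_every == 0:
            print(f"run {run}", flush=True)
        if delay_s > 0:
            time.sleep(delay_s)  # mirrors the "sleep 1s" variant
    return completed
```

For example, `stress_read(SYSFS_PATH, 50_000, delay_s=1.0)` after replacing the placeholder path. Flushing each progress line matters here because buffered output is lost when the board reboots.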

              TomasKnot (reply to @knebb)

              @knebb

              I'll have a look at it. Based on the kernel log as well, it does look like there might be a timing or resource starvation issue somewhere (not a kernel panic, but a scheduled thread failing to run within its allotted time).

              I have gone over all resource allocations again, so at the very least we can rule out a memory leak.

                TomasKnot

                It looks like the issue is that the invalidation thread stalls if consecutive reads arrive before it can run. I've switched it to use mutexes instead of spinlocks, which seems to solve the issue.
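The driver change itself is kernel code and is not shown in the thread. As a rough user-space analogy only, the Python sketch that follows shares one cached value between readers and an invalidation thread, with both sides taking the same `threading.Lock`. A thread waiting on `threading.Lock` is blocked rather than busy-waiting, which is loosely the property a kernel mutex has over a spinlock. `CachedValue` and the `fetch` callable are hypothetical names.

```python
import threading


class CachedValue:
    """User-space analogy: readers and an invalidator share one blocking lock.

    Not the UniPi driver code. threading.Lock puts a waiting thread to sleep
    instead of spinning, so the invalidator can always get its turn.
    """

    def __init__(self, fetch):
        self._fetch = fetch            # hypothetical callable that reads the hardware
        self._lock = threading.Lock()  # blocking lock shared by all parties
        self._stale = True             # first read always fetches
        self._value = None
        self.refreshes = 0

    def invalidate(self):
        # Called periodically by the invalidation thread.
        with self._lock:
            self._stale = True

    def read(self):
        # Readers re-fetch only when the value has been invalidated.
        with self._lock:
            if self._stale:
                self._value = self._fetch()
                self._stale = False
                self.refreshes += 1
            return self._value
```

Each refresh consumes exactly one invalidation, so the number of hardware reads is bounded by the invalidation rate no matter how many readers are polling.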

                I seem to recall I have already sent you a modified binary; would you be willing to accept one again? I would send it via a private message as before.

                  knebb (reply to @TomasKnot)

                  @TomasKnot

                  Yes, you already sent one. That is fine.

                  Looking forward to having a stable system soon. Luckily it is not a hardware fault.

                  Thanks for the great support!

                    knebb

                    I ran the script, and it is now at run 43400, nearly 50% more than before. No crash or reboot so far.

                    I will start my monitoring system and see whether it stays stable.

                    THANKS a lot!

                      TomasKnot

                      Apologies for the trouble, we did not encounter this particular issue before.

                      I hope your project goes well!

                        knebb (reply to @TomasKnot)

                        @TomasKnot

                        Thanks again! Monitoring is currently set to poll once a minute, and uptime is at 15 hours.

                        It looks really stable now.

                        Thanks again for the great support!

                          TomasKnot (reply to @knebb)

                          @knebb
                          If you need faster response times from sysfs, I can make that change specifically for you, but the limiting factor will be SNMP anyway. Currently sysfs is set to refresh at 50 Hz; rates up to 1000 Hz are possible in theory, at the cost of higher CPU use.
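To make the refresh-rate trade-off concrete, here is a minimal sketch, assuming a simple read-triggered model: a cached value is re-read from the source only when it is older than one refresh period (1/50 s = 20 ms at 50 Hz, 1 ms at 1000 Hz). This is not how the UniPi driver is confirmed to work (it may refresh in the background instead); `RateLimitedReader` and its parameters are hypothetical, and the clock is injectable so the behaviour can be checked without real timing.

```python
import time


class RateLimitedReader:
    """Return a cached value, re-reading the source at most `rate_hz` times per second.

    Hypothetical model of an "N Hz refresh" policy, not the UniPi driver code.
    """

    def __init__(self, fetch, rate_hz=50.0, clock=time.monotonic):
        self._fetch = fetch               # callable that performs the real (costly) read
        self.period_s = 1.0 / rate_hz     # 50 Hz -> 0.02 s, 1000 Hz -> 0.001 s
        self._clock = clock
        self._last = None                 # time of the last real read
        self._value = None
        self.refreshes = 0

    def read(self):
        now = self._clock()
        if self._last is None or now - self._last >= self.period_s:
            self._value = self._fetch()
            self._last = now
            self.refreshes += 1
        return self._value
```

Under heavy polling, going from 50 Hz to 1000 Hz allows up to 20 times as many real reads per second, which is where the extra CPU use comes from. A poller that asks once a minute, or Cacti every 5 minutes, would not see any difference between the two settings.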

                            knebb

                            Ah, well. No, I am absolutely fine with this.

                            My Cacti instance monitors the system every 5 minutes, so there is no need for anything faster; I am fine with a minute.

                            Thanks again!

                            Oh, and it is running stably: nearly 2 days now without a reboot.
