evok still not stable, inputs will not cause events after some time, restart required



  • yesterday at last i had enough time to look into this topic again.

    for about 8 weeks now, my neuron 203 with 2 xS30 and one xS40 running evok has not been stable enough for daily use unless i restart evok.service hourly via cron and occasionally restart neurontcp.service manually.
    even then, some inputs sometimes do not register and send no events to either evok-webapp or my openhab2 bridging script.
    which inputs are affected is completely random or at least i see no pattern.
    sometimes the affected inputs are on the 203 itself, sometimes on the extensions.
    sometimes all inputs on a bank are affected, sometimes just 2 or 4.
    the hardware itself works as i see the control leds on the neuron & extensions reacting to the inputs.
    ofc my WAF (wife acceptance factor) is low as she sometimes has to use her phone to switch on the bathroom lights during her morning routine ... :/

    yesterday i built a completely new image with a fresh install of evok v.2.0.7 and didn't reimplement the cron-based restarts yet. i had hoped that the situation software-wise would have improved.
    the switches/inputs we used at least worked for half a day, but today some inputs stopped working again. once more, restarting evok fixed the situation.

    i see no hints for the cause of evok's behavior in any logfile whatsoever.
    what should i do to debug this?

    root@drHouse:~# cat /etc/evok.conf
    #!!! Do not use '#' for comments !!!
    
    [MAIN]
    config_version = 2.5
    use_schema_verification = False
    log_level = DEBUG
    log_file = /var/log/evok.log
    port = 3180
    webhook_enabled = False
    webhook_address = http://127.0.0.1:80
    webhook_device_mask = ["input","wd"]
    webhook_complex_events = False
    wifi_control_enabled = False
    soap_server_enabled = False
    soap_server_port = 8081
    
    [NEURON_1]
    global_id = 1
    allow_register_access = False
    scan_frequency = 20
    scan_enabled = True
    
    [EXTENSION_1]
    global_id = 2
    device_name = xS30
    modbus_uart_port = /dev/extcomm/0/0
    address = 1
    scan_frequency = 20
    
    [EXTENSION_2]
    global_id = 3
    device_name = xS30
    modbus_uart_port = /dev/extcomm/0/0
    address = 2
    scan_frequency = 20
    
    [EXTENSION_3]
    global_id = 4
    device_name = xS40
    modbus_uart_port = /dev/extcomm/0/0
    address = 3
    scan_frequency = 20
    

  • administrators

    @tja Does restarting evok alone fix it, or do you also have to restart the modbus service? Could you provide the log from evok (/var/log/evok.log) and also the state of the modbus TCP server (systemctl status neurontcp) when it stops working?



  • @tomas_hora
    hi tomas,

    as i said evok.log shows nothing.
    here's the tail of the log from this morning's start until around half past 8, when i saw that one switch/input no longer worked and i restarted evok:

    2018-06-22 07:03:06,564 - evok - INFO - Alias loaded: <neuron.Relay object at 0x758d5f90> al_lights_bedroom
    2018-06-22 07:03:06,972 - evok - INFO - Alias loaded: <neuron.Relay object at 0x75918650> al_lights_kitchen
    2018-06-22 07:03:06,973 - evok - INFO - Alias loaded: <neuron.Relay object at 0x758d5f90> al_lights_bedroom
    2018-06-22 07:03:07,054 - evok - INFO - Alias loaded: <neuron.Relay object at 0x75918650> al_lights_kitchen
    2018-06-22 07:03:07,054 - evok - INFO - Alias loaded: <neuron.Relay object at 0x758d5f90> al_lights_bedroom
    2018-06-22 07:03:07,102 - evok - INFO - Alias loaded: <neuron.Relay object at 0x75918650> al_lights_kitchen
    2018-06-22 07:03:07,102 - evok - INFO - Alias loaded: <neuron.Relay object at 0x758d5f90> al_lights_bedroom
    2018-06-22 08:41:42,130 - evok - INFO - Shutting down
    2018-06-22 08:41:44,272 - evok - INFO - Starting using config file /etc/evok.conf
    

    i will check neurontcp next time and report back.

    tia,tja...


  • administrators

    How long are the input pulses? When it stops reacting, can you trigger an input manually, i.e. by briefly connecting 24V to the input?

    I wonder if it couldn't be due to the scanning frequency. At least in part because the LEDs light correctly and there are no errors. I don't believe anyone else has encountered issues like you describe before.

    Could you try restarting OpenHab too? It's possible there's a hung thread somewhere.



  • @tomasknot
    the inputs connect to plain wall-mounted switches.
    the length of the impulse is determined by the length of the switch press - and even if i press for half a minute, evok in this "error" state will not recognize this/these input(s) any more.

    when today's error happened i fetched my notebook to check evok-webapp, and it didn't show the keypress either.
    as it happens it was the input for the light in the technics-room where the cabinet with the neuron is located, so i could watch the led at the very moment i checked evok-webapp. i can therefore be sure that the input got a signal (the led lights up) and the switch is not the cause.

    i always use evok-webapp to check the error to rule out openhab and my bridge script.

    openhab is not in the game as i relocated openhab plus my bridge script to another rpi3 as evok plus openhab on the neuron would cause too much delay on the inputs.
    since i put openhab on another machine and set scan_frequency = 20, the switches have at least been much more usable.
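
    The check against evok itself, bypassing openhab and the bridge script, could also be scripted so that a stuck input shows up without keeping evok-webapp open. This is only a minimal sketch: the URL is an assumption (adjust host and port to wherever evok's REST API answers on your unit), and it assumes `/rest/all` returns a JSON list of objects with `dev`, `circuit` and `value` keys. The function names are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: evok's REST API answers at this address and /rest/all returns
# a JSON list of device objects with "dev", "circuit" and "value" keys.
EVOK_URL = "http://127.0.0.1/rest/all"


def input_states(devices):
    """Map circuit name -> value for every entry whose dev is "input"."""
    return {d["circuit"]: d["value"] for d in devices if d.get("dev") == "input"}


def changed_inputs(before, after):
    """Return the sorted circuits whose value differs between two snapshots."""
    return sorted(c for c in after if before.get(c) != after[c])


def fetch_inputs(url=EVOK_URL, timeout=5):
    """Fetch the current state of all inputs from evok."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return input_states(json.load(resp))


def watch(url=EVOK_URL, interval=0.5):
    """Print a line whenever evok reports a changed input value."""
    previous = fetch_inputs(url)
    while True:
        time.sleep(interval)
        current = fetch_inputs(url)
        for circuit in changed_inputs(previous, current):
            print(time.strftime("%H:%M:%S"), circuit, "->", current[circuit])
        previous = current
```

    Calling `watch()` and then pressing a switch whose led lights up should print a line for that circuit; if nothing is printed while the led is on, the event is being lost before it reaches evok's API rather than in openhab or the bridge script.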


  • administrators

    I really am not sure how this happens. I think it may be due to system load, but since as you say OpenHab is not on the system I am not quite sure why this is. Could you please run

    ps -fax
    

    when the system hangs, and post the output here? It could show us the system load in that case.
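
    Since the failure is intermittent, it may help to capture everything requested so far (the evok log tail, the neurontcp status and the process list) in one go at the moment an input stops reacting. The following is a rough sketch, not an official tool: the output file name under /tmp and the 200-line tail are arbitrary choices, and the helper names are hypothetical.

```python
import subprocess
import time

# The services, the ps call and the log path are the ones named in this thread.
COMMANDS = [
    ["systemctl", "status", "neurontcp", "--no-pager"],
    ["systemctl", "status", "evok", "--no-pager"],
    ["ps", "-fax"],
    ["tail", "-n", "200", "/var/log/evok.log"],
]


def run(cmd, timeout=15):
    """Run one command and return its combined stdout/stderr, never raising."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
        return proc.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "could not run: {}\n".format(exc)


def snapshot(commands=COMMANDS):
    """Build one text report with a "$ command" header per command."""
    parts = ["snapshot taken " + time.strftime("%Y-%m-%d %H:%M:%S")]
    for cmd in commands:
        parts.append("\n$ " + " ".join(cmd))
        parts.append(run(cmd))
    return "\n".join(parts)


def save_snapshot(path=None):
    """Write the report to a timestamped file (hypothetical location)."""
    path = path or time.strftime("/tmp/evok-hang-%Y%m%d-%H%M%S.txt")
    with open(path, "w") as fh:
        fh.write(snapshot())
    return path
```

    Running `save_snapshot()` as root before restarting evok leaves a single file that can be attached to a post here.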



  • @tomasknot
    i took the opportunity to walk through the house checking all the switches and sadly (from a debugging point of view ofc :D ) all switches work atm.
    i will report any new problems and attach evok's logs, neurontcp's status and ps fax asap.

    in the meantime, thx for your efforts to help me. i would really like to put this bug ?!? to rest.

    tia,tja...


  • administrators

    It's really an odd issue. I hope we can get down to the ultimate cause, it definitely looks like a bug. It's just odd that no-one else has reported the same problem before, either here on the forum or directly via email (which is, statistically speaking, the more frequent method of asking for support).



  • @tomasknot
    another question tomas:
    if i understand correctly, the extensions and the daughterboards (== input banks) have their own firmware ... is there a way to check the firmware version and correctness of all the parts on a running neuron (e.g. without shutting down and/or having to use a windows machine)?

    tia,tja...


  • administrators

    It is possible, but it requires using our image instead of Raspbian. You should be able to use /sys/devices/platform/unipi_plc to find out more information about the system, or simply run "sudo dmesg" after boot, as the image driver performs automatic diagnostics.
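
    For reference, a small sketch of how that sysfs directory could be dumped on a running system. It makes no assumption about which attribute files the driver exposes there; it simply lists whatever readable files it finds under the path given above. The function names are hypothetical.

```python
import os

SYSFS_ROOT = "/sys/devices/platform/unipi_plc"  # path from the post above


def read_attributes(root=SYSFS_ROOT, max_bytes=256):
    """Return {relative_path: text} for every readable regular file under root."""
    found = {}
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            try:
                with open(full, "rb") as fh:
                    data = fh.read(max_bytes)
            except OSError:
                continue  # write-only or otherwise unreadable attribute
            text = data.decode("ascii", "replace").strip()
            found[os.path.relpath(full, root)] = text
    return found


def print_attributes(root=SYSFS_ROOT):
    """Print one "path: value" line per attribute, sorted by path."""
    for path, value in sorted(read_attributes(root).items()):
        print("{}: {}".format(path, value))
```

    Calling `print_attributes()` on the image prints one line per attribute; on a system without the driver the directory does not exist and nothing is printed.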