General troubleshooting guidelines

Controller

Startup issues

If you encounter startup issues, check the following:

  • No other controller instance is already running (address already bound to the port)
  • Nothing else is listening on the ports you defined in the step.properties configuration file (default: port 8080 for the HTTP application and port 8081 for the grid endpoint)
  • A MongoDB instance is running on the host and port you defined in the step.properties configuration file (default: host localhost, port 27017)
  • You have sufficient memory to start the JVM
  • You're starting the controller from its bin/ folder
  • You've checked your logs for other typical errors (log/ folder)
  • You haven't edited the classpath or otherwise made a mistake while editing the start script
  • Java is installed on the system, or you've set the JDK variable inside the start script
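
The port and MongoDB checks in the list above can be scripted. The following is a minimal sketch, not part of STEP: the function names are hypothetical, and the defaults are the ones quoted above (8080, 8081, localhost:27017). Adjust them if your step.properties differs.

```python
import socket

def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically "address already in use": something else owns the port.
            return False

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def controller_startup_problems(http_port=8080, grid_port=8081,
                                mongo_host="localhost", mongo_port=27017):
    """Return a list of detected problems (empty if none were found)."""
    problems = []
    for name, port in (("HTTP application", http_port), ("grid endpoint", grid_port)):
        if not port_is_free(port):
            problems.append(f"port {port} ({name}) is already in use")
    if not can_connect(mongo_host, mongo_port):
        problems.append(f"MongoDB is not reachable at {mongo_host}:{mongo_port}")
    return problems

if __name__ == "__main__":
    for problem in controller_startup_problems():
        print("WARNING:", problem)
```

Run it on the controller host while the controller is stopped. A reachable MongoDB port only shows that something accepts connections there, not that it is a healthy MongoDB instance.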

Agent

Startup issues

If you encounter startup issues, check the following:

  • You have sufficient memory to start the agent
  • You are starting the agent from its bin/ folder
  • You haven't edited the classpath or otherwise made a mistake while editing the start script
  • You've checked your logs for other typical errors (log/ folder)
  • Java is installed on the system, or you've set the JDK variable inside the start script

Grid Port opening

The agent may fail to start for one of the following reasons:

  • On the controller host, CONTROLLER_GRID_PORT does not accept incoming connections
  • On the agent host, outgoing connections to CONTROLLER_GRID_PORT are blocked

In either case, the agent log shows the following error message:

2018-05-25 15:10:52,362 ERROR [Timer-0] s.g.a.RegistrationClient [RegistrationClient.java:74] while registering tokens to http://CONTROLLER_HOST:CONTROLLER_GRID_PORT
javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: connect timed out

To fix this issue:

  • make sure that no other process is already using CONTROLLER_GRID_PORT on the machines

  • open the necessary ports on both machines. To do so, open a command prompt and run the following commands:

On the controller host:

  • Windows :

netsh advfirewall firewall add rule name="CONTROLLER_GRID_PORT" dir=in action=allow protocol=TCP localport=CONTROLLER_GRID_PORT 

  • Linux: 

iptables -A INPUT -p tcp --dport CONTROLLER_GRID_PORT -j ACCEPT

On the agent host (outgoing rules must match the remote, i.e. destination, port, because the agent's own source port is ephemeral):

  • Windows :

netsh advfirewall firewall add rule name="CONTROLLER_GRID_PORT" dir=out action=allow protocol=TCP remoteport=CONTROLLER_GRID_PORT

  • Linux :

iptables -A OUTPUT -p tcp --dport CONTROLLER_GRID_PORT -j ACCEPT

If the connection has been established successfully, the agent appears in the controller's "GRID" tab:

Grid.PNG
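
Before or after changing firewall rules, you can probe the grid port from the agent host. The sketch below is a generic TCP probe, not a STEP tool, and the function name is hypothetical. It distinguishes a timeout (consistent with the "connect timed out" message above and typical of a firewall silently dropping packets) from a refused connection (host reachable, but nothing listening on the port).

```python
import socket

def diagnose_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'timeout', 'refused' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout: often a firewall dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but no process listens on this port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or unreachable network.
        return f"error: {exc}"
```

For example, calling `diagnose_tcp("CONTROLLER_HOST", 8081)` on the agent host, with the placeholders replaced by your actual controller host and grid port, returns "open" once the path is clear.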

Agent Port opening

Once the agent has connected to the controller on CONTROLLER_GRID_PORT, a second connection is established between them and is used for subsequent test executions. By default, this second connection uses a random port on the agent (AGENT_PORT), so it can run into the same connectivity issue as described in the Grid Port opening section.

If the controller cannot reach the agent on AGENT_PORT, the following error message appears in the controller logs during a test execution:

2018-05-25 16:06:37,312 WARN [QuartzScheduler_Worker-3] s.g.c.GridClient [GridClient.java:141] Error while reserving session for token 07f5e5d8-4d37-40f0-a47f-68578c9624f4. Returning token to pool. Subsequent call to this token may fail or leaks may appear on the agent side.
step.grid.client.GridClient$AgentCommunicationException: Error while calling agent AgentRef [agentId=bc6bbe07-a28e-4d52-8ebe-54e6781db834, agentUrl=http://AGENT_HOST:AGENT_PORT] to execute /reserve on token Token [id=07f5e5d8-4d37-40f0-a47f-68578c9624f4]: java.net.SocketTimeoutException: connect timed out

To fix this issue:

  • choose a fixed port for AGENT_PORT by adding the "agentPort" property to the agent configuration file AgentConf.json:

    {
    "gridHost":"http://CONTROLLER_HOST:CONTROLLER_GRID_PORT",
    "agentPort":AGENT_PORT,
    "registrationPeriod":1000,
    ....
    }

  • open the necessary ports on both machines. To do so, open a command prompt and run the following commands:

    On the agent host:

    • Windows :

    netsh advfirewall firewall add rule name="AGENT_PORT" dir=in action=allow protocol=TCP localport=AGENT_PORT

    • Linux :

    iptables -A INPUT -p tcp --dport AGENT_PORT -j ACCEPT

    On the controller host (outgoing rules must match the remote, i.e. destination, port):

    • Windows :

    netsh advfirewall firewall add rule name="AGENT_PORT" dir=out action=allow protocol=TCP remoteport=AGENT_PORT

    • Linux :

    iptables -A OUTPUT -p tcp --dport AGENT_PORT -j ACCEPT
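
To confirm that an agent configuration really pins the agent port, a small script can inspect AgentConf.json. This is a sketch under assumptions: the helper names are hypothetical, only the "gridHost" and "agentPort" keys shown in this section are checked, the file is assumed to be plain JSON, and accepting https:// URLs is a guess rather than documented behavior.

```python
import json

def load_agent_conf(path: str) -> dict:
    """Parse the agent configuration file (assumed to be plain JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def check_agent_conf(conf: dict) -> list:
    """Return a list of problems found in a parsed AgentConf.json (empty if none)."""
    problems = []
    port = conf.get("agentPort")
    if port is None:
        problems.append("agentPort is not set: the agent will use a random port")
    elif isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"agentPort must be an integer between 1 and 65535, got {port!r}")
    grid_host = str(conf.get("gridHost", ""))
    if not grid_host.startswith(("http://", "https://")):
        problems.append(f"gridHost should be a URL, got {grid_host!r}")
    return problems
```

A typical use is `check_agent_conf(load_agent_conf("AgentConf.json"))`, with the path adjusted to wherever the agent configuration file is located on your installation.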

    Specific error messages

    Timeout while processing request

    The following message suggests that a keyword execution lasted longer than the authorized duration and as a result, a timeout occurred on the controller's side:

    1527258710359-314.png

    By default, the maximum execution time is set to 180 seconds upon Keyword creation / configuration. If you expect your Keyword execution to be longer than 180 seconds, you may want to adjust this value by opening the configuration pane of your Keyword and modifying the "Call timeout" parameter:

    1527259022785-273.png
     

    However, if you expect your keyword to always finish within this time limit, you're most likely running into an inner timeout within your script. This has to be addressed by the person responsible for the keyword's code (i.e. the developer). The best way to troubleshoot such an issue is to provide the developer with the inputs and execution context of the keyword, so that they can reproduce it in their development environment (via J/N-Unit tests).

    If it is not immediately clear to the developer what the root cause of the extended duration is, further investigation could be done directly on the agent side after adding traces and redeploying the Keyword. Such traces could take the following forms:

    • Screenshots (in the case of Selenium) or additional output information and attachments can be added at the beginning or end of each step and interpreted once the keyword's execution has finished. Make sure to increase the timeout value first; otherwise the interruption on the controller's side will prevent this information from being reported.
    • STEP's Measurement API can be used to measure the duration of every section of code within the keyword and pinpoint the exact test step causing the delay.

    Using an additional "debug" parameter within the keyword's logic may be a good option in order to switch these traces on and off. Also, installing a dedicated debug agent, to which debug executions could be specifically routed, is a good idea. That way, the developer could follow the execution of a keyword in real time (provided there are some non-headless events to watch and follow) and as part of a complete test scenario.
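
The idea of switchable timing traces can be illustrated generically. The sketch below is not STEP's Measurement API; it is a plain Python illustration with hypothetical names (SectionTimer, section, slowest) showing how per-section durations can be recorded only when a "debug" flag is enabled, and how the slowest section can then be identified.

```python
import time
from contextlib import contextmanager

class SectionTimer:
    """Accumulates wall-clock durations per named section when enabled."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled      # plays the role of the "debug" parameter
        self.durations = {}         # section name -> total seconds

    @contextmanager
    def section(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the section raised an exception.
            if self.enabled:
                elapsed = time.perf_counter() - start
                self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the section with the largest total duration, or None."""
        if not self.durations:
            return None
        return max(self.durations, key=self.durations.get)

if __name__ == "__main__":
    timer = SectionTimer(enabled=True)
    with timer.section("login"):
        time.sleep(0.05)    # stands in for a real test step
    with timer.section("search"):
        time.sleep(0.01)
    print(timer.slowest(), timer.durations)
```

With the flag disabled, the sections still run but nothing is recorded, so the same keyword code can be used for normal and debug executions.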

    Not able to find any agent token matching selection criteria

    If more keywords are executing concurrently than the number of agent tokens available at that time, you will most likely run into the following error message:

    Error: Not able to find any agent token matching selection criteria $agenttype=default and #THREADID#=^30$ (optional) and accepting attributes {} . Check the attachments for more details.

    To solve this problem, choose one of the following options:

    • install new agents or increase the token capacity of one or more of the existing agents
    • reduce the number of active threads originating from your tests
    • reduce the number of concurrent tests altogether
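
A quick capacity estimate helps decide between these options. The sketch below is a simplification with a hypothetical function name: it assumes each concurrently executing keyword holds exactly one token and that every token matches the selection criteria, which may not hold in your setup.

```python
def token_shortfall(concurrent_keywords: int, tokens_per_agent: list) -> int:
    """Return how many tokens are missing (0 if capacity is sufficient)."""
    available = sum(tokens_per_agent)
    return max(0, concurrent_keywords - available)

if __name__ == "__main__":
    # Example: 3 concurrent tests with 10 threads each, 2 agents with 10 tokens each.
    missing = token_shortfall(3 * 10, [10, 10])
    print(f"missing tokens: {missing}")
```

In this example the grid is short of 10 tokens, so you would add an agent, raise the token capacity of an existing one, or reduce the thread count to 20 or fewer.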

    Error while calling agent AgentRef to execute /release on token

    The following message can occur when making use of Session objects:

    step.grid.client.GridClient$AgentCommunicationException: Error while calling agent AgentRef [...] to execute /release on token Token [...]: java.net.SocketTimeoutException: Read timed out

    These circumstances usually involve one of the following scenarios:

    • The agent host is overloaded and is unable to complete the release process in time
    • The session cleanup phase (i.e. calling the close() method on every object stored in the session) takes unusually long or stalls entirely
    • The agent is suddenly unreachable at the time of release (general Socket timeout issue)

    After making sure that the agent is not overloaded (CPU, memory, I/O), investigate the close() method of the objects you've stored in the session (for instance, the wrapper of your Selenium driver).
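
One way to investigate a slow close() in isolation is to run it against a time budget in a development environment. The sketch below is a generic illustration, not STEP code: the function name is hypothetical, and it works with any object exposing a close() method. Note that a close() call exceeding the budget keeps running in its background thread; the helper only reports that the budget was exceeded.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def close_within(resource, budget_s: float):
    """Call resource.close() and return (finished_in_time, seconds_waited)."""
    executor = ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    future = executor.submit(resource.close)
    try:
        # Re-raises any exception thrown by close() itself.
        future.result(timeout=budget_s)
        finished = True
    except FutureTimeout:
        finished = False
    finally:
        # Do not block here; a stalled close() continues in the background.
        executor.shutdown(wait=False)
    return finished, time.perf_counter() - start
```

If the call does not finish within a budget well below the release timeout, the object's cleanup logic is a likely cause of the "Read timed out" error above.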

Created by Jerome Brongniart on 2019/03/11 13:44
     
Copyright © exense GmbH
v1.0