PAR Technologies V5 Washer User Manual


 
ParaStation5 Administrator's Guide
Or, if logged on to this node, run psiadmin, which also starts the ParaStation daemon psid. See
Section 6.1, “Problem: psiadmin returns error” for more details.
Check the logfile /var/log/messages on this node for error messages. Verify that all nodes have an
identical configuration (/etc/parastation.conf).
6.3. Problem: cannot start parallel task
Problem: a parallel task cannot be launched and an error is reported:
PSI: PSI_createPartition: Resource temporarily unavailable
Check for available nodes and active parallel tasks. Check for user or group restrictions.
If the error
PSI: dospawn: spawn to node 1 failed.
PSE: Could not spawn './mpi_latency' process 1, error = Bad \
file descriptor.
is reported, check whether the current directory holding the program mpi_latency is accessible on all nodes.
Verify that the program is executable on all nodes.
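
The two checks above can be scripted and run on each node. The following is a minimal sketch, not part of ParaStation: the function name check_program is hypothetical, and how the script is started on the individual nodes is left open. It tests access for the user running it, so it should run under the account that launches the parallel task.

```python
import os


def check_program(directory, program):
    """Return a list of problems found for `program` inside `directory`.

    An empty list means the directory is accessible and the program
    exists there as an executable regular file for the current user.
    """
    problems = []
    # The working directory must exist and be searchable on this node.
    if not os.path.isdir(directory):
        problems.append(f"directory not found: {directory}")
        return problems
    if not os.access(directory, os.X_OK):
        problems.append(f"directory not accessible: {directory}")
        return problems

    path = os.path.join(directory, program)
    if not os.path.isfile(path):
        problems.append(f"program not found: {path}")
    elif not os.access(path, os.X_OK):
        problems.append(f"program not executable: {path}")
    return problems


if __name__ == "__main__":
    # Example: check mpi_latency in the current directory of this node.
    for line in check_program(os.getcwd(), "mpi_latency") or ["ok"]:
        print(line)
```

A node that reports anything other than "ok" is a likely cause of the failed spawn.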
6.4. Problem: bad performance
Verify that the proper interconnect and/or transport is used: check for environment variables controlling
transport (see Section 5.8, “Controlling ParaStation5 communication paths” and ps_environment(5)).
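
To see which such variables are set in the environment a job is started from, a short script can list them. This is a sketch under one assumption: that the relevant variables share a common name prefix. The default prefix PSP_ used below is an assumption and is not taken from this guide; check ps_environment(5) for the variables that are actually recognized.

```python
import os


def transport_variables(environ, prefix="PSP_"):
    """Return the variables in `environ` whose names start with `prefix`.

    The default prefix is an assumption, not taken from this guide;
    consult ps_environment(5) for the variables actually recognized.
    """
    return {name: value for name, value in sorted(environ.items())
            if name.startswith(prefix)}


if __name__ == "__main__":
    # Print the matching variables of the current shell environment.
    for name, value in transport_variables(os.environ).items():
        print(f"{name}={value}")
```

An empty listing means no variable with that prefix is set, so transport selection is not being overridden from this environment.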
Watch protocol counters, e.g. counters indicating timeouts, retries, errors or other bad conditions. For
p4sock, check recv_net_data and recv_user. See Section 5.2, “ParaStation5 protocol p4sock”.
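
Counters are usually easier to interpret as a change over an interval than as absolute values. The sketch below assumes two snapshots have already been collected as name-to-value mappings; how the counters are read from a node is not shown here and depends on the installation. The counter names are the ones mentioned above, and the sample values are invented for illustration.

```python
def counter_deltas(before, after):
    """Return the change of every counter present in both snapshots."""
    return {name: after[name] - before[name]
            for name in after if name in before}


def changed(deltas):
    """Keep only counters that changed between the snapshots."""
    return {name: delta for name, delta in deltas.items() if delta != 0}


if __name__ == "__main__":
    # Illustrative values only; real snapshots come from the node.
    before = {"recv_net_data": 10, "recv_user": 4}
    after = {"recv_net_data": 10, "recv_user": 9}
    print(changed(counter_deltas(before, after)))
```

Taking the snapshots immediately before and after a slow run narrows the result to the counters that moved during that run.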
Look for a crystal ball!
Or contact <support@par-tec.com>.
6.5. Problem: different groups of nodes are seen as up or down
Problem: depending on the node on which psiadmin is run, different groups of nodes are seen as "up" or
"down".
Check that the configuration is identical on every node, e.g. compare the configuration file
/etc/parastation.conf across nodes.
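
With more than a handful of nodes, comparing the files by checksum is quicker than comparing them by eye. The following is a minimal sketch, not a ParaStation tool: it assumes the contents of each node's configuration file have already been collected (for example by copying the files to one machine), and the function name group_by_config and the node names in the example are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_config(configs):
    """Group node names by the SHA-256 digest of their configuration.

    `configs` maps a node name to the raw bytes of its configuration file.
    Nodes that end up in the same group have byte-identical files.
    """
    groups = defaultdict(list)
    for node, content in configs.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(node)
    # Sort for stable output: largest group first, then by node name.
    return sorted((sorted(nodes) for nodes in groups.values()),
                  key=lambda nodes: (-len(nodes), nodes))


if __name__ == "__main__":
    # Hypothetical node names and contents, for illustration only.
    sample = {
        "node01": b"example setting A\n",
        "node02": b"example setting A\n",
        "node03": b"example setting B\n",
    }
    for nodes in group_by_config(sample):
        print(", ".join(nodes))
```

More than one group in the output means the files differ. Because the comparison is byte-wise, differences in whitespace or comments also split nodes into separate groups, so a reported mismatch still needs a look at the actual files.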
6.6. Problem: cannot start process on frontend
Problem: starting a job is canceled with the error message
Connecting client 139.27.166.22:44784 (rank 6) failed : Network is unreachable
PSIlogger: Child with rank 12 exited with status 1.