Full-report mode
Whilst padb can be used to collect very specific information from an application, unless you know what you are looking for or know the application very well this may not be what you want. For cases such as this padb has a "full report" mode in which it collects the information about a job that is most likely to be useful, creating a full diagnostic report by iterating over the more common padb modes and options. If you are just starting out debugging with padb, or are creating an error report for a third party, the full-report option is a good place to start. For large jobs this can generate a lot of output, so redirecting it to a file is recommended.
To run in this mode, simply invoke padb with the option --full-report=<jobid>.
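Since the report is written to standard output, it can be captured with an ordinary shell redirect. A minimal sketch, assuming padb is on your PATH (the jobid 1234 and the log file name are illustrative only):

$ padb --full-report=1234 > padb-report-1234.log 2>&1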
The full-report mode is also very useful if you are automatically creating trace files for later inspection or collecting information for inspection by a third party: end-users can be instructed to run it and mail the log back to a remote support team, for example, or it can be integrated into automated test suites.
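As a sketch of the automated case, a small wrapper script along these lines could save a report for every job padb can see. The script, loop structure and file names are illustrative, not part of padb; it assumes --show-jobs prints one jobid per line, as in the example below.

#!/bin/sh
# Illustrative wrapper: save a full padb report for every visible job.
# Assumes padb is on $PATH and --show-jobs prints one jobid per line.
for job in $(padb --show-jobs); do
    padb --full-report="$job" > "padb-report-$job.log" 2>&1
done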
More detailed information about using padb, and about the types of information it can collect from a job, can be found on the modes page.
$ padb --show-jobs
45882
$ padb --full-report=45882
padb version 3.n (Revision 325)
full job report for job 45882
---------------- [0] ----------------
comm0: name: 'MPI_COMM_WORLD'
comm0: rank: '0'
comm0: size: '4'
comm0: id: '0'
comm0: Rank: local 0 global 0
comm0: Rank: local 1 global 1
comm0: Rank: local 2 global 2
comm0: Rank: local 3 global 3
comm1: name: 'MPI_COMM_SELF'
comm1: rank: '0'
comm1: size: '1'
comm1: id: '0x1'
comm2: name: 'MPI_COMM_NULL'
comm2: size: '0'
comm2: id: '0x2'
comm3: name: 'MPI COMMUNICATOR 3 DUP FROM 0'
comm3: rank: '0'
comm3: size: '4'
comm3: id: '0x3'
comm3: Rank: local 0 global 0
comm3: Rank: local 1 global 1
comm3: Rank: local 2 global 2
comm3: Rank: local 3 global 3
comm4: name: 'MPI COMMUNICATOR 4 DUP FROM 0'
comm4: rank: '0'
comm4: size: '4'
comm4: id: '0x4'
comm4: Rank: local 0 global 0
comm4: Rank: local 1 global 1
comm4: Rank: local 2 global 2
comm4: Rank: local 3 global 3
comm5: name: 'MPI COMMUNICATOR 5 SPLIT FROM 3'
comm5: rank: '0'
comm5: size: '2'
comm5: id: '0x5'
comm5: Rank: local 0 global 0
comm5: Rank: local 1 global 2
---------------- [1] ----------------
comm0: name: 'MPI_COMM_WORLD'
comm0: rank: '1'
comm0: size: '4'
comm0: id: '0'
comm0: Rank: local 0 global 0
comm0: Rank: local 1 global 1
comm0: Rank: local 2 global 2
comm0: Rank: local 3 global 3
comm1: name: 'MPI_COMM_SELF'
comm1: rank: '0'
comm1: size: '1'
comm1: id: '0x1'
comm2: name: 'MPI_COMM_NULL'
comm2: size: '0'
comm2: id: '0x2'
comm3: name: 'MPI COMMUNICATOR 3 DUP FROM 0'
comm3: rank: '1'
comm3: size: '4'
comm3: id: '0x3'
comm3: Rank: local 0 global 0
comm3: Rank: local 1 global 1
comm3: Rank: local 2 global 2
comm3: Rank: local 3 global 3
comm4: name: 'MPI COMMUNICATOR 4 DUP FROM 0'
comm4: rank: '1'
comm4: size: '4'
comm4: id: '0x4'
comm4: Rank: local 0 global 0
comm4: Rank: local 1 global 1
comm4: Rank: local 2 global 2
comm4: Rank: local 3 global 3
comm5: name: 'MPI COMMUNICATOR 5 SPLIT FROM 3'
comm5: rank: '0'
comm5: size: '2'
comm5: id: '0x5'
comm5: Rank: local 0 global 1
comm5: Rank: local 1 global 3
---------------- [2] ----------------
comm0: name: 'MPI_COMM_WORLD'
comm0: rank: '2'
comm0: size: '4'
comm0: id: '0'
comm0: Rank: local 0 global 0
comm0: Rank: local 1 global 1
comm0: Rank: local 2 global 2
comm0: Rank: local 3 global 3
comm1: name: 'MPI_COMM_SELF'
comm1: rank: '0'
comm1: size: '1'
comm1: id: '0x1'
comm2: name: 'MPI_COMM_NULL'
comm2: size: '0'
comm2: id: '0x2'
comm3: name: 'MPI COMMUNICATOR 3 DUP FROM 0'
comm3: rank: '2'
comm3: size: '4'
comm3: id: '0x3'
comm3: Rank: local 0 global 0
comm3: Rank: local 1 global 1
comm3: Rank: local 2 global 2
comm3: Rank: local 3 global 3
comm4: name: 'MPI COMMUNICATOR 4 DUP FROM 0'
comm4: rank: '2'
comm4: size: '4'
comm4: id: '0x4'
comm4: Rank: local 0 global 0
comm4: Rank: local 1 global 1
comm4: Rank: local 2 global 2
comm4: Rank: local 3 global 3
comm5: name: 'MPI COMMUNICATOR 5 SPLIT FROM 3'
comm5: rank: '1'
comm5: size: '2'
comm5: id: '0x5'
comm5: Rank: local 0 global 0
comm5: Rank: local 1 global 2
---------------- [3] ----------------
comm0: name: 'MPI_COMM_WORLD'
comm0: rank: '3'
comm0: size: '4'
comm0: id: '0'
comm0: Rank: local 0 global 0
comm0: Rank: local 1 global 1
comm0: Rank: local 2 global 2
comm0: Rank: local 3 global 3
comm1: name: 'MPI_COMM_SELF'
comm1: rank: '0'
comm1: size: '1'
comm1: id: '0x1'
comm2: name: 'MPI_COMM_NULL'
comm2: size: '0'
comm2: id: '0x2'
comm3: name: 'MPI COMMUNICATOR 3 DUP FROM 0'
comm3: rank: '3'
comm3: size: '4'
comm3: id: '0x3'
comm3: Rank: local 0 global 0
comm3: Rank: local 1 global 1
comm3: Rank: local 2 global 2
comm3: Rank: local 3 global 3
comm4: name: 'MPI COMMUNICATOR 4 DUP FROM 0'
comm4: rank: '3'
comm4: size: '4'
comm4: id: '0x4'
comm4: Rank: local 0 global 0
comm4: Rank: local 1 global 1
comm4: Rank: local 2 global 2
comm4: Rank: local 3 global 3
comm5: name: 'MPI COMMUNICATOR 5 SPLIT FROM 3'
comm5: rank: '1'
comm5: size: '2'
comm5: id: '0x5'
comm5: Rank: local 0 global 1
comm5: Rank: local 1 global 3
Total: 10 communicators of which 0 are in use.
No data was recorded for 24 communicators
----------------- [0-3] (4 processes) -----------------
main() at deadlock.c:42
  locals
    MPI_Comm alpha = 'MPI COMMUNICATOR 3 DUP FROM 0' [0-3]
    MPI_Comm beta = 'MPI COMMUNICATOR 4 DUP FROM 0' [0-3]
    MPI_Comm * mb = '' [0-3]
    char * p = 'Address 0xffffffff out of bounds' [0-3]
    MPI_Comm split = 'MPI COMMUNICATOR 5 SPLIT FROM 3' [0-3]
----------------- [0-3] (4 processes) -----------------
PMPI_Barrier() at pbarrier.c:62
  params
    MPI_Comm comm:
      'MPI COMMUNICATOR 3 DUP FROM 0' [1-3]
      'MPI COMMUNICATOR 4 DUP FROM 0' [0]
  locals
    int err = '0' [0-3]
----------------- [0-3] (4 processes) -----------------
ompi_coll_tuned_barrier_intra_dec_fixed() at coll_tuned_decision_fixed.c:206
  params
    struct ompi_communicator_t * comm:
      'MPI COMMUNICATOR 3 DUP FROM 0' [1-3]
      'MPI COMMUNICATOR 4 DUP FROM 0' [0]
    mca_coll_base_module_t * module = 'valid pointer perm=rw-p ([heap])' [0-3]
  locals
    int communicator_size = '0' [0-3]
----------------- [0-3] (4 processes) -----------------
ompi_coll_tuned_barrier_intra_recursivedoubling() at coll_tuned_barrier.c:172
  params
    struct ompi_communicator_t * comm:
      'MPI COMMUNICATOR 3 DUP FROM 0' [1-3]
      'MPI COMMUNICATOR 4 DUP FROM 0' [0]
    mca_coll_base_module_t * module = 'valid pointer perm=rw-p ([heap])' [0-3]
  locals
    int adjsize = '4' [0-3]
    int err = '0' [0-3]
    int line: more than 3 distinct values
    int mask:
      '2' [0-1]
      '4' [2-3]
    int rank: more than 3 distinct values
    int remote:
      '0' [1-2]
      '1' [0,3]
    int size = '4' [0-3]
----------------- [0-3] (4 processes) -----------------
ompi_coll_tuned_sendrecv_actual() at coll_tuned_util.c:54
  params
    void * sendbuf = 'null pointer' [0-3]
    int scount = '0' [0-3]
    ompi_datatype_t * sdatatype = 'MPI_BYTE' [0-3]
    int dest:
      '0' [1-2]
      '1' [0,3]
    int stag = '-16' [0-3]
    void * recvbuf = 'null pointer' [0-3]
    int rcount = '0' [0-3]
    ompi_datatype_t * rdatatype = 'MPI_BYTE' [0-3]
    int source:
      '0' [1-2]
      '1' [0,3]
    int rtag = '-16' [0-3]
    struct ompi_communicator_t * comm:
      'MPI COMMUNICATOR 3 DUP FROM 0' [1-3]
      'MPI COMMUNICATOR 4 DUP FROM 0' [0]
    ompi_status_public_t * status = 'null pointer' [0-3]
  locals
    int err = '0' [0-3]
    int line = '0' [0-3]
    ompi_request_t *[2] reqs = '{, }' [0-3]
    ompi_status_public_t [2] statuses = 'value too long to display' [0-3]
----------------- [0-3] (4 processes) -----------------
ompi_request_default_wait_all() at request/req_wait.c:262
  params
    size_t count = '2' [0-3]
    ompi_request_t ** requests: more than 3 distinct values
    ompi_status_public_t * statuses = 'valid pointer perm=rw-p ([stack])' [0-3]
  locals
    char [30] __PRETTY_FUNCTION__ = '"ompi_request_default_wait_all"' [0-3]
    size_t completed = '1' [0-3]
    size_t i = '2' [0-3]
    int mpi_error = '0' [0-3]
    size_t pending = '1' [0-3]
    ompi_request_t * request = 'valid pointer perm=rw-p ([heap])' [0-3]
    ompi_request_t ** rptr = '' [0-3]
    size_t start:
      '53' [0-1]
      '55' [2-3]
----------------- [0-3] (4 processes) -----------------
opal_condition_wait() at ../opal/threads/condition.h:99
  params
    opal_condition_t * c = 'valid pointer perm=rw-p' [0-3]
    opal_mutex_t * m = 'valid pointer perm=rw-p' [0-3]
  locals
    int rc = '0' [0-3]
----------------- [0,3] (2 processes) -----------------
opal_progress() at runtime/opal_progress.c:206
  locals
    int events = '0' [0,3]
    size_t i = '0' [0,3]
----------------- [1] (1 processes) -----------------
opal_progress() at runtime/opal_progress.c:181
  locals
    int events = '0' [1]
    size_t i = '2' [1]
    opal_timer_t now = '135914459801112' [1]
----------------- [1] (1 processes) -----------------
opal_timer_base_get_cycles() at ../opal/mca/timer/linux/timer_linux.h:31
opal_sys_timer_get_cycles() at ../opal/include/opal/sys/ia32/timer.h:33
  locals
    opal_timer_t ret = '135914459801112' [1]
----------------- [2] (1 processes) -----------------
opal_progress() at runtime/opal_progress.c:166
  locals
    int events = '0' [2]
    size_t i = '2' [2]