Version 2 Release 4
GC23-3900-05
Program Number: 5765-529
| Note |
|---|
| Before using this information and the product it supports, be sure to read the general information under "Notices". |
Fifth Edition (February 1998)
This is a major revision of GC23-3900-04.
This edition applies to Version 2 Release 4 of the IBM Parallel System Support Programs for AIX (PSSP) Licensed Program, program number 5765-529, and to all subsequent releases and modifications until otherwise indicated in new editions. Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.
Order publications through your IBM representative or the IBM branch office serving your locality. Publications are not stocked at the address below.
IBM welcomes your comments. A form for readers' comments may be provided at the back of this publication, or you may address your comments to the following address:
If you would like a reply, be sure to include your name, address, telephone number, or FAX number.
Make sure to include the following in your comment or note:
When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
© Copyright International Business Machines Corporation 1995, 1998. All rights reserved.
Note to U.S. Government Users -- Documentation related to restricted rights -- Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule contract with IBM Corp.
References in this publication to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program, or service. Evaluation and verification of operation in conjunction with other products, except those expressly designated by IBM, are the user's responsibility.
IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.
The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:
Microsoft, Windows, and the Windows 95 logo are trademarks or registered trademarks of Microsoft Corporation.
UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.
Other company, product, and service names, which may be denoted by a double asterisk (**), may be trademarks or service marks of others.
This product includes software that is publicly available:
This book discusses the use of these products only as they apply specifically to the SP system. The distribution for these products includes the source code and associated documentation. (Kerberos does not ship source code.) /usr/lpp/ssp/public contains the compressed tar files of the publicly available software. (IBM has made minor modifications to the versions of Tcl and Tk used in the SP system to improve their security characteristics. Therefore, the IBM-supplied versions do not match exactly the versions you may build from the compressed tar files.) All copyright notices in the documentation must be respected. You can find version and distribution information for each of these products that are part of your selected install options in the /usr/lpp/ssp/README/ssp.public.README file.
The IBM Parallel System Support Programs for AIX: Command and Technical Reference provides detailed syntax and parameter information for all commands you can use to install, customize, and maintain the IBM RS/6000 SP system.
Other books that help you administer and use the SP system include:
This book applies to PSSP Version 2 Release 4. To find out what version of PSSP is running on your control workstation, enter the following:
SDRGetObjects SP code_version
In response, the system displays something similar to:
code_version PSSP-2.4
If the response indicates PSSP-2.4, this book applies to the version of PSSP that is running on your system.
To find out what version of PSSP is running on the nodes of your system, enter the following from your control workstation:
splst_versions -G -t
In response, the system displays something similar to:
1 PSSP-2.4
2 PSSP-2.4
7 PSSP-2.3
8 PSSP-2.3
Each node that reports PSSP-2.4 is running the version of PSSP to which this book applies.
If you are running mixed levels of PSSP, be sure to maintain and refer to the appropriate documentation for whatever versions of PSSP you are running.
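On a system with mixed levels, it can help to list only the nodes that are not at the level this book covers. The following is a minimal sketch, assuming the version listing consists of one node number and one level per line; `nodes_not_at` is a hypothetical helper name, not a PSSP command, and the sample data is copied from the example response above.

```shell
#!/bin/sh
# Sketch: list nodes whose reported PSSP level differs from a wanted level.
# Assumption: input is "node_number code_version" pairs, one pair per line.

# nodes_not_at LEVEL: reads pairs on stdin, prints the mismatching pairs.
nodes_not_at() {
    awk -v want="$1" 'NF >= 2 && $2 != want { print $1, $2 }'
}

# Sample data taken from the example response in the text. On a control
# workstation the input would come from splst_versions -G -t instead.
nodes_not_at PSSP-2.4 <<'EOF'
1 PSSP-2.4
2 PSSP-2.4
7 PSSP-2.3
8 PSSP-2.3
EOF
```

With the sample data, the sketch prints the entries for nodes 7 and 8, which are the nodes that need the PSSP 2.3 documentation.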
This book is intended for anyone not familiar with the syntax and use of the RS/6000 SP commands.
This book consists of three parts. Part 1 of this book is the Command Reference. It contains RS/6000 SP commands which are organized alphabetically. Part 2 of this book is the Technical Reference. It contains RS/6000 SP files, subroutines, and other technical information. Part 3 of this book is the Appendix. It lists Perspectives colors and fonts.
The back of the book includes a glossary and an index.
Vertical bars (|) to the left of the text in this book indicate changes or additions.
The commands in this book are in the following format:
This book uses the following typographic conventions:
| Typographic | Usage |
|---|---|
| Bold | |
| Italic | |
| Constant width | Examples and information that the system displays appear in constant width typeface. |
| [ ] | Brackets enclose optional items in format and syntax descriptions. |
| { } | Braces enclose a list from which you must choose an item in format and syntax descriptions. |
| \| | A vertical bar separates items in a list of choices. (In other words, it means "or.") |
| < > | Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter. |
| ... | An ellipsis indicates that you can repeat the preceding item one or more times. |
| <Ctrl-x> | The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>. |
In order to use the PSSP man pages or access the PSSP online (HTML) publications, the ssp.docs file set must first be installed. To view the PSSP online publications, you also need access to an HTML document browser such as Netscape. An index to the HTML files that are provided with the ssp.docs file set is installed in the /usr/lpp/ssp/html directory.
You can view this book or download a PostScript version of it from the IBM RS/6000 web site at http://www.rs6000.ibm.com. At the time this manual was published, the full path was http://www.rs6000.ibm.com/resource/aix_resource/sp_books. However, the structure of the RS/6000 web site can change over time.
Here are some related publications.
As an alternative to ordering the individual books, you can use SBOF-8587 to order the entire SP software library.
As an alternative to ordering the individual books, you can use SBOF-8588 to order the entire Parallel Environment for AIX library.
Here are some other IBM publications that you may find helpful.
Order according to RS/6000 SP Switch Router model:
You can order the RS/6000 SP Switch Router as the IBM 9077.
Here are some non-IBM publications that you may find helpful.
The following manual pages for public code are available in this product:
Manual pages and other documentation for Tcl, TclX, Tk, and expect can be found in the compressed tar files located in /usr/lpp/ssp/public.
This part of the book contains the RS/6000 SP commands.
To access the RS/6000 SP online manual pages, set the MANPATH environment variable as follows (the first form is for the Korn shell, the second for the C shell):
export MANPATH=$MANPATH:/usr/lpp/ssp/man
setenv MANPATH $MANPATH\:/usr/lpp/ssp/man
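If the setting is placed in a login profile, it may be worth guarding against adding the same directory more than once. The following is a minimal sketch for Bourne-style shells; `add_manpath` is a hypothetical helper name introduced here for illustration and is not part of PSSP.

```shell
#!/bin/sh
# Sketch: append a directory to MANPATH only when it is not already listed,
# so that repeated logins do not keep growing the variable.
# add_manpath is a hypothetical helper, not a PSSP command.

add_manpath() {
    case ":${MANPATH:-}:" in
        *":$1:"*) ;;                              # already present
        *) MANPATH="${MANPATH:+$MANPATH:}$1" ;;   # append without a stray colon
    esac
    export MANPATH
}

add_manpath /usr/lpp/ssp/man
add_manpath /usr/lpp/ssp/man    # second call leaves MANPATH unchanged
echo "$MANPATH"
```

Note that if MANPATH was previously unset, this sketch sets it to the PSSP directory alone, which may hide the system default manual pages; adjust the starting value to suit your site.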
When you partition your system, you create one or more system partitions which, for most tasks, function as separate and distinct logical RS/6000 SP systems. Most commands function within the boundary of the system partition in which they are executed. A number of commands, however, continue to treat the RS/6000 SP as a single entity and do not respect system partition boundaries. That is, in their normal function they may affect a node or other entity outside of the current system partition. In addition, some commands which normally function only within the current system partition have been given a new parameter which, when used, allows the scope of that command to exceed the boundaries of the current system partition.
On the control workstation, the administrator is in an environment for one system partition at a time. The SP_NAME environment variable identifies the system partition to subsystems. (If this environment variable is not set, the system partition is defined by the primary: stanza in the /etc/SDR_dest_info file.) Most tasks performed on the control workstation that get information from the System Data Repository (SDR) will get the information for that particular system partition.
In managing multiple system partitions, it is helpful to open a window for each system partition. You can set and export the SP_NAME environment variable in each window and set up the window title bar or shell prompt with the system partition name. The following script is an example:
sysparenv:
#!/bin/ksh
# Open one terminal window per system partition, with SP_NAME preset.
for i in $(splst_syspars)
do
  syspar=$(host $i | cut -f 1 -d".")
  echo "Opening the $syspar partition environment"
  sleep 2
  export SP_NAME=$syspar
  aixterm -T "Work Environment for CWS $(hostname -s) - View: $syspar" -ls -sb &
done
exit
.profile addition:
# Added for syspar environment setup
# Show the partition name in the prompt when SP_NAME is set.
if [ "$(env | grep SP_NAME | cut -d= -f1)" = SP_NAME ]
then
  PS1="[$(hostname -s)][$SP_NAME] ["'$PWD]> '
else
  PS1="[$(hostname -s)]["'$PWD]> '
fi
export ENV
As a user, you can check what system partition you're in with the command:
spget_syspar -n
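The lookup order described earlier (SP_NAME first, then the primary: stanza of /etc/SDR_dest_info) can be illustrated with a short sketch. This is not how spget_syspar is implemented; `current_syspar` is a hypothetical helper, and the sketch assumes the stanza is a single line of the form `primary:value`, which may not match the file on your system exactly.

```shell
#!/bin/sh
# Sketch: mimic the documented lookup order for the current system partition.
# SP_NAME wins when it is set; otherwise fall back to the primary: stanza.
# Assumption: the stanza is one line of the form "primary:value".
# current_syspar is a hypothetical helper, not a PSSP command.

current_syspar() {
    dest_file=${1:-/etc/SDR_dest_info}
    if [ -n "${SP_NAME:-}" ]; then
        echo "$SP_NAME"
    else
        # Print whatever follows "primary:" on the first matching line.
        sed -n 's/^primary:[[:space:]]*//p' "$dest_file" | head -n 1
    fi
}

# Usage on a control workstation (not run here):
#   current_syspar
```

The file name is a parameter only so that the function can be tried against a copy of the file; called without arguments it reads /etc/SDR_dest_info.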
The following table summarizes those commands which can exceed the boundary of the current system partition. Unless otherwise stated, commands not listed in this table have as their scope the current system partition.
| Command | Effect |
|---|---|
| arp | Can reference any node (by its host name) in any system partition. |
| Automounter commands | Host names need not be in the current system partition. |
| crunacct | Merges accounting data from all nodes regardless of system partition boundaries. |
| cshutdown -G | The -G flag allows specification of target nodes outside of the current system partition. |
| cstartup -G | The -G flag allows specification of target nodes outside of the current system partition. |
| dsh, dsh -w {hostname \| -} | Hosts added to the working collective by host name need not be in the current system partition. |
| dsh -aG | The -G flag modifies the -a flag (all nodes in the current system partition) by expanding the scope to all nodes in the entire physical SP system. |
| Eclock | There is a single switch clock for the SP regardless of the number of system partitions. |
| Efence -G | The -G flag allows specification of nodes outside of the current system partition. |
| emonctrl -c | The system partition-sensitive control script for the emon subsystem supports the -c option, which crosses system partitions. |
| Eunfence -G | The -G flag allows specification of nodes outside of the current system partition. |
| haemctrl -c, haemctrl -u | The system partition-sensitive control script for the haem subsystem supports the -c and -u options, which cross system partitions. |
| hagsctrl -c, hagsctrl -u | The system partition-sensitive control script for the hags subsystem supports the -c and -u options, which cross system partitions. |
| hatsctrl -c, hatsctrl -u | The system partition-sensitive control script for the hats subsystem supports the -c and -u options, which cross system partitions. |
| hbctrl -c | The system partition-sensitive control script for the hb subsystem supports the -c option, which crosses system partitions. |
| hmcmds -G | The -G flag allows the hmcmds commands to be sent to any hardware on the SP system. |
| hmmon -G | The -G flag allows for the specification of hardware outside of the current system partition. |
| hostlist, hostlist -f filename, hostlist -w hostname | Host names need not be in the current system partition. |
| hostlist -aG \| -nG \| -sG | The -G flag modifies the -a, -n, or -s flag by expanding the scope to the entire physical SP system. |
| hrctrl -c | The system partition-sensitive control script for the hr subsystem supports the -c option, which crosses system partitions. |
| hsdatalst -G | The -G flag causes the display of HSD information to be for all system partitions. |
| lppdiff -aG | The -G flag modifies the -a flag (all nodes in the current system partition) by expanding the scope to all nodes in the entire physical SP system. |
| nodecond -G | The -G flag allows specification of a node outside of the current system partition. |
| psyslrpt -w hostnames | The host names supplied with the -w flag can be in any system partition (the -a flag will select all nodes in the current system partition). |
| psyslclr -w hostnames | The host names supplied with the -w flag can be in any system partition (the -a flag will select all nodes in the current system partition). |
| penotify -w hostnames | The host names supplied with the -w flag can be in any system partition (the -a flag will select all nodes in the current system partition). |
| pmanctrl -c | The system partition-sensitive control script for the pman subsystem supports the -c option, which crosses system partitions. |
| Parallel commands | Parallel commands can take the following options and will behave accordingly: |
| SDRArchive, SDRRestore | Archives/restores the SDR representing the entire SP. |
| SDRGetObjects -G | The -G flag allows for retrieval of partitioned class objects from partitions other than the current system partition. Without the -G, objects which are in a partitioned class are retrieved from the current system partition only. |
| SDRMoveObjects | Moves objects from one system partition to another. |
| Other SDR commands | SDR commands that create, change or delete values work within the system partition. Note though that System classes (Frame, for example) are shared among all system partitions. Changes to system classes will affect other system partitions. |
| Security commands | The function of these security commands is unchanged under system partitioning. That is, if they previously affected the entire SP, they continue to do so even if the system has been partitioned. If they previously had the ability to affect a remote node (rsh, rcp, for example), that function is unchanged in a system partitioned environment. |
| spapply_config | Applies a system partition configuration to the entire SP. |
| spbootins | If a boot server outside of the current system partition is specified, that node is prepared appropriately. |
| sp_configdctrl -c | The system partition-sensitive control script for the sp_configd subsystem supports the -c option, which crosses system partitions. |
| spframe | Configures data for one or more frames across the entire SP. |
| splm | The target nodes defined in the input table can include nodes from any system partition. |
| splst_versions -G | The -G flag allows for retrieval of PSSP version information from nodes outside the current system partition. |
| splstdata -G | The -G flag allows display of information on nodes and adapters outside of the current system partition. |
| splstadapters -G | The -G flag lists information about target adapters outside of the current system partition. |
| splstnodes -G | The -G flag lists information about target nodes outside of the current system partition. |
| spmon -G | The -G flag allows specification of nodes outside of the current system partition. The -G flag is required when performing operations on any frame or switch. |
| sprestore_config | Restores the entire SP SDR from a previously made archive. |
| spsitenv | Site environment variables are specified for the SP system as a whole. The specification of acct_master= can be any node in the SP regardless of system partition. The specification of install_image= may cause boot server nodes outside of the current system partition to refresh the default installation image they will serve to their nodes. |
| spverify_config | Verifies the configuration of all system partitions in the SP system. |
| supper | File collections are implemented and managed without respect to system partition boundaries. |
| sysctl | The Sysctl client can send requests to any node in the SP. |
| syspar_ctrl -c -G | The -c and -G flags allow for the crossing of system partitions in providing a single interface to the control scripts for the system partition-sensitive subsystems. |
| s1term -G | The -G flag allows specification of a node outside of the current system partition. |
| vsdatalst -G | The -G flag causes the display of IBM Virtual Shared Disk information to be for all system partitions. |
| vsdsklst -G | The -G flag specifies the display of information for disks outside the current system partition. |
Purpose
add_principal - Creates principals in the authentication database.
Syntax
add_principal [-r realm_name] [-v] file_name
Flags
Operands
Description
This command provides an interface to the authentication database to add an entry for a user or service instance, supplying the password used to generate the encrypted private key. The add_principal command is suitable for mass addition of users or multiple instances of servers (for example, SP nodes).
This command operates noninteractively if you have a valid ticket-granting-ticket (TGT) for your admin instance in the applicable realm. A TGT can be obtained using the kinit command. If you do not have a TGT for the admin instance for the realm in which you are adding principals, or if the add_principal command cannot obtain a service ticket for changing passwords using the admin TGT, the user is prompted for the password for the user's admin instance.
Administrators use the add_principal command to register new users and service instances in the authentication database. An administrator must have a principal ID with an instance of admin. Also, user_name.admin must appear in the admin_acl.add Access Control List (ACL).
The add_principal program communicates over the network with the kadmind program, which runs on the machine housing the primary authentication database. The kadmind program creates new entries in the database using data provided by this command.
When using the add_principal command, the principal's expiration date and maximum ticket lifetime are set to the default values. To override the defaults, the root user must use the kdb_edit command to modify those attributes.
Input to the program is read from the file specified by the file_name argument. It contains one line of information for each principal to be added, in the following format:
name[.instance][@realm] password
| Note: | The @realm cannot be different from the local realm or the realm argument if the -r option is specified. |
For user entries with a NULL instance, this format matches that of the log file created by the spmkuser command. Any form of white space can surround the two fields. Blank lines are ignored. Any line whose first non-white-space character is # is treated as a comment.
Since the input file contains principal identifiers and their passwords, ensure that access to the file is controlled. You should remove the input file containing the unencrypted passwords after using it, or delete the passwords from it.
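Because a malformed line is easier to fix before the file is submitted, a quick pre-check of the input file can be useful. The following is a minimal sketch that applies only the rules stated above (blank lines and comment lines are skipped; every other line must hold two white-space-separated fields). `check_principal_file` is a hypothetical helper, not part of PSSP, and it does not validate realm names or contact the authentication server.

```shell
#!/bin/sh
# Sketch: pre-check an add_principal input file.
# check_principal_file is a hypothetical helper, not a PSSP command.
# Rules applied (taken from the format description):
#   - blank lines are skipped
#   - lines whose first non-white-space character is # are skipped
#   - every other line must contain exactly two fields: name and password

check_principal_file() {
    awk '
        /^[ \t]*$/ { next }      # blank line
        /^[ \t]*#/ { next }      # comment line
        NF != 2    { printf "line %d: expected 2 fields, found %d\n", NR, NF; bad = 1 }
        END        { exit bad + 0 }
    ' "$1"
}

# Usage (the file name is an example only):
#   check_principal_file /tmp/new_principals && add_principal /tmp/new_principals
```

The function returns a nonzero status when any line is rejected, so it can gate the add_principal call as in the usage comment.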
The add_principal command does not add principals to an AFS authentication database. If authentication services are provided through AFS, use the AFS kas command to add principals to the database. Refer to the chapter on security in IBM Parallel System Support Programs for AIX: Administration Guide for an overview.
Files
Exit Values
Related Information
Commands: kadmin, kinit, kpasswd, ksrvutil
Refer to Chapter 2, "RS/6000 SP Files and Other Technical Information," in IBM Parallel System Support Programs for AIX: Command and Technical Reference for additional Kerberos information.
Purpose
allnimres - Allocates Network Installation Management (NIM) resources from a NIM master to a NIM client.
Syntax
allnimres -h | -l node_list
Flags
Operands
None.
Description
Use this command to allocate all necessary NIM resources to a client based on the client's bootp_response in the System Data Repository (SDR). This includes executing the bos_inst command for allocation of the boot resource and nimscript resource. At the end of this command, nodes are ready to netboot to run installation, diagnostics, or maintenance. If the node's bootp_response is "disk", all NIM resources are deallocated from the node.
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege to run this command.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/allnimres
Related Information
Commands: setup_server, unallnimres
Examples
To allocate boot/installation resources to boot/install client nodes 1, 3, and 5 from their respective boot/install servers, enter:
allnimres -l 1,3,5
Purpose
/usr/lpp/ssp/css/arp - Displays and modifies address resolution.
Syntax
Parameters
type host_name adapter_address [route] [temp] [pub]
type host_name adapter_address [route] [temp] [pub]
where:
Description
The arp command has been modified to add support for the switch. This command is valid only on an SP system.
The arp command displays and modifies the Internet-to-adapter address translation tables used by ARP. The arp command displays the current ARP entry for the host specified by the host_name variable. The host can be specified by name or number, using Internet dotted decimal notation.
Related Information
SP Command: ifconfig
AIX Commands: crash, netstat
AIX Daemon: inetd
Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on the SP Switch and the High Performance Switch.
Refer to "TCP/IP Protocols" in AIX Version 4.1 System Management Guide: Communications and Networks.
Examples
arp -s switch host2 1
arp -d host1
Purpose
cfghsd - Configures a data striping device (HSD) for an IBM Virtual Shared Disk.
Syntax
cfghsd -a | hsd_name ...
Flags
Operands
Description
This command configures the already defined HSDs and makes them available. The command extracts information from the System Data Repository (SDR).
Files
Security
You must have root privilege to run this command.
Restrictions
If you have the IBM Recoverable Virtual Shared Disk product installed and operational, do not use this command. The results may be unpredictable.
See IBM Parallel System Support Programs for AIX: Managing Shared Disks.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: defhsd, hsdatalst, lshsd, ucfghsd
Examples
To make the HSD hsd1 available, enter:
cfghsd hsd1
Purpose
cfghsdvsd - Configures a data striping device (HSD) and the IBM Virtual Shared Disks that comprise it on one node.
Syntax
cfghsdvsd -a | {hsd_name...}
Flags
Operands
Description
Use this command to configure already-defined HSDs and their underlying IBM Virtual Shared Disks and make them available. Note that all of the underlying IBM Virtual Shared Disks go to the active state.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit hsd_mgmt
and select the Configure an HSD and its underlying IBM Virtual Shared Disks option.
Files
Security
You must have sysctl and sysctl.vsd access and authorization from your system administrator to run this command.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: cfghsd, cfgvsd, ucfghsdvsd
Examples
To configure the data striping device hsd1 and the IBM Virtual Shared Disks that comprise it, enter:
cfghsdvsd hsd1
Purpose
cfgvsd - Configures an IBM Virtual Shared Disk.
Syntax
cfgvsd -a | vsd_name ...
Flags
Operands
Description
Use this command to configure the already defined IBM Virtual Shared Disks and bring them to the stopped state. It does not make the IBM Virtual Shared Disk available. The command extracts information from the System Data Repository (SDR).
You can use the System Management Interface Tool (SMIT) to run the cfgvsd command. To use SMIT, enter:
smit vsd_mgmt
and select the Configure an IBM Virtual Shared Disk option.
Files
Security
You must have root privilege to run this command.
Restrictions
If you have the IBM Recoverable Virtual Shared Disk product installed and operational, do not use this command. The results may be unpredictable.
See IBM Parallel System Support Programs for AIX: Managing Shared Disks.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: ctlvsd, lsvsd, preparevsd, resumevsd, startvsd, stopvsd, suspendvsd, ucfgvsd
Examples
To bring the IBM Virtual Shared Disk vsd1vg1n1 from the defined state to the stopped state, enter:
cfgvsd vsd1vg1n1
Purpose
chgcss - Applies configuration changes to a Scalable POWERparallel Switch (SP Switch) Communications Adapter (Type 6-9) or a High Performance Switch Communications Adapter Type 2.
| Implementation Note |
|---|
| Configuration changes are later applied to the device when it is configured at system reboot. |
Syntax
chgcss -l name -a attr=value [-a attr=value]
Flags
where:
Operands
None.
Description
Use this command to apply configuration changes to an SP Switch Communications Adapter (Type 6-9) or a High Performance Switch Communications Adapter Type 2.
Security
You must have root privilege to run this command.
Prerequisite Information
For additional information on values for the rpoolsize and spoolsize attributes, refer to the "Tuning the SP System" chapter in IBM Parallel System Support Programs for AIX: Administration Guide.
Related Information
AIX Command: lsattr
Examples
chgcss -l css0 -a rpoolsize=0x100000
chgcss -l css0 -a rpoolsize=1048576 -a spoolsize=1048576
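The two examples give the pool size once in hexadecimal and once in decimal; numerically, 0x100000 and 1048576 are the same value. A minimal sketch for converting between the two notations when comparing values follows; `to_decimal` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch: convert a C-style hexadecimal constant to decimal so that values
# written in the two notations can be compared.
# to_decimal is a hypothetical helper, not a PSSP command.

to_decimal() {
    printf '%d\n' "$1"
}

to_decimal 0x100000    # prints 1048576
```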
Purpose
chkp - Changes Kerberos principals.
Syntax
chkp -h
chkp [-e expiration] [-l lifetime] name[.instance] ...
Flags
| lifetime operand | Approximate duration |
|---|---|
| 141 | 1 day |
| 151 | 2 days |
| 170 | 1 week |
| 180 | 2 weeks |
| 191 | 1 month |
At least one flag must be specified.
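When scripting around chkp, a small lookup can make the lifetime operands easier to read. The sketch below maps only the operand values listed above to their approximate durations; it does not attempt to compute durations for other values, and `lifetime_hint` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: look up the approximate duration for a documented lifetime operand.
# Only the values listed in the text are mapped; anything else is reported
# as not listed rather than guessed.
# lifetime_hint is a hypothetical helper, not a PSSP command.

lifetime_hint() {
    case "$1" in
        141) echo "1 day" ;;
        151) echo "2 days" ;;
        170) echo "1 week" ;;
        180) echo "2 weeks" ;;
        191) echo "1 month" ;;
        *)   echo "not listed" ;;
    esac
}

lifetime_hint 170    # prints: 1 week
```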
Operands
Description
Use this command to change principals in the local Kerberos database. It allows the current expiration date and maximum ticket lifetime to be redefined. It cannot be used to change the principal's password. To do that, the administrator must use the kpasswd, kadmin, or kdb_edit commands. The chkp command should normally be run only on the primary server. If there are secondary authentication servers, the push-kprop command is invoked to propagate the change to the other servers. The command can be used to update a secondary server's database, but the changes may be negated by a subsequent update from the primary.
Files
Exit Values
Security
The chkp command can be run by the root user logged in on a Kerberos server host. It can be invoked indirectly as a Sysctl procedure by a Kerberos database administrator who has a valid ticket and is listed in the admin_acl.mod file.
Location
/usr/kerberos/etc/chkp
Related Information
Commands: kadmin, kdb_edit, lskp, mkkp, rmkp, sysctl
Examples
chkp -l 171 default
chkp -l 181 -e 2003-06-30 franklin jtjones root.admin susan
Purpose
cksumvsd - Views and manipulates the IBM Virtual Shared Disk checksum parameters.
Syntax
cksumvsd [-s] [-R] [-i | -I]
Flags
If no flags are specified, the current settings of all IBM Virtual Shared Disk checksum parameters and counters are displayed.
Operands
None.
Description
The IBM Virtual Shared Disk IP device driver can calculate and send checksums on remote packets it sends. It also can calculate and verify checksums on remote packets it receives. The cksumvsd command is used to tell the device driver whether to perform checksum processing. The default is no checksumming.
Issuing cksumvsd -i turns on checksumming on the node on which it is run. cksumvsd -i must be issued on all IBM Virtual Shared Disk nodes in the system partition, or the IBM Virtual Shared Disk software will stop working properly on the system partition. If node A has cksumvsd -i (checksumming turned on) and node B has cksumvsd -I (checksumming turned off, the default), then A will reject all messages from B (both requests and replies), since A's checksum verification will fail on all B's messages. The safe way to run cksumvsd -i is to make sure that all IBM Virtual Shared Disks on all nodes are in the STOPPED or SUSPENDED states, issue cksumvsd -i on all nodes, then resume the needed IBM Virtual Shared Disks on all nodes.
In checksumming mode, the IBM Virtual Shared Disk IP device driver keeps a counter of the number of packets received with good checksums, and the number received with bad checksums. cksumvsd and statvsd both display these values (statvsd calls cksumvsd -s).
cksumvsd dynamically responds to the configuration of the IBM Virtual Shared Disk IP device driver loaded in the kernel. Its output and function may change if the IBM Virtual Shared Disk IP device driver configuration changes.
Files
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Command: cfgvsd
Examples
cksumvsd
You should receive output similar to the following:
VSD cksum: current values:
do_ip_checksum: 0
ipcksum_cntr: 350 good, 0 bad, 0 % bad.
The IBM Virtual Shared Disk checksumming is currently turned off on the node. Prior to this, checksumming was turned on and 350 IBM Virtual Shared Disk remote messages were received, all with good checksums.
cksumvsd -i
You should receive output similar to the following:
VSD cksum: current values:
do_ip_checksum: 0
ipcksum_cntr: 350 good, 0 bad, 0 % bad.
VSD cksum: new values:
do_ip_checksum: 1
ipcksum_cntr: 350 good, 0 bad, 0 % bad.
The command displays old and new values. As before, the node has received 350 IBM Virtual Shared Disk remote messages with good checksums.
cksumvsd -s
You should receive output similar to the following:
ipcksum_cntr: 350 good, 0 bad, 0 % bad.
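The counter line lends itself to simple monitoring. The following is a minimal sketch, assuming the one-line counter format shown in the example output; `bad_checksums` is a hypothetical helper, not a PSSP command, and the sample line is copied from the example.

```shell
#!/bin/sh
# Sketch: extract the count of bad checksums from a counter line and signal
# when it is greater than zero.
# Assumption: the line has the form
#   ipcksum_cntr: 350 good, 0 bad, 0 % bad.
# bad_checksums is a hypothetical helper, not a PSSP command.

# Reads counter output on stdin and prints the bad count.
# Exit status: 0 if the count is zero, 1 if it is greater than zero,
# 2 if no counter line was found.
bad_checksums() {
    awk '$1 == "ipcksum_cntr:" { bad = $4; found = 1 }
         END { if (!found) exit 2; print bad + 0; exit (bad + 0 > 0) }'
}

# Sample line from the text; on a node the input would come from
# cksumvsd -s instead.
echo "ipcksum_cntr: 350 good, 0 bad, 0 % bad." | bad_checksums   # prints 0
```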
Purpose
cmonacct - Performs monthly or periodic SP accounting.
Syntax
cmonacct [number]
Flags
None.
Operands
Description
The cmonacct command performs monthly or periodic SP system accounting. The intervals are set in the crontab file. You can set the cron daemon to run the cmonacct command once each month or at some other specified time period. By default, if accounting is enabled for at least one node, cmonacct executes on the first day of every month.
The cmonacct command creates summary files under the /var/adm/cacct/fiscal directory and restarts the summary files under the /var/adm/cacct/sum directory, which hold the cumulative summary to which daily reports are appended.
Purpose
cprdaily - Creates an ASCII report of the previous day's accounting data.
Syntax
cprdaily [-c] [[-l] [yyyymmdd]]
Flags
Operands
Description
This command is called by the crunacct command to format an ASCII report of the previous day's accounting data for all nodes. The report resides in the /var/adm/cacct/sum/rprtyyyymmdd file, where yyyymmdd specifies the year, month, and day of the report.
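To locate the report for a particular day, the file name can be built from the date. The following is a minimal sketch that only formats the path described above and checks that the argument is eight digits; `report_path` is a hypothetical helper name, and the date used is an arbitrary example.

```shell
#!/bin/sh
# Sketch: build the daily report file name for a given yyyymmdd date.
# report_path is a hypothetical helper; it formats the path and does not
# check that the report exists.

report_path() {
    case "$1" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            echo "/var/adm/cacct/sum/rprt$1" ;;
        *)
            echo "usage: report_path yyyymmdd" >&2
            return 1 ;;
    esac
}

report_path 19980215    # prints /var/adm/cacct/sum/rprt19980215
```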
Purpose
cptuning - Copies a file to /tftpboot/tuning.cust.
Syntax
cptuning -h | file_name
Flags
Operands
Description
Use this command to copy the specified file to the /tftpboot/tuning.cust file. IBM ships the following four predefined tuning parameter files in /usr/lpp/ssp/install/config:
This command is intended for use in copying one of these files to /tftpboot/tuning.cust on the control workstation for propagation to the nodes in the SP. It can also be used on an individual node to copy one of these files to /tftpboot/tuning.cust.
Standard Output
When the command completes successfully, a message to that effect is written to standard output.
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Output Files
Upon successful completion, the /tftpboot/tuning.cust file is updated.
Consequences of Errors
If the command does not run successfully, it terminates with an error message and a nonzero return code.
Security
Use of this command by other than the root user is not restricted. However, this command will fail if the user does not have read permission to the specified file and write permission to the /tftpboot directory.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/cptuning
Related Information
SP Files: tuning.commercial, tuning.default, tuning.development, tuning.scientific
IBM Parallel System Support Programs for AIX: Installation and Migration Guide
Examples
cptuning /tmp/my-tuning-file
cptuning tuning.commercial
Purpose
create_krb_files - Creates the necessary krb_srvtab and tftp access files on the Network Installation Management (NIM) master.
Syntax
create_krb_files [-h]
Flags
Operands
None.
Description
Use this command on a boot/install server (including the control workstation). On the server, it creates the Kerberos krb_srvtab file for each boot/install client of that server and also updates the /etc/tftpaccess.ctl file on the server.
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege to run this command.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/create_krb_files
Related Information
Commands: setup_server
Examples
To create or update the krb_srvtab and tftp access files on a boot/install server, enter the following command on that server:
create_krb_files
Purpose
createhsd - Creates one data striping device (HSD) that encompasses two or more IBM Virtual Shared Disks.
Syntax
Flags
| Note: | Some examples shown in this list do not contain enough flags to be executable. They are shown in an incomplete form to illustrate specific flags. |
The sequence in which nodes are listed determines the names given to the IBM Virtual Shared Disks; for example:
createhsd -n 1,6,4 -d DATA
(with the hsd_prefix DATA) creates IBM Virtual Shared Disks DATA1n1 on node 1, DATA2n6 on node 6, and DATA3n4 on node 4, which make up a single HSD named DATA.
To create volume groups that span specified disks on nodes 1, 2, and 3 of a system with backup on nodes 4, 5, and 6 of the same system, and that make up a single HSD, enter:
createhsd -n 1/4:hdisk1,hdisk2,hdisk3/,2/5:hdisk5,hdisk6, \
hdisk7/,3/6:hdisk2,hdisk4,hdisk6/ -d DATA -s 12 -g OLD -t 4096
This command is shown on two lines here, but you must enter it without any spaces between the items in node_list.
The command creates:
If a volume group is already created and the combined physical hdisk lists contain disks that are not needed for the logical volume, those hdisks are added to the volume group. If the volume group has not already been created, createhsd creates a volume group that spans hdisk_list1+hdisk_list2.
Backup nodes cannot use the same physical disk as the primary does to serve IBM Virtual Shared Disks.
createhsd -n 6 -g VSDVG
creates a new volume group with the local AIX volume group name VSDVG and the IBM Virtual Shared Disk global volume group name VSDVGn6. The node number is added to the local volume group name to create a unique global volume group name within a system partition to avoid name conflicts with the name used for volume groups on other nodes. If a backup node exists, the global volume group name will be created by concatenating the backup node number as well as the primary node number to the local volume group name. For example:
createhsd -n 6/3/ -g VSDVG
creates VSDVGn6b3, where the primary node is node 6 and the backup node for this global volume group is node 3. The local AIX volume group name will still be VSDVG. You can specify a local volume group that already exists. You do not need to use the -T flag if you specify a volume group name that already exists.
createhsd -n 1,6 -c 2 -d DATA
creates IBM Virtual Shared Disks DATA1n1 on node 1, DATA2n6 on node 6, DATA3n1 on node 1, and DATA4n6 on node 6 and uses them to make up the HSD DATA.
createhsd -n 1,6 -c 2 -g DATA
creates DATA1n1 and DATA2n1 on node 1, and DATA3n6 and DATA4n6 on node 6.
The command:
createhsd -n 1,2 -h DATA
creates two IBM Virtual Shared Disks, DATA1n1 and DATA2n2. These IBM Virtual Shared Disks make up one HSD named DATA.
createhsd -n 1 -v DATA
creates one IBM Virtual Shared Disk on node 1 named DATA1n1 with an underlying logical volume lvDATA1n1. If the command
createhsd -n 1 -v DATA -l new
is used, the IBM Virtual Shared Disk on node 1 is still named DATA1n1, but the underlying logical volume is named lvnew1n1.
It is usually more helpful not to specify -l, so that your lists of IBM Virtual Shared Disk names and logical volume names are easy to associate with each other and you avoid naming conflicts.
The Logical Volume Manager limits the number of physical partitions to 1016 per disk. If a disk is greater than 4 gigabytes in size, the physical partition size must be greater than 4MB to keep the number of partitions under the limit.
is not done as part of the createvsd processing that underlies the createhsd command. This speeds the operation of the command and avoids unnecessary processing in the case where several IBM Virtual Shared Disks are being created on the same primary/secondary nodes. In that case, however, you need to explicitly issue the volume group commands listed previously.
Operands
None.
Description
This command utilizes the sysctl facility.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit createhsd_dialog
or
smit vsd_data
and select the Create an HSD option with the vsd_data fastpath.
Files
Standard Output
For the following command:
createhsd -n 1/:hdisk2,hdisk3/ -g twinVG -s 1600 -t 8 -S -l \
twinLV -d twinHSD -c 4
The messages returned to standard output are:
OK:0:vsdvg -g twinVGn1 twinVG 1
OK:0:defvsd twinLV1n1 twinVGn1 twinHSD1n1 nocache
OK:0:defvsd twinLV2n1 twinVGn1 twinHSD2n1 nocache
OK:0:defvsd twinLV3n1 twinVGn1 twinHSD3n1 nocache
OK:0:defvsd twinLV4n1 twinVGn1 twinHSD4n1 nocache
OK:createvsd { -n 1/:hdisk2,hdisk3/ -s 401 -T 4 -g twinVG
-c 4 -v twinHSD -l twinLV -o cache -K }
OK:0:defhsd twinHSD not_protect_lvcb 8192 twinHSD1n1 twinHSD2n1
twinHSD3n1 twinHSD4n1
Exit Values
Security
You must have sysctl and sysctl.vsd access and authorization from your system administrator to run this command.
Restrictions
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: createvsd, defhsd, vsdvg
Examples
To create six 4MB IBM Virtual Shared Disks and their underlying logical volumes with a prefix of TEMP, as well as an HSD comprising those IBM Virtual Shared Disks (24MB overall) with a stripe size of 32KB, enter the following (assuming that no previous IBM Virtual Shared Disks are defined with the TEMP prefix):
createhsd -n 3,4,7/8/ -c 2 -s 1024 -g vsdvg -d TEMP -t 32
This creates the following IBM Virtual Shared Disks:
and the HSD:
| Note: | TEMP does not write to the first 32KB of each of its IBM Virtual Shared Disks. |
Purpose
createvsd - Creates a set of IBM Virtual Shared Disks, with their associated logical volumes, and puts information about them into the System Data Repository (SDR).
Syntax
| Note: | Some examples shown in this list do not contain enough flags to be executable. They are shown in an incomplete form to illustrate specific flags. |
Flags
createvsd -n 1,6,4 -v PRE
(with the vsd_prefix PRE) creates IBM Virtual Shared Disks PRE1n1 on node 1, PRE2n6 on node 6, and PRE3n4 on node 4.
To create a volume group that spans hdisk2, hdisk3, and hdisk4 on node 1, with a backup on node 3, enter:
createvsd -n 1/3:hdisk2,hdisk3,hdisk4/ -v DATA
This command creates:
To create volume groups just like that one on nodes 1, 2, and 3 of a system with backup on nodes 4, 5, and 6 of the same system, enter:
createvsd -n 1/4:hdisk1,hdisk2,hdisk3/,2/5:hdisk5,hdisk6, \
hdisk7/,3/6:hdisk2,hdisk4,hdisk6/ -v DATA
This command is shown on two lines here, but you must enter it without any spaces between the items in node_list.
The command creates:
To create an IBM Virtual Shared Disk where the logical volume spans only two of the physical disks in the volume group, enter:
createvsd -n 1/3:hdisk1,hdisk2+hdisk3/ -v DATA
This command creates the IBM Virtual Shared Disk DATA1n1 with logical volume lvDATA1n1 spanning hdisk1 and hdisk2 in the volume group DATA, which includes hdisk1, hdisk2, and hdisk3. It exports the volume group DATA to node 3.
If a volume group is already created and the combined physical hdisk lists contain disks that are not needed for the logical volume, those hdisks are added to the volume group. If the volume group has not already been created, createvsd creates a volume group that spans hdisk_list1+hdisk_list2.
Backup nodes cannot use the same physical disk as the primary does to serve IBM Virtual Shared Disks.
createvsd -n 6 -g VSDVG
creates a volume group with the local volume group name VSDVG and the global volume group name VSDVGn6 on node 6. The node number is added to the prefix to avoid name conflicts when a backup node takes over a volume group. If a backup node exists, the global volume group name will be concatenated with the backup node number as well as the primary. For example:
createvsd -n 6/3/ -g VSDVG
creates a volume group with the local volume group name VSDVG and the global volume group name VSDVGn6b3. The primary node is node 6 and the backup node for this volume group is node 3.
createvsd -n 1,6 -c 2 -v DATA
creates IBM Virtual Shared Disks DATA1n1 on node 1, DATA2n6 on node 6, DATA3n1 on node 1, and DATA4n6 on node 6.
createvsd -n 1,6 -c 2 -h DATA
creates DATA1n1 and DATA2n1 on node 1, and DATA3n6 and DATA4n6 on node 6.
If -v is not specified, the prefix vsd is used.
| Note: | The last character of the vsd_name_prefix cannot be a digit. Otherwise, the 11th IBM Virtual Shared Disk with the prefix PRE would have the same name as the first IBM Virtual Shared Disk with the prefix PRE1. Nor can the vsd_name_prefix contain the character '.' because '.' can be any character in regular expressions. |
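The naming patterns in the examples above can be summarized in a short Python sketch. It is a model of the names as they appear in this section, not of the createvsd implementation; the function names vsd_names and global_vg_name and the round_robin parameter are hypothetical. The global volume group form follows the VSDVGn6b3 example.

```python
def vsd_names(prefix, nodes, per_node, round_robin=True):
    """Return (vsd_name, node) pairs numbered as in the examples above.

    round_robin=True numbers the disks across the node list in turn;
    round_robin=False finishes one node before moving to the next.
    """
    # The note above rules out prefixes ending in a digit or containing '.'.
    if not prefix or prefix[-1].isdigit() or "." in prefix:
        raise ValueError("prefix must not end in a digit or contain '.'")
    if round_robin:
        order = [n for _ in range(per_node) for n in nodes]
    else:
        order = [n for n in nodes for _ in range(per_node)]
    return [(f"{prefix}{i}n{node}", node) for i, node in enumerate(order, 1)]


def global_vg_name(local_vg, primary, backup=None):
    """Append the primary (and, if present, backup) node number to the name."""
    name = f"{local_vg}n{primary}"
    if backup is not None:
        name += f"b{backup}"
    return name


print(vsd_names("DATA", [1, 6], 2))
print(vsd_names("DATA", [1, 6], 2, round_robin=False))
print(global_vg_name("VSDVG", 6, 3))
```

The digit check reflects the collision described in the note: the 11th disk with prefix PRE and the first disk with prefix PRE1 would both begin with PRE11.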
createvsd -n 1 -v DATA
creates one IBM Virtual Shared Disk on node 1 named DATA1n1 with an underlying logical volume lvDATA1n1. If the command
createvsd -n 1 -v DATA -l new
is used, the IBM Virtual Shared Disk on node 1 is still named DATA1n1, but the underlying logical volume is named lvnew1n1.
It is usually more helpful not to specify -l, so that your lists of IBM Virtual Shared Disk names and logical volume names are easy to associate with each other and you avoid naming conflicts.
The Logical Volume Manager limits the number of physical partitions to 1016 per disk. If a disk is greater than 4 gigabytes in size, the physical partition size must be greater than 4MB to keep the number of partitions under the limit.
is not done as part of the createvsd processing. This speeds the operation of the command and avoids unnecessary processing in the case where several IBM Virtual Shared Disks are being created on the same primary/secondary nodes. In this case, however, you should either not specify -x on the last createvsd in the sequence or issue the volume group commands listed above explicitly.
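The physical partition limit mentioned in the flag descriptions above lends itself to a quick calculation. The following Python sketch finds the smallest partition size that keeps a disk within 1016 partitions. It assumes that partition sizes are powers of two in megabytes; the function name is hypothetical.

```python
MAX_PARTITIONS_PER_DISK = 1016  # limit quoted in the text above


def min_partition_size_mb(disk_size_mb):
    """Smallest power-of-two partition size (MB) that stays within the limit."""
    size = 1
    # -(-a // b) is ceiling division: a partly used partition still counts.
    while -(-disk_size_mb // size) > MAX_PARTITIONS_PER_DISK:
        size *= 2  # assumption: sizes double (1, 2, 4, 8, ... MB)
    return size


for disk_mb in (2048, 4064, 4096, 9100):
    print(disk_mb, "MB disk ->", min_partition_size_mb(disk_mb), "MB partitions")
```

A 4096MB disk would need 1024 partitions of 4MB, which exceeds the limit, so the sketch moves to 8MB. This matches the statement that disks larger than 4 gigabytes need a partition size greater than 4MB.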
Operands
None.
Description
Use this command to create a volume group with the specified name (if one does not already exist) and to create a logical volume of size s within that volume group.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit vsd_data
and select the Create an IBM Virtual Shared Disk option.
Files
Standard Output
For the following command:
createvsd -n 1/:hdisk1/ -g testvg -s 16 -T 8 -l lvtest -v test -c 4
The messages returned to standard output are:
OK:0:vsdvg -g testvgn1 testvg 1
OK:0:defvsd lvtest1n1 testvgn1 test1n1 nocache
OK:0:defvsd lvtest2n1 testvgn1 test2n1 nocache
OK:0:defvsd lvtest3n1 testvgn1 test3n1 nocache
OK:0:defvsd lvtest4n1 testvgn1 test4n1 nocache
For the following command:
createvsd -n 1/:hdisk1/ -g testvg -s 16 -T 8 -l lvtest -v test -c 4
The messages returned to standard output are:
OK:0:defvsd lvtest5n1 testvgn1 test5n1 nocache
OK:0:defvsd lvtest6n1 testvgn1 test6n1 nocache
OK:0:defvsd lvtest7n1 testvgn1 test7n1 nocache
OK:0:defvsd lvtest8n1 testvgn1 test8n1 nocache
Exit Values
Security
You must have sysctl and sysctl.vsd access and authorization from your system administrator to run this command.
Restrictions
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: defvsd, vsdvg
Examples
To create two 4MB IBM Virtual Shared Disks on each of three primary nodes, one of which has a backup, enter:
createvsd -n 3,4,7/8/ -c 2 -s 4 -g vsdvg -v TEMP
This command creates the following IBM Virtual Shared Disks:
To create three IBM Virtual Shared Disks, where the logical volume created on node 3 spans fewer disks than the volume group does, enter:
createvsd -n 3,4/:hdisk1,hdisk2+hdisk3/,7/8/ -s 4 -g datavg -v USER
This command creates:
Purpose
crunacct - Runs on the acct_master node to produce daily summary accounting reports and to accumulate accounting data for the fiscal period using merged accounting data from each node.
Syntax
Flags
are copied to the acct_master node to the following files:
for all YYYYMMDD prior or equal to the YYYYMMDD being processed.
It also creates an SP system version of the loginlog file, in which each line consists of a date, a user login name and a list of node names. The date is the date of the last accounting cycle during which the user, indicated by the associated login name, had at least one connect session in the SP system. The associated list of node names indicates the nodes on which the user had a login session during that accounting cycle.
Operands
Description
In order for SP accounting to succeed each day, the nrunacct command must complete successfully on each node for which accounting is enabled and then the crunacct command must complete successfully on the acct_master node. However, this may not always be true. In particular, the following scenarios must be taken into account:
From the point of view of the crunacct command, the first scenario results in no accounting data being available from a node. The second scenario results in more than one day's accounting data being available from a node. If it is the case that no accounting data is available from a node, the policy of crunacct is that the error condition is reported and processing continues with data from the other nodes. If data cannot be obtained from at least X percent of nodes, then processing is terminated. "X" is referred to as the spacct_actnode_thresh attribute and can be set via a SMIT panel.
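The threshold policy described above amounts to a percentage comparison. The following Python sketch shows that check under the assumption that the threshold is compared against the share of accounting-enabled nodes that supplied data. The function and parameter names are hypothetical; only the spacct_actnode_thresh attribute name comes from the text.

```python
def enough_nodes_reported(reporting, enabled, thresh_percent):
    """True if data arrived from at least thresh_percent of enabled nodes."""
    if enabled == 0:
        return False  # nothing to account for
    return 100.0 * reporting / enabled >= thresh_percent


# With a threshold of 80 and ten enabled nodes, eight reporting nodes are
# enough to continue; seven are not, and processing would be terminated.
print(enough_nodes_reported(8, 10, 80))
print(enough_nodes_reported(7, 10, 80))
```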
If node data for accounting cycle N is not available when crunacct executes and then becomes available to crunacct during accounting cycle N+1, the node data for both the N and N+1 accounting cycles is merged by crunacct. In general, crunacct merges all data from a node that has not yet been reported into the current accounting cycle, except as in the following case.
If crunacct has not run for more than one accounting cycle, so that several days' data has accumulated on each node, crunacct processes each accounting cycle's data in turn to produce the normal output for each accounting cycle. For example, if crunacct has not executed for accounting cycles N and N+1, and it is now accounting cycle N+2, then crunacct first executes for accounting cycle N, then executes for accounting cycle N+1 and finally executes for accounting cycle N+2.
However, if the several accounting cycles span from the previous fiscal period to the current fiscal period, then only the accounting cycles that are part of the previous fiscal period are processed. The accounting cycles that are part of the current fiscal period are processed during the next night's execution of crunacct. Appropriate messages are provided in the /var/adm/cacct/active file so that the administrator can execute cmonacct prior to the next night's execution of crunacct.
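The catch-up rule and its fiscal period exception can be sketched in a few lines of Python. This is an illustration of the policy as described above, not of how crunacct is implemented. Representing the fiscal period by its first day, and the function name cycles_to_process, are assumptions made for the example.

```python
from datetime import date


def cycles_to_process(pending, fiscal_start):
    """Return the accounting cycles (dates) to handle in tonight's run.

    pending: dates of the unprocessed accounting cycles, in any order.
    fiscal_start: first day of the current fiscal period (assumed model).
    """
    ordered = sorted(pending)
    previous = [d for d in ordered if d < fiscal_start]
    # If any pending cycle belongs to the previous fiscal period, process
    # only those; the rest wait for the next night's run.
    return previous if previous else ordered


backlog = [date(1998, 7, 1), date(1998, 6, 29), date(1998, 6, 30)]
print(cycles_to_process(backlog, date(1998, 7, 1)))
```

In this example only the June cycles are returned, which leaves room for the administrator to run cmonacct before the July 1 cycle is processed.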
Restart Procedure
To restart the crunacct command after a failure, first check the /var/adm/cacct/activeYYYYMMDD file for diagnostic messages, and take appropriate actions. For example, if the log indicates that data was unavailable from a majority of nodes, and their corresponding nrunacct state files indicate a state other than complete, check their /var/adm/acct/nite/activeYYYYMMDD files for diagnostic messages and then fix any damaged data files, such as pacct or wtmp.
Remove the lock files and lastdate file (all in the /var/adm/cacct directory) before restarting the crunacct command. You must specify the -r and YYYYMMDD parameters if you are restarting the crunacct command. The YYYYMMDD parameter specifies the date for which the crunacct command is to rerun accounting. The crunacct procedure determines the entry point for processing by reading the /var/adm/cacct/statefileYYYYMMDD file. To override this default action, specify the desired state on the crunacct command line.
Files
Security
Access Control: This command should grant execute (x) access only to members of the adm group.
Prerequisite Information
For more information about the Accounting System, the preparation of daily and monthly reports, and the accounting files, see IBM Parallel System Support Programs for AIX: Administration Guide.
Related Information
Commands: acctcms, acctcom, acctcon1, acctcon2, acctmerg, acctprc1, acctprc2, accton, crontab, fwtmp, nrunacct
Daemon: cron
The System Accounting information found in AIX Version 4.1 System Management Guide
Examples
nohup /usr/lpp/ssp/bin/crunacct -r 19940601 2>> \
/var/adm/cacct/nite/accterr &
This example restarts crunacct for the day of June 1 (0601), 1994. The crunacct command reads the file /var/adm/cacct/statefile19940601 to find out the state with which to begin. The crunacct command runs in the background (&), ignoring all INTERRUPT and QUIT signals (nohup). Standard error output (2) is added to the end (>>) of the /var/adm/cacct/nite/accterr file.
nohup /usr/lpp/ssp/bin/crunacct 19940601 CMS 2>> \
/var/adm/cacct/nite/accterr &
This example restarts the crunacct command for the day of June 1 (0601), 1994, starting with the CMS state. The crunacct command runs in the background (&), ignoring all INTERRUPT and QUIT signals (the nohup command). Standard error output (2) is added to the end (>>) of the /var/adm/cacct/nite/accterr file.
Purpose
cshutdown - Specifies the SP system Shutdown command.
Syntax
Flags
If you specify -E, you cannot specify -X.
Notes:
If there are special subsystems, the same waiting procedure applies to subsystem sequencing in the subsystem phase.
| Note: | If some critical nodes, but not the entire system, are forced to halt or reboot, the system may not function correctly. |
Operands
Description
Use this command to halt or reboot the entire system or any number of nodes in the system. The SP cshutdown command is analogous to the workstation shutdown command. Refer to the shutdown man page for a description of the shutdown command. The cshutdown command always powers off the nodes except while in Maintenance mode.
| Note: | If you bring a node down to maintenance mode, you must ensure file system integrity before rebooting the node. In this case, the cshutdown command, which runs from the control workstation, cannot rsh to the node to perform the node shutdown phase processing. This includes the synchronization of the file systems. Therefore, you should issue the sync command three times in succession from the node console before running the cshutdown command. This is especially important if any files were created while the node was in maintenance mode. To determine which nodes may be affected, issue the spmon -d command and look for a combination of power on and host_responds no. |
The cshutdown command has these advantages over using the shutdown command to shut down each node of an SP:
Using one cshutdown command on the control workstation, you can shut down all or selected nodes.
You can use the /etc/cshutSeq file to control the order in which nodes are shut down, or you can let the system determine the order based on System Data Repository information about /usr servers and clients.
The /etc/subsysSeq file lists these special subsystems and describes any sequencing relationships between them.
Shutdown processing has these phases:
Files
The following files reside on the control workstation:
Restrictions
The cshutdown command can only be issued on the control workstation by root or members of the shutdown group. The root user must issue the kinit command, specifying a principal name for which there is an entry in the hardmon ACLs file with control authorization for the frames to shut down. The hardmon and System Data Repository (SDR) must be running.
Results
The cshutdown command may be gated by the failure of some subsystems or nodes to complete shutdown. In this case, look in the file created: /var/adm/SPlogs/cs/cshut.MMDDhhmmss.pid
If a file with the same name already exists (from a previous year), the cshutdown command overwrites the existing file.
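The log file name carries a month, day, and time but no year, which is why a file from a previous year can have the same name. The following Python sketch builds a name in that pattern. It assumes that each field is two digits and that pid is the process ID; the function name is hypothetical.

```python
from datetime import datetime


def cshut_log_name(when, pid):
    """Build a name in the cshut.MMDDhhmmss.pid pattern described above."""
    stamp = when.strftime("%m%d%H%M%S")  # no year component in the pattern
    return "/var/adm/SPlogs/cs/cshut.%s.%d" % (stamp, pid)


a = cshut_log_name(datetime(1997, 2, 3, 4, 5, 6), 1234)
b = cshut_log_name(datetime(1998, 2, 3, 4, 5, 6), 1234)
print(a)
print(a == b)  # same name one year apart, so the older file is overwritten
```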
Related Information
Commands: cstartup, init, seqfile, shutdown
Examples
Group1 > Group2 > Group3
Group1: A
Group2: B
Group3: C
This defines three groups, Group1 through Group3, each containing a single node. The node names are A, B, and C. The sequence line Group1 > Group2 > Group3 means that Group3 (node C) is shut down first. When Group3 is down, Group2 (node B) is shut down. When Group2 is down, then Group1 (node A) is shut down.
Table 1 shows that the result of a cshutdown command depends on the flags specified on the command line, the initial state of each node, and the sequencing rules in /etc/cshutSeq. The shorthand notation Aup indicates that node A is up and running; Adn indicates that node A is down.
Table 1. Examples of the cshutdown Command
| The subscript up means the node is up and running; the subscript dn means the node is not running. | |||
| Initial State | Command Issued | Final State | Explanation |
|---|---|---|---|
| Aup Bup Cup | cshutdown A B C | Adn Bdn Cdn | The command succeeds; the nodes are all down. |
| Aup Bup Cdn | cshutdown B | Aup Bdn Cdn | The command succeeds because C is already not running. |
| Aup Bup Cdn | cshutdown A | Unchanged | The command fails because B is still running. |
| Aup Bup Cdn | cshutdown -X A | Adn Bup Cdn | The command succeeds because -X considers the sequencing of only the target nodes. |
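The gating behavior shown in Table 1 can be expressed as a small Python model. It is a toy reconstruction from the four rows of the table and the A > B > C sequence line above, not the cshutdown implementation. The function name and the only_targets parameter (standing in for -X) are hypothetical, and each group is assumed to hold a single node.

```python
def cshutdown(sequence, up, targets, only_targets=False):
    """Toy model of the gating rule illustrated in Table 1.

    sequence: node names in sequence-line order, e.g. ["A", "B", "C"] for
              A > B > C; nodes further right must be down first.
    up: set of nodes currently running.
    targets: nodes named on the command line.
    only_targets: models -X (sequencing is checked among targets only).
    Returns the set of nodes still running, or None if the command fails.
    """
    up, targets = set(up), set(targets)
    if not only_targets:
        for node in targets:
            must_be_down_first = sequence[sequence.index(node) + 1:]
            for other in must_be_down_first:
                if other in up and other not in targets:
                    return None  # gated by a running node outside the list
    return up - targets


seq = ["A", "B", "C"]
print(cshutdown(seq, {"A", "B"}, ["B"]))
print(cshutdown(seq, {"A", "B"}, ["A"]))
print(cshutdown(seq, {"A", "B"}, ["A"], only_targets=True))
```

The second call returns None because node B is still running and is not a target, which corresponds to the "Unchanged" row of the table.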
cshutdown -GXY ALL
cshutdown -G -N 1 9 16-20
The command may fail if any node in the list depends on any node that is not on the list and that node is not shutdown.
cshutdown ALL
The command may fail if any node in the current system partition depends on nodes outside of the current system partition.
cshutdown -N 1 5 6
The command may fail if any node in the list is not in the current system partition or depends on nodes outside of the current system partition.
cshutdown -X -N 1 5 6
cshutdown -F -N 5
cshutdown -kF ALL
cshutdown -rN -C'-W 300' 12-16
cshutdown -rg sleepy_nodes
Purpose
CSS_test - Verifies that the installation and configuration of the Communications Subsystem of the SP system completed successfully.
Syntax
CSS_test
Flags
None.
Operands
None.
Description
Use this command to verify that the Communications Subsystem component ssp.css of the SP system was correctly installed. CSS_test runs on the system partition set in SP_NAME.
A return code of 0 indicates that the test completed without a failure, but unexpected results may be noted on standard output and in the companion log file /var/adm/SPlogs/CSS_test.log. A return code of 1 indicates that a failure occurred.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit SP_verify
Files
Related Information
Commands: jm_install_verify, jm_verify, SDR_test, SYSMAN_test, spmon_ctest, spmon_itest
Examples
To verify the Communication Subsystem following installation, enter:
CSS_test
Purpose
cstartup - Specifies the SP system Startup command.
Syntax
Flags
| Note: | Your system may still be usable if one or more nodes fails to complete startup, because the sequencing rules are preserved. |
| Note: | If some nodes but not the entire system are forced to start up this way, they may not function properly because of possible resource problems. |
Operands
Description
| Caution! |
|---|
|
The cstartup command attempts to power on nodes that are powered off. This has safety implications if someone is working on the nodes. Proper precautions should be taken when using this command. |
The cstartup command starts up the entire system or any number of nodes in the system. If a node is not powered on, startup means powering on the node. If the node is already powered on and not running, startup means resetting the node.
The /etc/cstartSeq file specifies the sequence in which the nodes are started up. See IBM Parallel System Support Programs for AIX: Administration Guide for the format of the /etc/cstartSeq file.
You can use the -SXZ flags to violate the cstartup sequence intentionally. See Table 2 for examples of the effect of these flags.
Files
The following files reside on the control workstation:
Restrictions
The cstartup command can only be issued on the control workstation by root or members of the shutdown group. The root user must issue the kinit command, specifying a principal name for which there is an entry in the hardmon ACLs file with control authorization for the frames to start up. The hardmon and System Data Repository (SDR) must be running.
Results
The /var/adm/SPlogs/cs/cstart.MMDDhhmmss.pid file contains the results of cstartup.
If the command fails, examine this file to see which steps were completed. If a file with the same name already exists (from a previous year), the cstartup command overwrites the existing file.
Related Information
Commands: cshutdown, init, seqfile
Examples
Group1 > Group2 > Group3 > Group4 > Group5
Group1: A
Group2: B
Group3: C
Group4: D
Group5: E
This defines five groups, Group1 through Group5, each containing a single node. The node names are A, B, C, D, and E. The sequence line Group1 > Group2 > Group3 > Group4 > Group5 means that Group1 (node A) is started first. When Group1 is up, Group2 (node B) is started. When Group2 is up, then Group3 (node C) is started, and so on.
Table 2 shows that the result of a cstartup command depends on the flags specified on the command line, the initial state of each node, and the sequencing rules in /etc/cstartSeq. The shorthand notation Aup indicates that A is powered up and running; Adn indicates that A is not running.
Table 2. Examples of the cstartup Command
| The subscript dn means the node is down. | |||
| Initial State | Command Issued | Final State | Explanation |
|---|---|---|---|
| Adn Bdn Cdn Ddn Edn | cstartup A B C D E | Aup Bup Cup Dup Eup | The command succeeds; the nodes are all up. |
| Aup Bup Cdn Ddn Edn | cstartup A B C D E | Aup Bup Cup Dup Eup | The command succeeds; C, D, and E are started up. |
| Aup Bup Cdn Dup Edn | cstartup A B C D E | Unchanged | The command fails because D was already up before C. |
| Aup Bup Cdn Dup Edn | cstartup -S A B C D E | Aup Bup Cup Dup Eup | The command succeeds because -S ignores sequencing violations. |
| Aup Bup Cdn Dup Edn | cstartup -Z A B C D E | Aup Bup Cup Dup Eup | The command succeeds because -Z resets running nodes. |
| Aup Bup Cdn Dup Edn | cstartup C E | Unchanged | The command fails because node D was already up before node C. |
| Aup Bup Cdn Dup Edn | cstartup -S C E | Aup Bup Cup Dup Eup | The command succeeds because -S ignores sequencing violations. |
| Aup Bup Cdn Dup Edn | cstartup -X C E | Aup Bup Cup Dup Eup | The command succeeds because -X considers the sequencing of only the target nodes. |
| Aup Bup Cdn Dup Edn | cstartup -Z C E | Unchanged | The command fails because resetting C or E does not correct the sequence violation. |
| Aup Bup Cdn Ddn Edn | cstartup C E | Unchanged | The command fails because D is gating E. Node C is not started either. |
| Aup Bup Cdn Ddn Edn | cstartup -S C E | Unchanged | The command fails because D is gating E. Node C is not started either. |
| Aup Bup Cdn Ddn Edn | cstartup -X C E | Aup Bup Cup Ddn Eup | The command succeeds and starts up only the explicit targets, C and E. |
| Aup Bup Cdn Ddn Edn | cstartup -Z C E | Unchanged | The command fails because D is gating E. Node C is not started either. |
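The rows of Table 2 follow from two checks: an existing sequence violation (a node running while an earlier node is down) and gating (a target whose earlier nodes are neither running nor targets). The Python sketch below is a toy model inferred from the table, not the cstartup implementation. The function name and the parameters ignore_violations, only_targets, and reset_running (standing in for -S, -X, and -Z) are hypothetical, and each group is assumed to hold a single node.

```python
def cstartup(sequence, up, targets, ignore_violations=False,
             only_targets=False, reset_running=False):
    """Return the set of running nodes afterward, or None on failure."""
    up, targets = set(up), set(targets)
    if reset_running:
        up -= targets  # running targets are reset, so treat them as down
    if only_targets:
        # Assumption: targets can always be ordered among themselves.
        return up | targets
    for i, node in enumerate(sequence):
        earlier = sequence[:i]
        # Existing violation: node is up although an earlier node is down.
        if (not ignore_violations and node in up
                and any(e not in up for e in earlier)):
            return None
        # Gating: a target needs each earlier node running or also targeted.
        if node in targets and any(
                e not in up and e not in targets for e in earlier):
            return None
    return up | targets


seq = ["A", "B", "C", "D", "E"]
print(cstartup(seq, {"A", "B", "D"}, ["C", "E"]))
print(cstartup(seq, {"A", "B", "D"}, ["C", "E"], ignore_violations=True))
print(cstartup(seq, {"A", "B"}, ["C", "E"], only_targets=True))
```

The first call returns None because D was already up before C. The second call succeeds because the violation is ignored, and the third starts only C and E while D stays down.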
cstartup -GXZ ALL
cstartup -G -N 1 9 16-20
The command may fail if any node in the list depends on any node that is not on the list and that node is not started up.
cstartup ALL
The command may fail if any node in the current system partition depends on nodes outside of the current system partition.
cstartup -N 1 5 6
The command may fail if any node in the list is not in the current system partition or depends on nodes outside of the current system partition.
cstartup -X -N 1 5 6
cstartup -k ALL
cstartup -E ALL
cstartup -Gg sleepy_nodes
Purpose
ctlhsd - Sets the operational parameters or resets the statistics of the data striping device (HSD) for IBM Virtual Shared Disks.
Syntax
ctlhsd [-p parallel_level | -v hsd_name ... | -C | -V]
Flags
Operands
None.
Description
Use this command to set the parallelism level and to reset the statistics of the data striping device (HSD) for the IBM Virtual Shared Disk. When specified with no arguments, it displays the current parallelism level, the number of reworked requests, and the number of requests that were not at a page boundary. When ctlhsd is used to reset the statistics of the device driver, of a particular device, or of all the configured data striping devices on the system, it does not suspend the underlying IBM Virtual Shared Disks. You should therefore make sure that there is no I/O activity on the IBM Virtual Shared Disks before resetting the statistics.
Use lshsd -s to display the statistics on the number of read and write requests at the underlying IBM Virtual Shared Disks in an HSD or all HSDs. Use the -v or -V flag to reset these counters.
Files
Security
You must have root privilege to run this command.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: cfghsd, lshsd, lsvsd, resumevsd, suspendvsd, ucfghsd
Examples
To display the current parallelism level and counter, enter:
ctlhsd
The system displays a message similar to the following:
The current parallelism level is 9.
The number of READ requests not at page boundary is 0.
The number of WRITE requests not at page boundary is 0.
Purpose
ctlvsd - Sets IBM Virtual Shared Disk operational parameters.
Syntax
Flags
| Note: | Before using this flag, refer to the "Restrictions" section that follows. |
| Note: | Before using this flag, refer to the "Restrictions" section that follows. |
This value is the buf_cnt parameter on the uphysio call the IBM Virtual Shared Disk IP device driver makes in the kernel. Use statvsd to display the current value on the node on which the command is run.
Operands
None.
Description
The ctlvsd command changes some parameters of the IBM Virtual Shared Disk. When called with no arguments it displays the current and maximum cache buffer count, the request block count, the pbuf count, the minimum buddy buffer size, the maximum buddy buffer size as well as the overall size of the buddy buffer.
Use statvsd to display outgoing and expected sequence numbers and the cast-out status of other nodes as viewed by the node on which the command is run. It is best to run suspendvsd -a on all nodes whose sequence numbers are being reset before actually resetting the sequence numbers. Be sure to use resumevsd on all IBM Virtual Shared Disks that were suspended after resetting the sequence numbers.
Initially, all sequence numbers are set to 0 when the first IBM Virtual Shared Disk is configured and the IBM Virtual Shared Disk device driver is loaded. Thereafter, sequence numbers are incremented as requests are sent to (outgoing) and received from (expected) other nodes, and reset via ctlvsd -R | -r commands.
Reloading the IBM Virtual Shared Disk device driver by suspendvsd -a, stopvsd -a, or ucfgvsd -a followed by cfgvsd also resets all sequence numbers to 0.
Initially, all nodes in the IBM Virtual Shared Disk network are cast in. The ctlvsd -k command casts a node out. The local node ignores requests from cast out nodes. The ctlvsd -r command casts nodes back in.
Files
Security
You must have root privilege to run this command.
Restrictions
If you have the IBM Recoverable Virtual Shared Disk product installed and operational, do not use the -k and -K options. The results may be unpredictable.
See IBM Parallel System Support Programs for AIX: Managing Shared Disks.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: cfgvsd, lsvsd, preparevsd, resumevsd, startvsd, statvsd, stopvsd, suspendvsd, ucfgvsd
Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information on tuning IBM Virtual Shared Disk performance and sequence numbers.
Examples
To display the current parameters, enter:
ctlvsd
The system displays a message similar to the following:
The current cache buffer count is 64.
The maximum cache buffer count is 256.
The request block count is 256.
The pbuf's count is 48.
The minimum buddy buffer size is 4096.
The maximum buddy buffer size is 65536.
The total buddy buffer size is 4 max buffers, 262144 bytes.
To display the mbuf headers and current routing table, enter:
ctlvsd -t
The system displays the following information:
Mbuf Cache Stats:
Header
Cached 1
Hit 1023
Miss 1
Route cache information:
destination interface ref status direct/gateway min managed mbuf
1 css0 2 Up Direct 256
Purpose
defhsd - Defines a data striping device (HSD).
Syntax
defhsd protect_LVCB | not_protect_LVCB hsd_name stripe_size vsd_name...
Flags
None.
Operands
Description
The defhsd command is used to specify the hsd_name, stripe size and underlying IBM Virtual Shared Disks for the new data striping device (HSD).
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit vsd_data
and select the Define a Hashed Shared Disk option.
Files
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: hsdatalst, undefhsd
Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information on tuning IBM Virtual Shared Disk performance and sequence numbers.
Examples
The following example adds SDR information that defines an HSD named hsd1 with a stripe size of 32768, composed of vsd.vsdn101 and vsd.vsdn201.
defhsd hsd1 32768 vsd.vsdn101 vsd.vsdn201
Purpose
defvsd - Defines an IBM Virtual Shared Disk.
Syntax
defvsd logical_volume_name global_group_name vsd_name [nocache | cache]
Flags
None.
Operands
| Note: | If you choose a vsd_name that is already the name of another device, the cfgvsd command will fail for that IBM Virtual Shared Disk. This failure ensures that the special device files created for the name do not overlay and destroy files of the same name representing some other device type (such as a logical volume). |
The cache option should only be used if the using application gains performance by avoiding a 4KB read immediately after a 4KB write. Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for additional information on IBM Virtual Shared Disk tuning.
Description
This command is run to specify logical volumes residing on globally accessible volume groups to be used as IBM Virtual Shared Disks.
You can use the System Management Interface Tool (SMIT) to run the defvsd command. To use SMIT, enter:
smit vsd_data
and select the Define a Virtual Shared Disk option.
Security
You must have root privilege to run this command.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: vsdatalst, vsdvg, undefvsd
Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information regarding IBM Virtual Shared Disk performance enhancements.
Examples
defvsd lv1vg1n1 vg1n1 vsd1vg1n1
defvsd lv2vg1n1 vg1n1 vsd1vg2n1 cache
Purpose
delnimclient - Deletes a Network Installation Management (NIM) client definition from a NIM master.
Syntax
delnimclient -h | -l node_list | -s server_node_list
Flags
Operands
None.
Description
Use this command to undefine a node as a NIM client. This is accomplished by determining the node's boot/install server and unconfiguring that client node as a NIM client on that server. When complete, the entry for the specified client is deleted from the NIM configuration database on the server. This command does not change the boot/install attributes for the nodes in the System Data Repository.
| Note: | This command results in no processing on the client node. |
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege to run this command.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/delnimclient
Related Information
Commands: mknimclient, setup_server
Examples
To delete the NIM client definition for nodes 1, 3, and 5 from the NIM database on their respective boot/install servers, enter:
delnimclient -l 1,3,5
Purpose
delnimmast - Unconfigures a node as a Network Installation Management (NIM) master.
Syntax
delnimmast -h | -l node_list
Flags
Operands
None.
Description
Use this command to undefine a node as a NIM master. This command does not change the boot/install attributes for the nodes in the System Data Repository.
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege to run this command.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/delnimmast
Related Information
Commands: mknimmast, setup_server
Examples
To unconfigure nodes 1, 3, and 5 as NIM masters and delete the NIM file sets, enter:
delnimmast -l 1,3,5
Purpose
dsh - Issues commands to a group of hosts in parallel.
Syntax
Flags
If the -a, -w, or -N flags are not specified, the WCOLL environment variable contains the name of a file containing host names for the working collective.
Operands
Description
The dsh command executes commands against all or any subset of the hosts in a network. It reads lines from the command line or standard input and executes each as a command on a set of network-connected hosts. These commands are in rsh syntax. Alternatively, a single command in rsh syntax can be specified on the dsh command line.
As each command is read, it is interpreted by passing it to each host in a group called the working collective via the SP rsh command.
The working collective is obtained from the first existence of one of the following:
If neither of these exists, an error has occurred and no commands are issued.
The working collective file should have one host name per line. Blank lines and comment lines beginning with # are ignored.
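The file format described above can be illustrated with a minimal Python sketch. It is not part of PSSP: the function name read_working_collective is hypothetical, and the sketch assumes only what the text states (one host name per line, blank lines and lines beginning with # ignored). Trimming surrounding whitespace is an additional assumption.

```python
def read_working_collective(text):
    """Return the host names found in working-collective file contents.

    Assumes one host name per line; blank lines and comment lines
    beginning with '#' are ignored. Surrounding whitespace is trimmed,
    which is an assumption of this sketch.
    """
    hosts = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comment lines
        hosts.append(entry)
    return hosts


sample = "# production nodes\nhost1\n\nhost2\nhost3\n"
print(read_working_collective(sample))  # ['host1', 'host2', 'host3']
```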
The path used when resolving the dsh command on the target nodes is the path set by the user with the DSHPATH environment variable. If DSHPATH is not set, the path used is the rsh default path, /usr/ucb:/bin:/usr/bin:. The DSHPATH environment variable only works when the user's remote login shell is the Bourne or Korn shell. An example would be to set DSHPATH to the path set on the source machine (for example, DSHPATH=$PATH).
The maximum number of concurrent rsh's can be specified with the fanout (f) flag or via the FANOUT environment variable. If desired, sequential execution can be obtained by specifying a fanout value of 1. Results are displayed as remote commands complete. All rsh's in a fanout must complete before the next set of rsh's is started. If fanout is not specified via FANOUT or the f flag, rsh's to 64 hosts are issued concurrently. Each rsh that dsh runs requires a reserved TCP/IP port and only 512 such ports are available per host. With large fanouts, it is possible to exhaust all the ports on a host, causing commands that use these ports, such as the SP rlogin and the SP rsh commands, to fail.
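As a rough illustration of the fanout rule, the following Python sketch splits a working collective into groups that would be processed one after another. It models only the grouping, not the rsh calls themselves; the function name fanout_batches is hypothetical, and the default of 64 is the value stated above.

```python
DEFAULT_FANOUT = 64  # default stated above when neither FANOUT nor the f flag is set


def fanout_batches(hosts, fanout=DEFAULT_FANOUT):
    """Split hosts into consecutive groups of at most `fanout` hosts.

    Models the rule that every rsh in one group completes before the
    next group is started; a fanout of 1 gives sequential execution.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    return [hosts[i:i + fanout] for i in range(0, len(hosts), fanout)]


hosts = ["host%d" % n for n in range(1, 8)]
for batch in fanout_batches(hosts, fanout=3):
    print(batch)
```

With seven hosts and a fanout of 3, the sketch yields groups of three, three, and one host.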
Exit values for the rsh commands are displayed in messages from dsh if nonzero. (A nonzero return code from rsh indicates that the rsh has failed; it has nothing to do with the exit code of the remotely executed command.) If an rsh fails, that host is removed from the current working collective (not the current working collective file), unless the c flag was set.
The dsh exit value is 0 if no errors occurred in the dsh command and all rsh's finished with exit codes of 0. The dsh exit value is more than 0 if internal errors occur or the rsh's fail. The exit value is increased by 1 for each rsh failure.
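The exit-value rule and the removal of failed hosts can be sketched in Python as follows. This is a simplified model under stated assumptions, not the dsh implementation: the function name summarize_rsh_results is hypothetical, internal dsh errors are not modeled, and the return codes are those of rsh itself rather than of the remote command.

```python
def summarize_rsh_results(return_codes, continue_flag=False):
    """Model the dsh exit value and the surviving working collective.

    return_codes maps host name -> rsh return code. The exit value is
    increased by 1 for each rsh failure. Failed hosts are dropped from
    the current working collective unless the c flag was set.
    """
    failed = [host for host, rc in return_codes.items() if rc != 0]
    exit_value = len(failed)
    if continue_flag:
        remaining = list(return_codes)  # c flag: failed hosts are kept
    else:
        remaining = [host for host in return_codes if host not in failed]
    return exit_value, remaining


print(summarize_rsh_results({"host1": 0, "host2": 1, "host3": 0}))
# (1, ['host1', 'host3'])
```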
No particular error recovery for command failure on remote hosts is provided. The application or user can examine the command results in dsh's standard error and standard output, and take appropriate action.
The dsh command waits until results are in for each command for all hosts and displays those results before reading more input commands.
The dsh command does not work with interactive commands, including those that read from standard input.
The dsh command output consists of the output (standard error and standard output) of the remotely executed commands. The dsh standard output is the standard output of the remote command. The dsh standard error is the standard error of the remote command. Each line is prefixed with the host name of the host from which that output came. The host name is followed by ":" and a line of the command output.
For example, let's say that a command was issued to a working collective of host1, host2, and host3. When the command was issued on each of the hosts, the following lines were written by the remote commands:
For host1 stdout:
h1out1
h1out2
For host2 stdout:
h2out1
h2out2
For host3 stdout:
h3out1
For host3 stderr:
h3err1
h3err2
dsh stdout will be
host1: h1out1
host1: h1out2
host2: h2out1
host2: h2out2
host3: h3out1
dsh stderr will be
host3: h3err1
host3: h3err2
A filter to display the output lines by the host is provided separately. See the dshbak command.
If a host is detected as down (for example, an rsh returns a nonzero return code), subsequent commands are not sent to it on this invocation of dsh, unless the c (continue) option is specified on the command line.
An exclamation point at the beginning of a command line causes the command to be passed directly to the local host in the current environment. The command is not sent to the working collective.
Signals 2 (INT), 3 (QUIT), and 15 (TERM) are propagated to the remote commands.
Signals 19 (CONT), 17 (STOP), and 18 (TSTP) are defaulted. This means that the dsh command responds normally to these signals, but they do not have an effect on the remotely running commands. Other signals are caught by dsh and have their default effects on the dsh command. In the case of these other signals, all current children, and via propagation their remotely running commands, are terminated (SIGTERM).
Security considerations are the same as for the SP rsh command.
Files
Related Information
Command: dshbak
SP Commands: rsh, sysctl
Examples
WCOLL=./wchosts dsh ps
dsh -q
dsh -w otherhost1,otherhost2,otherhost3
dsh -w host1,host2,host3 -a cat /etc/passwd | dshbak
dsh -w otherhost cat remotefile '>>' otherremotefile
dsh -if 1 -a < commands_file > results 2>&1
dsh ps -ef | grep root
dsh 'ps -ef | grep root'
or
dsh ps -ef "|" grep root
dsh -w host1 cat /etc/passwd | cut -d: -f2- | cut -c2- > myetcpasswd
dsh -N my_nodes ps
Purpose
dshbak - Presents formatted output from the dsh and sysctl commands.
Syntax
dshbak [-c]
Flags
Operands
None.
Description
The dshbak command takes lines in the following format:
host_name: line of output from remote command
The dshbak command formats them as follows and writes them to standard output. Assume that the output from host_name3 and host_name4 is identical and the c option was specified:
HOSTS -----------------------------------------------------------------
host_name1
-----------------------------------------------------------------------
.
.
lines from dsh or sysctl with host_names stripped off
.
.
HOSTS -----------------------------------------------------------------
host_name2
-----------------------------------------------------------------------
.
.
lines from dsh or sysctl with host_names stripped off
.
.
HOSTS -----------------------------------------------------------------
host_name3 host_name4
-----------------------------------------------------------------------
.
.
lines from dsh or sysctl with host_names stripped off
.
.
When output is displayed from more than one host in collapsed form, the host names are displayed alphabetically.
When output is not collapsed, output is displayed sorted alphabetically by host name.
The dshbak command writes "." for each 1000 lines of output filtered.
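The grouping behavior described above can be approximated with a short Python sketch. It is not the dshbak implementation: the function names are hypothetical, the rule width is illustrative, and the progress dots written for every 1000 lines are omitted. The sketch assumes input lines of the form "host_name: output" and, like the c option, can collapse hosts whose output is identical.

```python
WIDTH = 71  # illustrative rule width; not taken from the real command


def group_by_host(lines):
    """Map each host name to its output lines, with the prefix removed."""
    grouped = {}
    for line in lines:
        host, sep, rest = line.partition(":")
        if not sep:
            continue  # not in "host_name: output" form; ignored in this sketch
        if rest.startswith(" "):
            rest = rest[1:]  # drop the single space that follows the colon
        grouped.setdefault(host, []).append(rest)
    return grouped


def format_by_host(lines, collapse=False):
    """Return a report with one section per host, or per group of hosts
    with identical output when collapse is True."""
    sections = []  # each entry is [host_names, output_lines]
    for host, output in sorted(group_by_host(lines).items()):
        match = None
        if collapse:
            match = next((s for s in sections if s[1] == output), None)
        if match is not None:
            match[0].append(host)  # same output as an earlier host
        else:
            sections.append([[host], output])
    report = []
    for names, output in sections:
        report.append("HOSTS " + "-" * (WIDTH - 6))
        report.append(" ".join(names))
        report.append("-" * WIDTH)
        report.extend(output)
    return "\n".join(report)


print(format_by_host(["host1: /u/a", "host2: /u/b", "host3: /u/b"], collapse=True))
```

Because hosts are visited in sorted order, the names within a collapsed section also appear alphabetically.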
Files
Related Information
Commands: dsh, sysctl
Examples
dsh -w host1,host2,host3 cat /etc/passwd | dshbak
dsh -w host1,host2,host3 pwd | dshbak -c
Purpose
Eannotator - Annotates the connection labels in the topology file.
Syntax
Eannotator -F input_file -f output_file -O [yes | no]
Flags
Operands
None.
Description
This command supports all of the following:
This command must be executed whenever a new topology file is selected.
The topology file, which describes the wiring configuration for the High Performance Switch, contains node-to-switch or switch-to-switch cable information. A node-to-switch connection looks like the following:
s 25 2 tb0 17 0 E2-S00-BH-J16 to E2-N2
The predefined node-to-switch connections start with an "s" which indicates a switch connection. The next two digits, in this case "25" indicate the switch (2) and switch chip (5) being connected. The next digit, in this case "2", indicates the switch chip port in the connection. The next field, in this case "tb0", specifies the type of adapter present in the SP node. The following field, in this case "17", is the switch node number for the SP node, and the last digit, in this case "0", indicates the adapter port within the connection.
For switch-to-switch connections, the first four fields (switch indicator, switch, switch chip, and switch chip port) are repeated to identify the other end of the connection.
The connection label "E2-S00-BH-J16 to E2-N2" provides physical connection information for a customer's use to identify the bad connection.
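The field layout of a node-to-switch line can be illustrated with a small Python sketch. It is not part of the Eannotator command, and the function name parse_node_to_switch is hypothetical. Treating the last character of the second field as the switch chip and the preceding characters as the switch is an assumption extrapolated from the two-digit example above.

```python
def parse_node_to_switch(line):
    """Split a node-to-switch topology line into the fields described above."""
    fields = line.split(None, 6)  # the connection label may contain spaces
    if len(fields) < 6 or fields[0] != "s":
        raise ValueError("not a switch connection line: %r" % line)
    return {
        "switch": fields[1][:-1],       # assumption: all but the last digit
        "switch_chip": fields[1][-1],   # assumption: the last digit
        "switch_chip_port": fields[2],
        "adapter_type": fields[3],
        "switch_node_number": fields[4],
        "adapter_port": fields[5],
        "label": fields[6] if len(fields) > 6 else "",
    }


entry = parse_node_to_switch("s 25 2 tb0 17 0 E2-S00-BH-J16 to E2-N2")
print(entry["switch"], entry["switch_chip"], entry["label"])
# 2 5 E2-S00-BH-J16 to E2-N2
```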
Depending on the type of switch installed (High Performance Switch or SP Switch), together with the customer's physical switch frame configuration defined in the SDR, the Eannotator command retrieves switch node and dependent node objects from the SDR and applies proper connection information to the topology file.
If the input topology file contains existing connection information, the Eannotator command replaces the existing connection label with the new connection labels. If the input topology file does not contain connection labels, the Eannotator command appends the proper connection label to each line on the topology file.
The precoded connection labels on the topology file start with an "L", which indicates logical frames. The Eannotator command replaces the "L" character with an "E", which indicates physical frames. The "S" character indicates which slot the switch occupies in the frame, the "BH" characters indicate a Bulk Head connection, the "J" character indicates which jack provides the connection from the switch board, the "N" character indicates the node being connected to the switch, and the "SC" characters indicate the Switch Chip connection.
Files
Security
You must have root privilege to run this command.
Related Information
Commands: Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition
Refer to IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment for details about system partition topology files.
Examples
Before: s 15 3 tb0 0 0 L01-S00-BH-J18 to L01-N1
After: s 15 3 tb0 0 0 E01-S17-BH-J18 to E01-N1
| Note: | Logical frame L01 is defined as physical frame 1 in the SDR Switch object. |
Before: s 10016 0 s 51 3 L09-S1-BH-J20 to L05-S00-BH-J19
After: s 10016 0 s 51 3 E10-S1-BH-J20 to E05-S17-BH-J19
| Note: | Logical frame L09 is defined as physical frame 10 in the SDR Switch object. |
Before: s 15 3 tb0 0 0 L03-S00-BH-J18 to L03-N3
After: s 15 3 tb3 0 0 E03-S17-BH-J18 to E03-N3 # Dependent Node
| Note: | Logical frame L03 is defined as physical frame 3 in the SDR Switch object and the node was determined to be a dependent node. |
Eannotator -F expected.top.8nsb.4isb.0 -f expected.top -O no
Eannotator -F expected.top.1nsb.0isb.0 -f expected.top -O yes
Purpose
Eclock - Controls the clock source for each switch board within an SP cluster.
Syntax
Flags
where:
High Performance Switch
SP Switch
If a flag is not specified, the clock input values stored in the SDR are displayed.
Operands
None.
Description
Use this command to set the multiplexors that control the clocking at each switch board within the configuration. One switch board within the configuration is designated as the "Master" switch that provides the clocking signal for all other switch boards within the configuration. The Eclock command reads clock topology information from either the file specified on the command line or the clock topology data within the SDR. If a clock topology file was specified, the Eclock command places the clock topology information into the SDR, so that it can be accessed again during a subsequent Eclock invocation. After processing the clock topology file, Eclock causes the new clock topology to take effect for the switches specified. A clock topology file contains the following information for each switch board within the cluster:
High Performance Switch
SP Switch
| High Performance Switch Warning |
|---|
|
Do not change the switch clock multiplexor settings (with Eclock, spmon command/GUI, hmcmds) while the nodes are powered on. Otherwise, AIX must be rebooted and Estart must be run following the clock adjustment. |
| SP Switch Warning |
|---|
|
Do not change the switch clock multiplexor settings (with Eclock, spmon command/GUI, hmcmds) while the nodes are powered on. Otherwise, Estart must be run following the clock adjustment. |
To execute the Eclock command, the user must be authorized to access the Hardware Monitor subsystem and, for those frames specified to the command, the user must be granted VFOP (Virtual Front Operator Panel) permission. Commands sent to frames for which the user does not have VFOP permission are ignored. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.
Files
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition
Examples
Eclock -f /etc/SP/Eclock.top.8nsb.4isb.0
Eclock
Eclock -s 1 -m 0
Eclock -c /tmp/Eclock.top
Eclock -a /etc/SP/Eclock.top.4nsb.2isb.0
Eclock -d
Purpose
| Usage Note |
|---|
|
Do not use this command if you have the SP Switch installed on your system. |
Eduration - Sets the interval at which nodes can be added to or removed from the High Performance Switch. This interval is called the Run Phase Duration.
Syntax
Eduration [[days day[s]] [hours hour[s]] [minutes minute[s]] ] | [-h]
Flags
Any combination of the three preceding time designations can be used to specify the new Run Phase Duration. Since the duration determines how quickly the system can respond to Efence and Eunfence requests, it should be set to provide the desired response. If none of the time specifiers are present, Eduration will display the current value of the Run Phase Duration.
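To make the time designations concrete, the following Python sketch converts an argument list such as "1 hour 30 minutes" into a total number of minutes. It illustrates the arithmetic only and is not how Eduration is implemented: the function name run_phase_minutes is hypothetical, and unlike the syntax shown above the sketch does not enforce the order days, hours, minutes.

```python
MINUTES_PER_UNIT = {"day": 24 * 60, "hour": 60, "minute": 1}


def run_phase_minutes(args):
    """Convert arguments such as ['1', 'hour', '30', 'minutes'] to minutes."""
    if len(args) % 2 != 0:
        raise ValueError("expected pairs of <count> <unit>")
    total = 0
    for count, unit in zip(args[0::2], args[1::2]):
        key = unit[:-1] if unit.endswith("s") else unit  # accept day or days
        if key not in MINUTES_PER_UNIT:
            raise ValueError("unknown time unit: %r" % unit)
        total += int(count) * MINUTES_PER_UNIT[key]
    return total


print(run_phase_minutes("1 hour 30 minutes".split()))  # 90
```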
Operands
None.
Description
The Run Phase Duration controls how frequently Efence and Eunfence requests are handled. This command provides an interface to set that value.
| Note: | The Run Phase Duration changes will not take effect until the end of the current Run Phase. If you are changing the Run Phase Duration from a large value to something that is significantly smaller and you do not want to wait for the current Run Phase to complete, you will have to Estart the switch. |
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition
Examples
Eduration 1 minute
Eduration 1 hour 30 minutes
Eduration
Purpose
Efence - Removes an SP node from the current active switch network.
Syntax
Efence [-h] | [-G] [-autojoin] [node_specifier] ...
Flags
If you have an SP Switch installed on your system, such nodes are also rejoined when an Estart command is issued.
Operands
| Note: | You cannot fence the primary node on the High Performance Switch and you cannot fence either the primary or primary backup nodes on the SP Switch. |
Description
Use this command to fence a node from the current switch network.
If you have an SP Switch installed on your system, you must do either of the following to bring the node back up onto the switch network:
If you have a High Performance Switch installed on your system, you can issue the Estart command to rejoin all nodes on the switch network.
| Note: | If a host name or IP address is used as the node_specifier for a dependent node, it must be a host name or IP address assigned to the adapter that connects the dependent node to the SP Switch. Neither the administrative host name nor the Simple Network Management Protocol (SNMP) agent's host name for a dependent node is guaranteed to be the same as the host name of its switch network interface. |
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition
Examples
Efence
Efence -autojoin
Efence -G
Efence 129.33.34.1 129.33.34.6
Efence r11n01
Efence -autojoin 54 65 32 78
Efence 2,14
Efence 5 6 7 8
fences nodes 5 and 6, but not nodes 7 and 8. As a result, the command returns a nonzero return code.
Efence -G 5 6 7 8
Purpose
emconditionctrl - Loads the System Data Repository (SDR) with predefined Event Management conditions.
Syntax
emconditionctrl [-a] [-s] [-k] [-d] [-c] [-t] [-o] [-r] [-h]
Flags
Operands
None.
Description
The emconditionctrl script loads the SDR with some useful conditions that can be used for registering for Event Management events. Currently the SP Perspectives application can make use of conditions.
The emconditionctrl script is not normally executed on the command line. It is normally called by the syspar_ctrl command after the control workstation has been installed or when the system is partitioned. It implements all of the flags that syspar_ctrl can pass to its subsystems, although only the -a flag causes any change to the system. The -a flag causes predefined conditions to be loaded only if run on the control workstation. It has no effect if run elsewhere.
Exit Values
Security
You must be running with an effective user ID of root.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/emconditionctrl
Related Information
Commands: syspar_ctrl
Purpose
emonctrl - A control script that manages the Emonitor subsystem.
Syntax
emonctrl { -a | -s | -k | -d | -c | -t | -o | -r | -h }
Flags
Operands
None.
Description
The Emonitor subsystem monitors designated nodes in an attempt to maximize their availability on the switch network.
The emonctrl control script controls the operation of the Emonitor subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called emon.
An instance of the Emonitor subsystem can execute on the control workstation for each system partition. Because Emonitor provides its services within the scope of a system partition, it is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It should be issued from the control workstation and is not functional on the nodes.
From an operational point of view, the Emonitor subsystem group is organized as follows:
The emon group is associated with the Emonitor daemon.
On the control workstation, there are multiple instances of Emonitor, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named Emonitor.sp_prod and Emonitor.sp_test.
The Emonitor daemon provides switch node monitoring.
The emonctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The emonctrl script provides a variety of controls for operating the Emonitor daemon:
Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation. Since the Emonitor daemon runs only on the control workstation, the script performs no function when run on a node.
Except for the clean function, all functions are performed within the scope of the current system partition.
Adding the Subsystem
When the -a flag is specified, the control script uses the mkssys command to add the Emonitor daemon to the SRC. The control script operates as follows:
Starting the Subsystem
This option is unused since the Emonitor daemon must be started via Estart -m.
Stopping the Subsystem
When the -k flag is specified, the control script uses the stopsrc command to stop the Emonitor daemon in the current system partition.
Deleting the Subsystem
When the -d flag is specified, the control script uses the rmssys command to remove the Emonitor subsystem from the SRC. The control script operates as follows:
Cleaning Up the Subsystems
When the -c flag is specified, the control script stops and removes the Emonitor subsystems for all system partitions from the SRC. The control script operates as follows:
Turning Tracing On
Not currently used.
Turning Tracing Off
Not currently used.
Refreshing the Subsystem
Not currently used.
Logging
While it is running, the Emonitor daemon provides information about its operation and errors by writing entries in a log file. The Emonitor daemon uses log files called /var/adm/SPlogs/css/Emonitor.log and /var/adm/SPlogs/css/Emonitor.Estart.log.
Files
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege to run this command.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/emonctrl
Related Information
Commands: Emonitor, Estart, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
emonctrl -a
emonctrl -k
emonctrl -d
emonctrl -c
lssrc -g emon
lssrc -s subsystem_name
lssrc -a
Purpose
Emonitor - Monitors nodes listed in the /etc/SP/Emonitor.cfg file in an attempt to maximize their availability on the switch.
Syntax
Emonitor
Flags
None.
Operands
None.
Description
Emonitor is a daemon controlled by the System Resource Controller (SRC). It can be used to monitor nodes in a system partition in regard to their status on the switch. A system-wide configuration file (/etc/SP/Emonitor.cfg) lists all nodes on the system to be monitored. The objective is to bring these nodes back up on the switch network when necessary.
Emonitor is invoked with Estart -m. Once invoked, it is controlled by SRC so it will restart if it is halted abnormally. If you decide to end monitoring, you must run /usr/lpp/ssp/bin/emonctrl -k to stop the daemon in your system partition.
There is an Emonitor daemon for each system partition. The daemon watches for any node coming up (for example, host_responds goes from 0 to 1). When the daemon detects a node coming up, it reviews the nodes in the configuration file to check whether any node is off the switch network. If any nodes in the specified system partition are off the switch network, it determines a way to bring them back onto the switch (for example, via Eunfence or Estart), and takes the appropriate action. To prevent the Estart command from being run several times (which can occur if multiple nodes are coming up in sequence), Emonitor waits 3 minutes after a node comes up to be sure no other nodes are in the process of coming up. Each time a new node comes up prior to the 3 minute timeout, Emonitor resets the timer to a maximum wait of 12 minutes.
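The timing rule can be read in more than one way. The Python sketch below shows one possible reading, in which each node that comes up before the timer expires restarts the 3 minute wait, and the total wait never exceeds 12 minutes from the first node. This interpretation, the function name recovery_time, and the use of minutes as plain numbers are all assumptions of the sketch, not a description of the daemon's actual logic.

```python
QUIET_PERIOD = 3  # minutes to wait after a node comes up
MAX_WAIT = 12     # assumed cap on the total wait, measured from the first node


def recovery_time(node_up_times):
    """Return the minute at which recovery action would be taken.

    node_up_times: ascending minutes at which nodes were seen coming up.
    """
    if not node_up_times:
        raise ValueError("at least one node-up time is required")
    first = node_up_times[0]
    latest_allowed = first + MAX_WAIT
    deadline = first + QUIET_PERIOD
    for t in node_up_times[1:]:
        if t >= deadline:
            break  # the timer had already expired before this node came up
        deadline = min(t + QUIET_PERIOD, latest_allowed)
    return deadline


print(recovery_time([0]))        # 3
print(recovery_time([0, 2, 4]))  # 7
```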
Emonitor cannot always bring nodes back on the switch. For example, if any of the following occur:
On a High Performance Switch, if a node is faulted off the switch and you are forced to do an Estart, you will lose history of any nodes that you had isolated off the switch. All nodes on a High Performance Switch come back on the switch on an Estart.
Problems can occur if the node that is faulted off the switch is experiencing a recurring error that causes it to come up and then fail repeatedly. The monitor continually attempts to bring this node into the switch network and could jeopardize the stability of the remaining switch network.
| Note: | Nodes that will be undergoing hardware or software maintenance should be removed from the Emonitor.cfg file during this maintenance to prevent Emonitor from attempting to bring them onto the switch network. |
Files
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, emonctrl, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition
Purpose
enadmin - Changes the desired state of a specified extension node.
Syntax
Flags
Operands
Description
Use this command to change the administrative state of an extension node. Setting the administrative state of an extension node to reconfigure causes configuration data for the extension node to be resent to the extension node's administrative environment. Setting the administrative state of an extension node to reset places the extension node in an initial state in which it is no longer active on the switch network.
This command is invoked internally when choosing the reconfigure option of the endefadapter and endefnode commands or the reset (-r) option of the enrmnode command.
You can use the System Management Interface Tool (SMIT) to run this command by selecting the Extension Node Management panel. To use SMIT, enter:
smit manage_extnode
Standard Output
All informational messages are written to standard output. These messages identify the extension node being changed and indicate when the specified state change has been accepted for processing by the extension node agent (at which point the command is complete). All error messages are also written to standard output.
Exit Values
Security
You must have root privilege to run this command or be a member of the system group.
Restrictions
This command can only be issued on the control workstation.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.spmgr file set.
The spmgrd SNMP manager daemon on the SP control workstation allows transfer of extension node configuration data from the SP system to an SNMP agent providing administrative support for the extension node. Version 1 of the SNMP protocol is used for communication between the SNMP manager and the SNMP agent. Limited control of an extension node is also possible. An SNMP set-request message containing an object instantiation representing the requested administrative state for the extension node is sent from the SNMP manager to the SNMP agent providing administrative support for the extension node. After the administrative state of an extension node is received by the SNMP agent, the enadmin command is completed. Requests for configuration information and information about the state of an extension node are sent to the SNMP manager asynchronously in SNMP trap messages.
Prerequisite Information
IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment
Location
/usr/lpp/ssp/bin/enadmin
Related Information
Commands: endefadapter, endefnode, enrmadapter, enrmnode, spmgrd
Examples
enadmin -a reconfigure 9
enadmin -a reset 9
Purpose
endefadapter - Adds new or changes existing configuration data for an extension node adapter in the System Data Repository (SDR) and optionally performs the reconfiguration request.
Syntax
endefadapter [-a address] [-h] [-m netmask] [-r] node_number
Flags
Operands
Description
Use this command to define extension node adapter information in the SDR. The -a and -m flags and the node_number operand are required.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit enter_extadapter
Environment Variables
The SP_NAME environment variable is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.
Standard Output
This command writes informational messages to standard output.
Standard Error
This command writes all error messages to standard error.
Exit Values
Security
You must have root privilege to run this command or be a member of the system group.
Restrictions
This command can only be issued on the control workstation.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.
Prerequisite Information
IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment
Location
/usr/lpp/ssp/bin/endefadapter
Related Information
Commands: enadmin, endefnode, enrmadapter, enrmnode
Examples
endefadapter -a 129.40.158.137 -m 255.255.255.0 10
endefadapter -a 129.40.158.137 -m 255.255.255.0 -r 10
Purpose
endefnode - Adds new or changes existing configuration data for an extension node in the System Data Repository (SDR) and optionally performs the reconfiguration request.
Syntax
Flags
Operands
Description
Use this command to define extension node information in the SDR. When adding a new extension node, the -a, -i, and -s flags and the node_number operand are required. When changing an existing extension node definition, only the node number is required along with the flag corresponding to the field being changed.
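The rule about required flags can be expressed as a small pre-check. The sketch below is illustrative only: the helper name is hypothetical, it looks only at whether each flag is present, and it does not validate flag values or the node_number operand.

```python
# Flags the text lists as required when adding a new extension node.
REQUIRED_WHEN_ADDING = ("-a", "-i", "-s")


def missing_required_flags(args, adding_new_node):
    """Return the required flags absent from args, a list of endefnode
    arguments without the command name."""
    if not adding_new_node:
        # Changing an existing definition needs only the node number and
        # the flag for the field being changed.
        return []
    return [flag for flag in REQUIRED_WHEN_ADDING if flag not in args]


args = "-i 13 -a router1 -s router1 -c spenmgmt 2".split()
print(missing_required_flags(args, adding_new_node=True))
```

A wrapper script could run a check like this before invoking endefnode, so that an incomplete definition is reported before the SDR is touched.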
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit enter_extnode
Environment Variables
The SP_NAME environment variable is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.
Standard Output
This command writes informational messages to standard output.
Standard Error
This command writes all error messages to standard error.
Exit Values
Security
You must have root privilege to run this command or be a member of the system group.
Restrictions
This command can only be issued on the control workstation.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.
Prerequisite Information
IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment
Location
/usr/lpp/ssp/bin/endefnode
Related Information
Commands: enadmin, endefadapter, enrmnode, enrmadapter
Refer to the SP Switch Router Adapter Guide for information about attaching an IP router extension node to the SP Switch.
Examples
endefnode -i 13 -a router1 -s router1 -c spenmgmt 2
endefnode -i 02 -a grf.pok.ibm.com -s grf.pok.ibm.com -c spenmgmt -r 7
Purpose
enrmadapter - Removes configuration data for an extension node adapter from the System Data Repository (SDR).
Syntax
enrmadapter [-h] node_number
Flags
Operands
Description
Use this command to remove extension node adapter information from the SDR. The node_number operand is required.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit delete_extadapter
Environment Variables
The environment variable SP_NAME is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.
Standard Output
This command writes informational messages to standard output.
Standard Error
This command writes all error messages to standard error.
Exit Values
Security
You must have root privilege to run this command or be a member of the system group.
Restrictions
This command can only be issued on the control workstation.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.
Prerequisite Information
IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment
Location
/usr/lpp/ssp/bin/enrmadapter
Related Information
Commands: enadmin, endefadapter, endefnode, enrmnode
Examples
To remove an extension node adapter with a node number of 12 from the SDR, enter:
enrmadapter 12
Purpose
enrmnode - Removes configuration data for an extension node from the System Data Repository (SDR).
Syntax
enrmnode [-h] [-r] node_number
Flags
Operands
Description
Use this command to remove extension node information from the SDR. When removing information, the node_number operand is required.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit delete_extnode
Environment Variables
The environment variable SP_NAME is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.
Standard Output
This command writes informational messages to standard output.
Standard Error
This command writes all error messages to standard error.
Exit Values
Security
You must have root privilege to run this command or be a member of the system group.
Restrictions
This command can only be issued on the control workstation.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.
Prerequisite Information
IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment
Location
/usr/lpp/ssp/bin/enrmnode
Related Information
Commands: enadmin, endefadapter, endefnode, enrmadapter
Examples
To remove an extension node with a node number of 2 from the SDR and reset that extension node, enter:
enrmnode -r 2
Purpose
Eprimary - Assigns or queries the switch primary node and switch primary backup node for a system partition.
***High Performance Switch***
Syntax
Eprimary [-h] [-init] [node_identifier]
Flags
Operands
| Note: | If no flags or operands are specified, the current switch primary node is displayed. |
Description
Use this command to assign, change, or query the switch primary node. When the -init option is specified, it can be used to create a switch partition object for a system partition. The primary node should not be changed unless the current primary node is becoming unavailable (for example, if the current primary node is to be serviced). The Estart command must be issued before a change of the primary node (using Eprimary) takes effect. The old primary node must be rebooted or powered off before issuing Estart to remove its inclination to behave as the primary node.
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, Equiesce, Estart, Etopology, Eunfence
Examples
Eprimary
Eprimary 129.33.34.1
Eprimary 1
Eprimary r11n01
Eprimary -init 1,2
***SP Switch***
Syntax
Eprimary [-h] [-init] [node_identifier] [-backup bnode_identifier]
Flags
Operands
| Note: | If no flags or operands are specified, each of the following is displayed: |
Description
Use this command to assign, change, or query the switch primary node or the switch primary backup node. The primary node should not be changed unless the current primary node is becoming unavailable (for example, if the current primary node is to be serviced). The Estart command must be issued before a change of the primary node or the primary backup node (using Eprimary) takes effect.
In an SP Switch network, the primary node takeover facility automatically handles situations (such as a node loss) for each of the primary and primary backup nodes. The primary node replaces a failing primary backup node and the primary backup node automatically takes over for the primary node if the primary node becomes unavailable. Note that the node chosen cannot be a dependent node. The primary backup node should be selected using the following guidelines:
The Eprimary command selects a default oncoming primary or oncoming backup primary node if one is not specified. Users receive a warning in the following situations on the oncoming primary or oncoming backup primary nodes:
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, Equiesce, Estart, Etopology, Eunfence, Eunpartition
Examples
Eprimary
Eprimary 129.33.34.1
Eprimary 129.33.34.1 -backup 129.33.34.56
Eprimary r11n01 -backup r17n02
Eprimary -init 1,2 -backup 1,6
Purpose
| Usage Note |
|---|
|
Use this command only if you have an SP Switch installed on your system. |
Equiesce - Quiesces the switch by causing the primary and primary backup nodes to shut down switch recovery and primary node takeover.
Syntax
Equiesce [-h]
Flags
Operands
None.
Description
Use this command to disable switch error recovery and primary node takeover. It is used to shut down normal switch error actions when global activities affecting nodes are performed. For example, when all nodes are shut down or rebooted, they are fenced from the switch by the primary node.
If the primary node is not the first node to shut down during a global shutdown or reboot of the entire system, it may fence all the other nodes including the primary backup node. Primary node takeover can also occur if the primary node is shut down and the backup node remains up. Issuing the Equiesce command before the shutdown prevents these situations from occurring.
The Equiesce command causes the primary and primary backup nodes to shut down their recovery actions. Data still flows over the switch, but no faults are serviced and primary node takeover is disabled. Only the Eannotator, Eclock, Eprimary, Estart, and Etopology commands are functional after the Equiesce command is issued.
Estart must be issued when the global activity is complete to reestablish switch recovery and primary node takeover.
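The order of operations described above (quiesce, perform the global activity, then restart the switch) can be sketched as a small wrapper. This is a minimal illustration, not an IBM-supplied procedure: the function names are hypothetical, the commands are assumed to be on the PATH of a root user, and a failure in the global activity is deliberately left for the operator to handle rather than triggering Estart automatically.

```python
import subprocess


def run_command(argv):
    """Default runner: execute a command and raise if it exits nonzero."""
    subprocess.run(argv, check=True)


def with_switch_quiesced(global_activity, run=run_command):
    """Quiesce the switch, perform the global activity, then restart it."""
    run(["Equiesce"])    # disable switch recovery and primary node takeover
    global_activity()    # for example, shut down or reboot all nodes
    run(["Estart"])      # reestablish recovery and primary node takeover
```

Passing a different `run` function makes it possible to rehearse the sequence without touching the switch, for example by recording the commands in a list.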
Security
You must have root privilege to run this command.
Location
/usr/lpp/ssp/bin/Equiesce
Related Information
Commands: Eannotator, Eclock, Efence, Eprimary, Estart, Etopology, Eunfence, Eunpartition
Examples
To quiesce the switch before shutting down the system, enter:
Equiesce
Purpose
Syntax
Estart [-h] [-m]
Flags
Operands
None.
Description
Use this command to start or restart the current system partition based on its switch topology file. (Refer to the Etopology command for topology file details.) If the -m flag is specified, the command also starts the Emonitor daemon to monitor nodes on the switch. Refer to the Emonitor daemon for additional information. If the Estart command is issued when the switch is already running, it causes a switch fault, and messages in flight are lost. Applications using reliable protocols on the switch, such as TCP/IP and the MPI User Space library, recover from switch faults. Applications using unreliable protocols on the switch do not recover from switch faults. For this reason, IBM suggests that you know which applications and protocols you are running before you issue the Estart command. Because the Estart command uses the SP rsh command, proper authentication and authorization are necessary to issue this command.
SP Switch Notes:
If you have an SP Switch installed on your system, an oncoming primary node as selected via Eprimary is established as primary during Estart. If necessary, the topology file is distributed to partition nodes during Estart. The topology file to be used is distributed to each of the standard nodes in the system partition via the SP Ethernet:
Otherwise, the topology file is already resident on the nodes and does not need to be distributed.
Files
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Etopology, Eunfence, Eunpartition
Refer to IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment for details about system partition topology files.
Examples
Estart
Estart -m
Purpose
Etopology - Stores or reads a switch topology file into or out of the System Data Repository (SDR).
Syntax
Etopology [-h] [-read] switch_topology_file
Flags
Operands
Description
Use this command to store or retrieve the switch_topology_file into or out of the SDR. The switch topology file is used by switch initialization when starting the switch for the current system partition. It is stored in the SDR and can be overridden by having a switch topology file in the /etc/SP directory named expected.top on the switch primary node.
If you have an SP Switch installed on your system, the current topology file is copied to each node of the subject system partition during an Estart and to each targeted node for an Eunfence.
Files
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Eunfence, Eunpartition
Refer to the IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment for information on system partition configurations and topology files.
Examples
Etopology /etc/SP/expected.top.6nsb.4isb.0
Etopology /etc/SP/expected.top.1nsb.0isb.0
Etopology -read /tmp/temporary.top
Purpose
Eunfence - Adds an SP node that was previously removed from the network back to the current active switch network.
Syntax
Eunfence [-h] | [-G] node_specifier [node_specifier2] ...
Flags
Operands
Description
Use this command to allow a node that was previously removed with the Efence command to rejoin the current switch network.
You can also use this command to allow a node to rejoin the switch network if that node was previously removed from the SP Switch network due to a switch or adapter error.
SP Switch Note:
Eunfence first distributes the current topology file to the nodes before they can be unfenced.
High Performance Switch Note:
The Eunfence command cannot unfence a fenced node if a switch fault occurred or if Estart ran after the node was fenced. You must do another Estart to unfence the node.
| Note: | If a host name or IP address is used as the node_specifier for a dependent node, it must be a host name or IP address assigned to the adapter that connects the dependent node to the SP Switch. Neither the administrative host name nor the Simple Network Management Protocol (SNMP) agent's host name for a dependent node is guaranteed to be the same as the host name of its switch network interface. |
Files
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunpartition
Examples
Eunfence 129.33.34.1
Eunfence r11n01 r11n04
Eunfence 34 43 20 76 40
Eunfence 2,14
Eunfence 5 6 7 8
unfences nodes 5 and 6, but not nodes 7 and 8. As a result, the command returns a nonzero return code.
Eunfence -G 5 6 7 8
Purpose
| Usage Note |
|---|
|
Use this command only if you have an SP Switch installed on your system. |
Eunpartition - Prepares a system partition for merging with a neighboring system partition.
Syntax
Eunpartition [-h]
Flags
If a flag is not specified, Eunpartition examines the SP_NAME shell variable and selects a system partition based on its current setting.
Operands
None.
Description
Use this command to prepare a partitioned configuration for a new system partition definition within an SP cluster.
This command must be executed for each system partition prior to the spapply_config command to redefine system partitions. Since this command uses the SP rsh command, proper authentication and authorization to issue this command is required.
If you specify Eunpartition in error, it will quiesce the primary and primary backup nodes. If this occurs, you must use Estart to restart the switch.
Security
You must have root privilege to run this command.
Related Information
Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence
Examples
To prepare the current system partition for repartitioning as specified by SP_NAME, enter:
Eunpartition
Purpose
export_clients - Creates or updates the Network File System (NFS) export list for a boot/install server.
Syntax
export_clients [-h]
Flags
Operands
None.
Description
Use this command to create or update the NFS export list on a boot/install server node.
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege to run this command.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/bin/export_clients
Related Information
Commands: setup_server
Examples
To create or update the NFS export list on a boot/install server node, enter:
export_clients
Purpose
ext_srvtab - Extracts service key files from the authentication database.
Syntax
ext_srvtab [-n] [-r realm] [instance ...]
Flags
Operands
Description
The ext_srvtab command extracts service key files from the authentication database. The master key is used to extract service key values from the database. For each instance specified on the command line, the ext_srvtab command creates a new service key file in the current working directory with a file name of instance-new-srvtab which contains all the entries in the database with an instance field of instance. This new file contains all the keys registered for instances of services defined to run on that host. A user must have read access to the authentication database to execute this command. This command can only be issued on the system on which the authentication database resides.
Files
Related Information
Commands: kadmin, ksrvutil
Refer to the "RS/6000 SP Files and Other Technical Information" section (Chapter 2) of IBM Parallel System Support Programs for AIX: Command and Technical Reference for additional Kerberos information.
Examples
If a system has three network interfaces named as follows:
ws3e.abc.org
ws3t.abc.org
ws3f.finet.abc.org
to re-create the server key file on this workstation (that is an SP authentication server), user root could do the following:
# create a new key file in the /tmp directory for each instance
# Combine the instance files into a single file for the hostname.
# Delete temporary files and protect key file
cd /tmp
/usr/kerberos/etc/ext_srvtab -n ws3e ws3t ws3f
/bin/cat ws3e-new-srvtab ws3t-new-srvtab ws3f-new-srvtab \
   >/etc/krb-srvtab
/bin/rm ws3e-new-srvtab ws3t-new-srvtab ws3f-new-srvtab
/bin/chmod 400 /etc/krb-srvtab
Purpose
fencevsd - Prevents an application running on a node or group of nodes from accessing an IBM Virtual Shared Disk or group of IBM Virtual Shared Disks.
Syntax
fencevsd -v vsd_name_list -n node_list
Flags
Operands
None.
Description
Under some circumstances, the system may believe a node has failed and begin recovery procedures, when the node is actually operational, but cut off from communication with other nodes running the same application. In this case, the "failed" node must not be allowed to serve requests for the IBM Virtual Shared Disks it normally serves until recovery is complete and the other nodes running the application recognize the failed node as operational. The fencevsd command prevents the failed node from filling requests for its IBM Virtual Shared Disks.
This command can be run from any node.
| Note: | This command will fail if you do not specify a current server (primary or backup) to an IBM Virtual Shared Disk with the -v flag. |
Files
Security
You must have root privilege to run this command.
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: lsfencevsd, lsvsd, unfencevsd, updatevsdtab, vsdchgserver
Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information on how to use this command in writing applications.
Examples
To fence the IBM Virtual Shared Disks vsd1 and vsd2 from node 5, enter:
fencevsd -v vsd1,vsd2 -n 5
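When the disks and nodes to fence are computed by a program, the command line can be assembled from lists. The sketch below mirrors the example above; the helper name is hypothetical, and it assumes that node_list is comma-separated in the same way the example shows for vsd_name_list.

```python
def fencevsd_command(vsd_names, nodes):
    """Build the argument list for fencevsd from a list of IBM Virtual
    Shared Disk names and a list of node numbers."""
    if not vsd_names or not nodes:
        raise ValueError("both a vsd name list and a node list are required")
    return [
        "fencevsd",
        "-v", ",".join(vsd_names),
        "-n", ",".join(str(node) for node in nodes),  # assumed comma-separated
    ]


print(" ".join(fencevsd_command(["vsd1", "vsd2"], [5])))
```

Returning a list rather than a single string lets the result be passed directly to a process-spawning call without shell quoting concerns.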
Purpose
get_vpd - Consolidates the Vital Product Data (VPD) files for the nodes and writes the information to a file and optionally to a diskette.
Syntax
get_vpd [-h] [-d] -m model_number -s serial_number
Flags
Description
Use this command to consolidate the Vital Product Data (VPD) for the nodes in the RS/6000 SP into a file and to optionally write the file to diskette. The diskette created by this command is sent to IBM manufacturing when an upgrade to the RS/6000 SP hardware is desired. This diskette is used by manufacturing and marketing to configure an upgrade of the RS/6000 SP.
The get_vpd command is issued by IBM field personnel to capture VPD information after an upgrade of the system. All installation and configuration of the RS/6000 SP must be complete prior to issuing the get_vpd command.
Files
Standard Output
This command creates the /var/adm/SPlogs/SPconfig/serial_number.vpd file and optionally writes the file to a diskette.
Standard Error
This command writes all error messages to standard error.
Exit Values
Security
You must have root privilege to run this command.
Restrictions
This command can only be issued on the control workstation.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.
Prerequisite Information
IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment
Location
/usr/lpp/ssp/install/bin/get_vpd
Examples
get_vpd -m 204 -s 020077650
get_vpd -m 306 -s 510077730 -d
Purpose
ha_vsd - Starts the rvsd subsystem of IBM Recoverable Virtual Shared Disk (RVSD). This includes configuring IBM Virtual Shared Disks and data striping devices (HSDs) as well as starting the rvsd and hc daemons.
Syntax
ha_vsd [reset]
Flags
None.
Operands
Description
Use this command to start the IBM Recoverable Virtual Shared Disk licensed program after you install it, or, with the reset option, to stop and restart the program.
Exit Values
Security
You must have root privilege to issue the ha_vsd command.
Implementation Specifics
This command is part of the IBM Recoverable Virtual Shared Disk Licensed Program Product (LPP).
Prerequisite Information
See "Using the IBM Recoverable Virtual Shared Disk Software" in IBM Parallel System Support Programs for AIX: Managing Shared Disks.
Location
/usr/lpp/csd/bin/ha_vsd
Related Information
Commands: ha.vsd, hc.vsd
Examples
To stop the rvsd subsystem and restart it, enter:
ha_vsd reset
The system returns the messages:
Starting rvsd subsystem.
rvsd subsystem started PID=xxx.
Purpose
ha.vsd - Queries and controls the rvsd subsystem of IBM Recoverable Virtual Shared Disk (RVSD).
Syntax
Flags
None.
Operands
The rvsd subsystem must be restarted for this operand to take effect.
The RVSD subsystem must be restarted for this operand to take effect.
Once debugging is turned on and the RVSD subsystem has been restarted, ha.vsd trace should be issued to turn on tracing.
Use this operand under the direction of your IBM service representative.
Note: The default when the node is booted is to have stdout and stderr routed to the console. If debugging is turned off, stdout and stderr are routed to /dev/null and all further trace messages are lost. You can determine whether debug has been turned on by issuing ha.vsd qsrc. If debug has been turned on, the return value will be:
action = "2"
This operand is only meaningful after the debug operand has been used to send stdout and stderr to the console and the rvsd subsystem has been restarted.
Description
Use this command to display information about the rvsd subsystem, to change the number of nodes needed for quorum, and to change the status of the subsystem.
You can start the rvsd subsystem with the VSD Perspective. Type spvsd and select actions for IBM VSD nodes.
Exit Values
Security
You must have root privilege to issue the debug, quorum, refresh, reset, start, stop, trace, mksrc, and rmsrc subcommands.
Implementation Specifics
This command is part of the IBM Recoverable Virtual Shared Disk Licensed Program Product (LPP).
Prerequisite Information
See "Using the IBM Recoverable Virtual Shared Disk Software" in IBM Parallel System Support Programs for AIX: Managing Shared Disks.
Location
/usr/lpp/csd/bin/ha.vsd
Related Information
Commands: ha_vsd, hc.vsd
Examples
ha.vsd reset
The system returns the messages:
Waiting for the rvsd subsystem to exit.
rvsd subsystem exited successfully.
Starting rvsd subsystem.
rvsd subsystem started PID=xxx.
ha.vsd quorum 5
The system returns the message:
Quorum has been changed from 8 to 5.
Purpose
hacws_verify - Verifies the configuration of both the primary and backup High Availability Control Workstation (HACWS) control workstations.
Syntax
hacws_verify
Flags
None.
Operands
None.
Description
Use this command to verify that the primary and backup control workstations are properly configured to provide HACWS services to the SP system. The hacws_verify command inspects both the primary and backup control workstations to verify the following:
Both the primary and backup control workstations must be running and capable of executing remote commands via the /usr/lpp/ssp/rcmd/bin/rsh command.
The system administrator should run the hacws_verify command after HACWS is initially configured. After that, the hacws_verify command can be run at any time.
Exit Values
Prerequisite Information
Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on the HACWS option.
Location
/usr/sbin/hacws/hacws_verify
Related Information
SP Commands: install_hacws, rsh, spcw_addevents
Purpose
haemcfg - Compiles the Event Management objects in the System Data Repository (SDR) and places the compiled information into a binary Event Management Configuration Database (EMCDB) file.
Syntax
haemcfg [-c] [-n]
Flags
Operands
None.
Description
The haemcfg utility command builds the Event Management Configuration Database (EMCDB) file for a system partition. If no flags are specified, the haemcfg command:
To place the new EMCDB into production, you must shut down and restart all of this system partition's Event Manager daemons: the daemon on the control workstation and the daemon on each of the system partition's nodes. When the Event Management daemon restarts, it copies the EMCDB from the staging directory to the production directory. The name of the production EMCDB is /etc/ha/cfg/em.syspar_name.cdb.
If you want to test a new EMCDB, IBM recommends that you create a separate system partition for that purpose.
You must create a distinct EMCDB file for each system partition on the IBM RS/6000 SP. To build an EMCDB file, you must be executing on the control workstation and you must set the SP_NAME environment variable to the appropriate system partition name before you issue the command.
Before you build or replace an EMCDB, it is advisable to issue the haemcfg command with the debugging flags.
The -c flag lets you check the validity of the Event Management data that resides in the SDR. This data was previously loaded through the haemloadcfg command. If any of the data is invalid, the command writes an error message that describes the error.
When the -c flag is processed, the command validates the data in the SDR, but does not create a new EMCDB file and does not update the EMCDB version string in the SDR.
The -n flag lets you build a test EMCDB file in the current directory. If anything goes wrong with the creation of the new file, the command writes an error message that describes the error.
When the -n flag is processed, the command uses the data in the SDR to create a test EMCDB file in the current directory, but it does not update the EMCDB version string in the SDR. If any of the data in the SDR is invalid, the command stops at the first error encountered.
If you specify both flags on the command line, the haemcfg command performs the actions of the -c flag.
After you have checked the data and the build process, issue the haemcfg command without any flags. This builds the new EMCDB file, places it in the /spdata/sys1/ha/cfg directory, and updates the EMCDB version string in the SDR.
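The recommended sequence (check the data with -c, build a test file with -n, then build without flags) can be sketched as a short driver. This is an illustration under stated assumptions rather than a supported procedure: the function and step names are hypothetical, haemcfg is assumed to be on the PATH of a root user on the control workstation, and a nonzero exit code is assumed to indicate failure because the exit values are not listed here.

```python
import os
import subprocess

# The three passes described in the text, in order.
STEPS = (("check", ["-c"]), ("test build", ["-n"]), ("build", []))


def run_haemcfg(flags, env):
    """Default runner: run haemcfg with the given flags; return its exit code."""
    return subprocess.run(["haemcfg", *flags], env=env).returncode


def build_emcdb(syspar_name, run=run_haemcfg):
    """Run the check, test build, and build passes for one system partition.

    Returns the name of the first step that failed, or None if all passed.
    """
    # SP_NAME must name the system partition before haemcfg is issued.
    env = dict(os.environ, SP_NAME=syspar_name)
    for name, flags in STEPS:
        if run(flags, env) != 0:   # assumption: nonzero exit means failure
            return name
    return None
```

Because the -n pass writes its test EMCDB file to the current directory, a driver like this would typically be started from a scratch directory.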
Files
Standard Output
When the command executes successfully, it writes the following informational messages:
Reading Event Management data for partition syspar_name
CDB=new_EMCDB_file_name Version=EMCDB_version_string
Standard Error
This command writes error messages (as necessary) to standard error.
Errors can result from causes that include:
For a listing of the errors that the haemcfg command can produce, see IBM Parallel System Support Programs for AIX: Diagnosis and Messages Guide.
Exit Values
Security
To place an EMCDB file for a system partition into the /spdata/sys1/ha/cfg directory, you must be running with an effective user ID of root on the control workstation. Before running this command, you must set the SP_NAME environment variable to the appropriate system partition name.
Restrictions
To place an EMCDB file for a system partition into the /spdata/sys1/ha/cfg directory, you must be running with an effective user ID of root on the control workstation. Before running this command, you must set the SP_NAME environment variable to the appropriate system partition name.
If you run the haemcfg command without any flags, the command stops at the first error it encounters. With the -c flag on, the command continues, letting you obtain as much debugging information as possible in one pass. To reduce your debugging time, therefore, run the command with the debugging flags first.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
For a general overview of configuring Event Management, see "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide.
For a description of the SDR classes and attributes that are related to the EMCDB, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.
Location
/usr/lpp/ssp/bin/haemcfg
Related Information
Commands: haemloadcfg
Examples
haemcfg -c
If there are any errors in the data, the command writes appropriate error messages.
To fix the errors, replace the data in the SDR. For more information, see the man page for the haemloadcfg command.
haemcfg -n
If there are any problems in creating the file, the command writes appropriate error messages.
haemcfg
In response, the command creates a new EMCDB file, places it in the staging directory as /spdata/sys1/ha/cfg/em.syspar_name.cdb, where syspar_name is the name of the current system partition, and updates the EMCDB version string in the SDR.
Purpose
haemctrl - A control script that starts the Event Management subsystem.
Syntax
haemctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}
Flags
Operands
None.
Description
Event Management is a distributed subsystem of PSSP that provides a set of high availability services for the IBM RS/6000 SP. By matching information about the state of system resources with information about resource conditions that are of interest to client programs, it creates events. Client programs can use events to detect and recover from system failures, thus enhancing the availability of the SP system.
The haemctrl control script controls the operation of the Event Management subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called haem. Associated with each subsystem is a daemon.
An instance of the Event Management subsystem executes on the control workstation and on every node of a system partition. Because Event Management provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.
From an operational point of view, the Event Management subsystem group is organized as follows:
The haem subsystem is associated with the haemd daemon.
The subsystem name on the nodes is haem. There is one subsystem per node, and it is associated with the system partition to which the node belongs.
On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named haem.sp_prod and haem.sp_test.
The haemd daemon provides the Event Management services.
The haemctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The haemctrl script provides a variety of controls for operating the Event Management subsystem:
Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.
Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.
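The naming rule described above (plain haem on a node, haem.syspar_name on the control workstation, with node number zero identifying the control workstation) can be expressed as a small helper. This is an illustrative sketch. The function name em_subsys_name is hypothetical and is not part of the haemctrl script.

```shell
# em_subsys_name NODE_NUMBER SYSPAR_NAME
# Prints the SRC subsystem name for Event Management.
# Node number 0 means the control workstation, where the system
# partition name is appended; on a node the name is simply "haem".
em_subsys_name() {
    if [ "$1" -eq 0 ]; then
        printf 'haem.%s\n' "$2"
    else
        printf 'haem\n'
    fi
}

# Possible use on a live system (not run here), where node_num and
# syspar_name hold values obtained from node_number and spget_syspar:
#   lssrc -s "$(em_subsys_name "$node_num" "$syspar_name")"
```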
Adding the Subsystem
When the -a flag is specified, the control script uses the mkssys command to add the Event Management subsystem to the SRC. The control script operates as follows:
The service name that is entered in the /etc/services file is haem.syspar_name.
For more information about configuring Event Management data, see the IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.
Then it gets the port number for the subsystem from the SP_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. This port number is used for remote connections to Event Management daemons that are running on the control workstation. If there is no port number in the SDR, the script obtains one and sets it in the /etc/services file. The range of valid port numbers is 10000 to 10100, inclusive.
The service name is haemd.
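To confirm by hand that a service name has a port in the /etc/services file and that the port lies in the valid range, something like the following could be used. It is a sketch under assumptions: the function names are hypothetical, and the file is assumed to use the usual layout of a service name followed by port/protocol.

```shell
# port_in_range PORT
# Succeeds if PORT is a whole number from 10000 to 10100, inclusive.
port_in_range() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "$1" -ge 10000 ] && [ "$1" -le 10100 ]
}

# services_port SERVICE [FILE]
# Prints the port number recorded for SERVICE in a services-format
# file (default /etc/services). Prints nothing if there is no entry.
services_port() {
    awk -v svc="$1" '$1 == svc { split($2, a, "/"); print a[1]; exit }' \
        "${2:-/etc/services}"
}
```

For example, port_in_range "$(services_port haem.sp_prod)" succeeds only if an entry exists and its port is in range.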
Starting the Subsystem
When the -s flag is specified, the control script uses the startsrc command to start the Event Management subsystem, haem.
Stopping the Subsystem
When the -k flag is specified, the control script uses the stopsrc command to stop the Event Management subsystem, haem.
Deleting the Subsystem
When the -d flag is specified, the control script uses the rmssys command to remove the Event Management subsystem from the SRC. The control script operates as follows:
Cleaning Up the Subsystems
When the -c flag is specified, the control script stops and removes the Event Management subsystems for all system partitions from the SRC. The control script operates as follows:
Unconfiguring the Subsystems
When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Event Management subsystems.
| Note: | The -u flag is effective only on the control workstation. |
Prior to executing the haemctrl command with the -u flag on the control workstation, the haemctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.
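The required ordering (clean every node first, then unconfigure on the control workstation) can be written down as a dry-run plan. The sketch below only prints the steps and does not run them. The function name plan_unconfigure and the node names passed to it are hypothetical.

```shell
# plan_unconfigure NODE...
# Prints, in the required order, the commands an administrator would
# run: haemctrl -c on each node, then haemctrl -u on the control
# workstation. Nothing is executed.
plan_unconfigure() {
    for node in "$@"; do
        printf 'on %s: /usr/lpp/ssp/bin/haemctrl -c\n' "$node"
    done
    printf 'on control workstation: /usr/lpp/ssp/bin/haemctrl -u\n'
}
```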
Turning Tracing On
When the -t flag is specified, the control script turns tracing on for the haemd daemon, using the haemtrcon command.
Turning Tracing Off
When the -o flag is specified, the control script turns tracing off for the haemd daemon, using the haemtrcoff command.
Refreshing the Subsystem
The -r flag has no effect for this subsystem.
Logging
While it is running, the Event Management daemon normally provides information about its operation and errors by writing entries to the AIX error log. If it cannot, errors are written to a log file called /var/ha/log/em.default.syspar_name.
Files
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must be running with an effective user ID of root.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/haemctrl
Related Information
Commands: haemcfg, haemd, haemloadcfg, haemtrcoff, haemtrcon, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
haemctrl -a
haemctrl -s
haemctrl -k
haemctrl -d
haemctrl -c
haemctrl -u
haemctrl -t
haemctrl -o
lssrc -g haem
lssrc -s haem
To display the status of an individual Event Management subsystem on the control workstation, enter:
lssrc -s haem.syspar_name
where syspar_name is the system partition name.
lssrc -l -s haem
To display detailed status about an individual Event Management subsystem on the control workstation, enter:
lssrc -l -s haem.syspar_name
where syspar_name is the system partition name.
In response, the system returns information that includes the running status of the subsystem, the settings of trace flags, the version number of the Event Management Configuration Database, the time the subsystem was started, the connection status to Group Services and peer Event Management subsystems, and the connection status to Event Management clients, if any.
lssrc -a
Purpose
haemd - The Event Manager daemon, which observes resource variable instances that are updated by Resource Monitors and generates and reports events to client programs
Syntax
haemd [syspar_IPaddr]
Flags
None.
Operands
Description
The haemd daemon is the Event Manager daemon. The daemon observes resource variable instances that are updated by Resource Monitors and generates and reports events to client programs.
One instance of the haemd daemon executes on the control workstation for each system partition. An instance of the haemd daemon also executes on every node of a system partition. The haemd daemon is under System Resource Controller (SRC) control.
Because the daemon is under SRC control, it cannot be started directly from the command line. It is normally started by the haemctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the haemctrl command.
For more information about the Event Manager daemon, see the haemctrl man page.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/haemd
Related Information
Commands: haemctrl
Examples
See the haemctrl command.
Purpose
haemloadcfg - Loads Event Management configuration data into the System Data Repository (SDR)
Syntax
haemloadcfg [-d] [-r] loadlist_file
Flags
Operands
Description
The haemloadcfg utility command loads Event Management configuration data into the SDR. Note that before you invoke haemloadcfg, you must ensure that the SP_NAME environment variable is set to the appropriate system partition name.
The configuration data is contained in a load list file, whose format is described by the man page for the haemloadlist file. For details on the SDR classes and attributes that you can use to specify Event Management configuration data, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.
To load the default Event Management configuration data for PSSP, specify the load list file as /usr/lpp/ssp/install/config/haemloadlist.
To add Event Management configuration data for other Resource Monitors, create a file in load list format and specify its name on the command.
Without any flags, the haemloadcfg command does not replace existing objects in the SDR. The data in the load list file is matched with the existing objects in the SDR based on key attributes, as follows:
Note that the way in which the haemloadcfg command handles existing SDR objects is different from the way in which the SDRCreateObjects command handles them. The SDRCreateObjects command creates a new object as long as the attributes, taken as a group, are unique.
To change a nonkey attribute of an Event Management object that already exists in the SDR, change the attribute in the load list file. Then run the haemloadcfg command using the -r flag and the name of the load list file. Every object in the SDR that matches an object in the load list file, based on the key attributes, is replaced by the object from the load list file. Any unmatched objects in the load list file are added to the SDR.
To delete Event Management objects from the SDR, create a load list file with the objects to be deleted. Only the key attributes need to be specified. Then run the haemloadcfg command using the -d flag and the name of the load list file. All objects in the SDR that match objects in the load list file are deleted. Unmatched objects in the load list file, if any, are not added to the SDR.
Under any circumstances, duplicate objects in the load list file, based on matches in key attributes, are ignored. However, such duplicate objects are written to standard output.
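The three behaviors described above (no flags, -r, and -d) differ only in what happens to matched and unmatched objects. The following sketch models them on plain text files and does not touch the SDR. It is a simplification under stated assumptions: each line holds one object as a key followed by a single value, which is not the real load list format, and the function name merge_objects is hypothetical. Unlike haemloadcfg, which writes duplicates to standard output, the sketch reports them on standard error to keep the result list clean.

```shell
# merge_objects MODE EXISTING_FILE LOADLIST_FILE
# MODE is add (no flags), replace (-r), or delete (-d).
# Prints the resulting set of "key value" objects.
merge_objects() {
    awk -v mode="$1" '
        # First file: objects that already exist.
        FILENAME == ARGV[1] { obj[$1] = $2; order[++n] = $1; next }
        # Second file: a key seen earlier in the load list is a duplicate.
        ($1 in seen) { print "duplicate ignored: " $1 > "/dev/stderr"; next }
        { seen[$1] = 1 }
        # delete: remove matches, never add anything.
        mode == "delete" { delete obj[$1]; next }
        # Matched object: replaced only in replace mode.
        ($1 in obj) { if (mode == "replace") obj[$1] = $2; next }
        # Unmatched object: added in add and replace modes.
        { obj[$1] = $2; order[++n] = $1 }
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in obj) print order[i], obj[order[i]]
        }
    ' "$2" "$3"
}
```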
Files
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have the appropriate authority to write to the SDR. You should be running on the control workstation. Before running this command, you must set the SP_NAME environment variable to the appropriate system partition name.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
For a general overview of configuring Event Management, see "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide.
For details on the System Data Repository classes and attributes for the Event Management Configuration Database, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.
Location
/usr/lpp/ssp/bin/haemloadcfg
Related Information
Commands: haemcfg, SDRCreateObjects, SDRDeleteObjects
Files: haemloadlist
Also, for a description of the SDR classes for Event Management configuration data, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.
Examples
haemloadcfg /usr/lpp/ssp/install/config/haemloadlist
haemloadcfg /usr/local/config/newrmloadlist
If nonkey attributes in this load list file are later changed, update the SDR by entering:
haemloadcfg -r /usr/local/config/newrmloadlist
If this new Resource Monitor is no longer needed, its configuration data is removed from the SDR by entering:
haemloadcfg -d /usr/local/config/newrmloadlist
Purpose
haemtrcoff - Turns tracing off for the Event Manager daemon.
Syntax
haemtrcoff -s subsys_name -a trace_list
Flags
Operands
The following trace arguments may be specified:
Description
The haemtrcoff command is used to turn tracing off for specified activities of the Event Manager daemon. Trace output is placed in an Event Management trace log for the system partition.
Use this command only under the direction of the IBM Support Center. It provides information for debugging purposes and may degrade the performance of the Event Management subsystem or anything else that is running in the system partition. Do not use this command during normal operation.
Files
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
Location
/usr/lpp/ssp/bin/haemtrcoff
Related Information
Commands: haemctrl, haemd, haemtrcon
Examples
In the following examples, the SP system has two system partitions named sp_prod and sp_test. The instances of the Event Management subsystem on the control workstation of the SP are named haem.sp_prod and haem.sp_test, respectively. The instance of the Event Management subsystem that runs on any node of either system partition is named haem.
haemtrcoff -s haem.sp_prod -a all
haemtrcoff -s haem -a all
haemtrcoff -s haem.sp_test -a init,config
Purpose
haemtrcon - Turns tracing on for the Event Manager daemon.
Syntax
haemtrcon -s subsys_name -a trace_list
Flags
Operands
The following trace arguments may be specified:
Description
The haemtrcon command is used to turn tracing on for specified activities of the Event Manager daemon. Trace output is placed in an Event Management trace log for the system partition. When used, the regs and dinsts arguments perform a one-time trace. The specified information is placed in the trace log, but no further tracing is done.
Use this command only under the direction of the IBM Support Center. It provides information for debugging purposes and may degrade the performance of the Event Management subsystem or anything else that is running in the system partition. Do not use this command to turn tracing on during normal operation.
Files
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
Location
/usr/lpp/ssp/bin/haemtrcon
Related Information
Commands: haemctrl, haemd, haemtrcoff
Examples
In the following examples, the SP system has two system partitions named sp_prod and sp_test. The instances of the Event Management subsystem on the control workstation of the SP are named haem.sp_prod and haem.sp_test, respectively. The instance of the Event Management subsystem that runs on any node of either system partition is named haem.
haemtrcon -s haem.sp_prod -a all
haemtrcon -s haem -a all
haemtrcon -s haem.sp_test -a init,config
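The trace_list operand is a comma-separated list, as in the last example. A small helper can assemble the command line from separate arguments. This is a sketch only: the function name trace_cmd is hypothetical, and it prints the command rather than running it, because tracing should be turned on only under the direction of the IBM Support Center.

```shell
# trace_cmd on|off SUBSYS_NAME TRACE_ARG...
# Prints the haemtrcon or haemtrcoff command line, with the trace
# arguments joined by commas.
trace_cmd() {
    if [ $# -lt 3 ]; then
        echo "usage: trace_cmd on|off subsys_name trace_arg..." >&2
        return 2
    fi
    case $1 in
        on)  cmd=haemtrcon ;;
        off) cmd=haemtrcoff ;;
        *)   echo "first argument must be on or off" >&2; return 2 ;;
    esac
    subsys=$2
    shift 2
    # Inside the subshell, "$*" joins the remaining arguments using
    # the first character of IFS, which is set to a comma.
    list=$(IFS=,; printf '%s' "$*")
    printf '%s -s %s -a %s\n' "$cmd" "$subsys" "$list"
}
```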
Purpose
haemunlkrm - Unlocks and starts a Resource Monitor.
Syntax
haemunlkrm -s subsys_name -a resmon_name
Flags
Description
If the Event Manager daemon cannot successfully start a Resource Monitor after three attempts within a two-hour interval, the Resource Monitor is "locked" and no further attempts are made to start it. Once the cause of the failure is determined and the problem corrected, the haemunlkrm command can be used to unlock the Resource Monitor and attempt to start it.
The status of the Event Manager daemon, as displayed by the lssrc command, indicates if a Resource Monitor is locked.
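The locking rule can be illustrated with a small model. This is a sketch under assumptions, not the daemon's implementation: it treats the two-hour interval as the 7200 seconds ending at a given time, which the text above does not specify, and the function name rm_is_locked is hypothetical.

```shell
# rm_is_locked NOW FAIL_TIME...
# All times are in seconds. Succeeds (exit status 0) if at least
# three failed start attempts fall within the two hours ending at NOW.
rm_is_locked() {
    now=$1
    shift
    count=0
    for t in "$@"; do
        age=$((now - t))
        if [ "$age" -ge 0 ] && [ "$age" -le 7200 ]; then
            count=$((count + 1))
        fi
    done
    [ "$count" -ge 3 ]
}
```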
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
Location
/usr/lpp/ssp/bin/haemunlkrm
Examples
If the output of the lssrc command indicates that the hardware Resource Monitor IBM.PSSP.hmrmd is locked, then after correcting the condition that prevented the Resource Monitor from being started, enter:
haemunlkrm -s haem -a IBM.PSSP.hmrmd
| Note: | This example applies to unlocking a Resource Monitor on a node. |
Purpose
hagsctrl - A control script that starts the Group Services subsystems.
Syntax
hagsctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}
Flags
Operands
None.
Description
Group Services provides distributed coordination and synchronization services for other distributed subsystems running on a set of nodes on the IBM RS/6000 SP. The hagsctrl control script controls the operation of the subsystems that are required for Group Services. These subsystems are under the control of the System Resource Controller (SRC) and belong to a subsystem group called hags. Associated with each subsystem is a daemon.
An instance of the Group Services subsystem executes on the control workstation and on every node of a system partition. Because Group Services provides its services within the scope of a system partition, its subsystems are said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.
From an operational point of view, the Group Services subsystem group is organized as follows:
The hags subsystem is associated with the hagsd daemon. The hagsglsm subsystem is associated with the hagsglsmd daemon.
The subsystem names on the nodes are hags and hagsglsm. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.
On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hags.sp_prod, hags.sp_test, hagsglsm.sp_prod, and hagsglsm.sp_test.
The hagsd daemon provides the majority of the Group Services functions.
The hagsglsmd daemon provides global synchronization services for the switch adapter membership group.
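Given a list of system partition names, the Group Services subsystem names on the control workstation follow mechanically from the rule above. The following sketch enumerates them. The function name gs_cws_subsystems is hypothetical.

```shell
# gs_cws_subsystems SYSPAR_NAME...
# Prints the Group Services subsystem names that exist on the control
# workstation for the given system partitions, one per line.
gs_cws_subsystems() {
    for base in hags hagsglsm; do
        for syspar in "$@"; do
            printf '%s.%s\n' "$base" "$syspar"
        done
    done
}
```

Each name printed could then be passed to lssrc -s to display the status of that subsystem.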
The hagsctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The hagsctrl script provides a variety of controls for operating the Group Services subsystems:
Before performing any of these functions, the script obtains the current system partition name (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.
Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.
Adding the Subsystem
When the -a flag is specified, the control script uses the mkssys command to add the Group Services subsystems to the SRC. The control script operates as follows:
The service name that is entered in the /etc/services file is hags.syspar_name.
Starting the Subsystem
When the -s flag is specified, the control script uses the startsrc command to start the Group Services subsystems, hags and hagsglsm.
Stopping the Subsystem
When the -k flag is specified, the control script uses the stopsrc command to stop the Group Services subsystems, hags and hagsglsm.
Deleting the Subsystem
When the -d flag is specified, the control script uses the rmssys command to remove the Group Services subsystems from the SRC. The control script operates as follows:
Cleaning Up the Subsystems
When the -c flag is specified, the control script stops and removes the Group Services subsystems for all system partitions from the SRC. The control script operates as follows:
Unconfiguring the Subsystems
When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Group Services subsystems.
| Note: | The -u flag is effective only on the control workstation. |
Prior to executing the hagsctrl command with the -u flag on the control workstation, the hagsctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.
Turning Tracing On
When the -t flag is specified, the control script turns tracing on for the hagsd daemon, using the traceson command. Tracing is not available for the hagsglsmd daemon.
Turning Tracing Off
When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hagsd daemon, using the tracesoff command. Tracing is not available for the hagsglsmd daemon.
Refreshing the Subsystem
The -r flag has no effect for this subsystem.
Logging
While they are running, the Group Services daemons provide information about their operation and errors by writing entries in a log file in the /var/ha/log directory.
Each daemon limits the log size to a pre-established number of lines (by default, 5,000 lines). When the limit is reached, the daemon appends the string .bak to the name of the current log file and begins a new log. If a .bak version already exists, it is removed before the current log is renamed.
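The rotation rule described above can be sketched as follows. This illustrates the behavior only and is not the daemons' own code. The function name rotate_log is hypothetical, and the default of 5000 lines is taken from the text.

```shell
# rotate_log FILE [MAX_LINES]
# If FILE has reached MAX_LINES (default 5000), the old FILE.bak is
# removed, FILE is renamed to FILE.bak, and a new empty FILE is begun.
rotate_log() {
    file=$1
    max=${2:-5000}
    [ -f "$file" ] || return 0
    # The arithmetic expansion strips any padding that wc may emit.
    lines=$(( $(wc -l < "$file") ))
    if [ "$lines" -ge "$max" ]; then
        rm -f "$file.bak"        # remove any existing .bak version first
        mv "$file" "$file.bak"   # the current log becomes the .bak copy
        : > "$file"              # begin a new, empty log
    fi
}
```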
Files
The file names include the following variables:
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must be running with an effective user ID of root.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Group Services Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
IBM Parallel System Support Programs for AIX: Group Services Programming Guide and Reference
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hagsctrl
Related Information
Commands: hagsd, hagsglsmd, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
hagsctrl -a
hagsctrl -s
hagsctrl -k
hagsctrl -d
hagsctrl -c
hagsctrl -u
hagsctrl -t
hagsctrl -o
lssrc -g hags
lssrc -s subsystem_name
lssrc -l -s subsystem_name
In response, the system returns information that includes the running status of the subsystem, the number and identity of connected GS clients, information about the Group Services domain, and the number of providers and subscribers in established groups.
lssrc -a
Purpose
hagsd - A Group Services daemon that provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes.
Syntax
hagsd daemon_name
Flags
None.
Operands
Description
The hagsd daemon is part of the Group Services subsystem, which provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes. This daemon provides most of the services of the subsystem.
One instance of the hagsd daemon executes on the control workstation for each system partition. An instance of the hagsd daemon also executes on every node of a system partition. The hagsd daemon is under System Resource Controller (SRC) control.
Because the daemon is under SRC control, it is better not to start it directly from the command line. It is normally called by the hagsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the startsrc or stopsrc command.
For more information about the Group Services daemons, see the hagsctrl man page.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Group Services Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
IBM Parallel System Support Programs for AIX: Group Services Programming Guide and Reference
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hagsd
Related Information
Commands: hagsctrl, hagsglsmd
Examples
See the hagsctrl command.
Purpose
hagsglsmd - A Group Services daemon that provides global synchronization services for the switch adapter membership group.
Syntax
hagsglsmd daemon_name
Flags
None.
Operands
Description
The hagsglsmd daemon is part of the Group Services subsystem, which provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes. This daemon provides global synchronization services for the High Performance Switch adapter membership group.
One instance of the hagsglsmd daemon executes on the control workstation for each system partition. An instance of the hagsglsmd daemon also executes on every node of a system partition. The hagsglsmd daemon is under System Resource Controller (SRC) control.
Because the daemon is under SRC control, it is better not to start it directly from the command line. It is normally called by the hagsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the startsrc or stopsrc command.
For more information about the Group Services daemons, see the hagsctrl man page.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
"The Group Services Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide
IBM Parallel System Support Programs for AIX: Group Services Programming Guide and Reference
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hagsglsmd
Related Information
Commands: hagsctrl, hagsd
Examples
See the hagsctrl command.
Purpose
hardmon - Monitors and controls the state of the SP hardware.
Syntax
hardmon [-B] [-r poll_rate] [-d debug_flag] ...
Flags
Operands
None.
Description
hardmon is the Hardware Monitor daemon. The daemon monitors and controls the state of the SP hardware contained in one or more SP frames. This command is not normally executed from the command line. Access to the Hardware Monitor is provided by the hmmon, hmcmds, spmon, s1term, and nodecond commands. Control of the Hardware Monitor daemon is provided by the hmadm command. These commands are the Hardware Monitor "client" commands.
The Hardware Monitor daemon executes on the Monitor and Control Node (MACN). The MACN is the IBM RS/6000 workstation to which the RS-232 lines from the frames are connected. The MACN is one and the same as the control workstation. The daemon is managed by the System Resource Controller (SRC). When the MACN is booted, an entry in /etc/inittab invokes the startsrc command to start the daemon. The daemon is configured in the SRC to be restarted automatically if it terminates for any reason other than the stopsrc command. The SRC subsystem name for the Hardware Monitor daemon is hardmon.
hardmon obtains configuration information from the System Data Repository (SDR). The SP_ports object class specifies the port number that the daemon is to use to accept TCP/IP connections from the client commands. The port number is obtained from the object whose daemon attribute value matches hardmon and whose host_name attribute value matches the host name of the workstation on which the daemon is executing. There must be one hardmon object in SP_ports for the MACN. The Frame object class contains an object for each frame in the SP system.
The attributes of interest to the daemon are frame_number, tty, and MACN. When started, the daemon fetches all those objects in the Frame class whose MACN attribute value matches the host name of the workstation on which the daemon is executing. For each frame discovered in this manner, the daemon saves the frame number and opens the corresponding tty device. When all frames have been configured, the daemon begins to poll the frames for state information. Current state and changed state can then be obtained using the hmmon and spmon commands. The hmcmds and spmon commands can be used to control the hardware within the frames.
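The frame selection step can be pictured as a filter over Frame objects. The sketch below works on a plain text listing and does not query the SDR. The record layout (frame_number, tty, and MACN separated by white space, one frame per line) and the function name frames_for_macn are assumptions made for illustration.

```shell
# frames_for_macn HOSTNAME [FILE]
# Reads records of the form "frame_number tty MACN" from FILE (or
# standard input) and prints "frame_number tty" for each frame whose
# MACN value matches HOSTNAME.
frames_for_macn() {
    awk -v host="$1" '$3 == host { print $1, $2 }' "${2:--}"
}
```

Each line printed corresponds to one frame whose tty device the daemon would open.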
The daemon also reads the file /spdata/sys1/spmon/hmthresholds for values used to check boundary conditions for certain state variables. This file should only be changed on request from IBM support. Finally, the /spdata/sys1/spmon/hmacls file is read for Access Control List (ACL) information. Refer to the hmadm command and the /spdata/sys1/spmon/hmacls file for more information on ACLs.
All errors detected by the Hardware Monitor daemon are written to the AIX error log.
The flags in the SRC subsystem object for the hardmon subsystem should not normally be changed. For example, if the poll rate is more than 5 seconds, the nodecond command can fail with unpredictable results. Upon request from IBM support for more information to aid in problem determination, debug flags can be set using the hmadm command.
If the High Availability Control Workstation (HACWS) Frame Supervisor (type 20) or the SEPBU HACWS Frame Supervisor (type 22) is installed in the SP frames, the -B flag is used to run the Hardware Monitor daemon in diagnostic mode. This diagnostic mode is used to validate that the frame ID written into the Supervisor matches the frame ID configured in the SDR for that frame. Normally, the frame ID is automatically written into the Supervisor during system installation. The frame ID is written into the frame to detect cabling problems in an HACWS configuration. In a non-HACWS SP configuration, the -B flag is useful whenever the RS-232 cables between the frames and the MACN are changed (but only if one or more frames contain a type 20 or type 22 supervisor). The hardmon command can be executed directly from the command line with the -B flag, but only after the currently running daemon is stopped using the stopsrc command. Diagnostic messages are written to the AIX error log. The daemon exits when all frames are validated.
Frame ID validation is also performed every time the daemon is started by the System Resource Controller. Any frame that has a frame ID mismatch can be monitored, but any control commands to the frame are ignored until the condition is corrected. A frame with a mismatch is noted in the System Monitor Graphical User Interface as well as in the AIX error log. The hmcmds command can be used to set the currently configured frame ID into a type 20 or type 22 supervisor after it is verified that the frame is correctly connected to the MACN.
Additional Configuration Information: The Hardware Monitor subsystem also obtains information from the system partition and the Syspar_map object classes in the SDR. While this information is not used by the hardmon daemon itself, it is used by the hardmon client commands listed under Related Information. Each of these commands executes in the environment of one system partition. If the SP system is not partitioned, these commands execute in the environment of the entire system. In any case, the Syspar_map object class is used to determine which nodes are contained in the current environment. The attributes of interest are syspar_name and node_number.
Starting and Stopping the hardmon Daemon
The hardmon daemon is under System Resource Controller (SRC) control. It uses the signal method of communication in SRC. The hardmon daemon is a single subsystem and is not associated with any SRC group. The subsystem name is hardmon. To start the hardmon daemon, use the startsrc -s hardmon command. This starts the daemon with the default arguments and SRC options. The hardmon daemon is set up to be respawnable and to be the only instance of the hardmon daemon running on a particular node or control workstation. Other than for the diagnostic mode selected by the -B flag, do not start the hardmon daemon from the command line without using the startsrc command.
To stop the hardmon daemon, use the stopsrc -s hardmon command. This stops the daemon and does not allow it to respawn.
To display the status of the hardmon daemon, use the lssrc -s hardmon command.
If the default startup arguments need to be changed, use the chssys command to change the startup arguments or the SRC options. Refer to AIX Version 4 Commands Reference and AIX Version 4 General Programming Concepts: Writing and Debugging Programs for more information about daemons under SRC control and how to modify daemon arguments when under SRC.
To view the current SRC options and daemon arguments, use the odmget -q 'subsysname=hardmon' SRCsubsys command.
Files
Related Information
Commands: hmadm, hmcmds, hmmon, nodecond, spmon, s1term
File: /spdata/sys1/spmon/hmacls
Examples
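To start the hardmon daemon, enter:
startsrc -s hardmon
To stop the hardmon daemon, enter:
stopsrc -s hardmon
To display the status of the hardmon daemon, enter:
lssrc -s hardmon
To display the status of all of the daemons under SRC control, enter:
lssrc -a
To view the current SRC options and daemon arguments for the hardmon daemon, enter:
odmget -q 'subsysname=hardmon' SRCsubsys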
Purpose
hats - Starts or restarts Topology Services on a node or on the control workstation.
Syntax
hats
Flags
None.
Operands
None.
Description
Use this command to start the operation of Topology Services for a system partition (the hatsd daemon) on the control workstation or on a node within a system partition.
The hats script is not normally executed from the command line. It is normally called by the hatsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The Topology Services subsystem provides internal services to PSSP components.
Note that the hats script issues the no -o nonlocsrcroute=1 command, which enables IP source routing. Do not change this setting, because the Topology Services subsystem requires this setting to work properly. If you change the setting, the Topology Services subsystem and a number of other subsystems that depend on it will no longer operate properly.
The hatsd daemon is initially started on the control workstation with the System Resource Controller (SRC), regardless of the level of the system partition. It is respawned automatically if the hatsd daemon fails. The SP_NAME environment variable causes selection of the correct topology configuration.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
The "Starting Up and Shutting Down the SP System" chapter and "The System Data Repository" appendix in IBM Parallel System Support Programs for AIX: Administration Guide
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hats
Related Information
Commands: hatsctrl, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
See the hatsctrl command.
Purpose
hatsctrl - A control script that starts the Topology Services subsystem.
Syntax
hatsctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}
Flags
Operands
None.
Description
Topology Services is a distributed subsystem of PSSP that provides information to other PSSP subsystems about the state of the nodes and adapters on the IBM RS/6000 SP.
The hatsctrl control script controls the operation of the Topology Services subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hats. Associated with each subsystem is a daemon and a script that configures and starts the daemon.
An instance of the Topology Services subsystem executes on the control workstation and on every node of a system partition. Because Topology Services provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.
From an operational point of view, the Topology Services subsystem group is organized as follows:
The hats subsystem is associated with the hatsd daemon and the hats script. The hats script configures and starts the hatsd daemon.
The subsystem name on the nodes is hats. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.
On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hats.sp_prod and hats.sp_test.
The hatsd daemon provides the Topology Services. The hats script configures and starts the hatsd daemon.
The hatsctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The hatsctrl script provides a variety of controls for operating the Topology Services subsystem:
Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.
Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.
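The naming convention described above (a plain subsystem name on a node, the name with the system partition name appended on the control workstation, and node number zero identifying the control workstation) can be summarized in a short sketch. The function name is hypothetical, and the code only illustrates the rule; it is not how the control script itself is written.

```python
def src_subsystem_name(group, node_number, syspar_name):
    """Name of the SRC subsystem instance for a partition-sensitive group."""
    # Node number 0 means the control workstation, which runs one instance
    # per system partition, named group.syspar_name.
    if node_number == 0:
        return f"{group}.{syspar_name}"
    # A node belongs to one system partition and runs one plain-named instance.
    return group
```

For example, src_subsystem_name("hats", 0, "sp_prod") gives hats.sp_prod, matching the control workstation example above, while any nonzero node number gives hats.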
Adding the Subsystem
When the -a flag is specified, the control script uses the mkssys command to add the Topology Services subsystem to the SRC. The control script operates as follows:
The service name that is entered in the /etc/services file is hats.syspar_name.
Starting the Subsystem
When the -s flag is specified, the control script uses the startsrc command to start the Topology Services subsystem, hats.
Stopping the Subsystem
When the -k flag is specified, the control script uses the stopsrc command to stop the Topology Services subsystem, hats.
Deleting the Subsystem
When the -d flag is specified, the control script uses the rmssys command to remove the Topology Services subsystem from the SRC. The control script operates as follows:
Cleaning Up the Subsystems
When the -c flag is specified, the control script stops and removes the Topology Services subsystems for all system partitions from the SRC. The control script operates as follows:
Unconfiguring the Subsystems
When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Topology Services subsystems.
| Note: | The -u flag is effective only on the control workstation. |
Prior to executing the hatsctrl command with the -u flag on the control workstation, the hatsctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.
Turning Tracing On
When the -t flag is specified, the control script turns tracing on for the hatsd daemon, using the traceson command.
Turning Tracing Off
When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hatsd daemon, using the tracesoff command.
Refreshing the Subsystem
When the -r flag is specified, the control script refreshes the subsystem, using the hats refresh command and the refresh command.
It rebuilds the information about the node and adapter configuration in the SDR and signals the daemon to read the rebuilt information.
Logging
While it is running, the Topology Services daemon provides information about its operation and errors by writing entries in a log file. The hatsd daemon in the system partition named syspar_name uses a log file called /var/ha/log/hats.syspar_name.
Files
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must be running with an effective user ID of root.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hatsctrl
Related Information
Commands: hats, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
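To add the Topology Services subsystem to the SRC in the current system partition, enter:
hatsctrl -a
To start the Topology Services subsystem in the current system partition, enter:
hatsctrl -s
To stop the Topology Services subsystem in the current system partition, enter:
hatsctrl -k
To delete the Topology Services subsystem from the SRC in the current system partition, enter:
hatsctrl -d
To clean up the Topology Services subsystems for all system partitions, enter:
hatsctrl -c
To unconfigure the Topology Services subsystem from all system partitions, on the control workstation, enter:
hatsctrl -u
To turn tracing on for the Topology Services daemon in the current system partition, enter:
hatsctrl -t
To turn tracing off for the Topology Services daemon in the current system partition, enter:
hatsctrl -o
To display the status of all of the subsystems in the Topology Services SRC group, enter:
lssrc -g hats
To display the status of an individual Topology Services subsystem, enter:
lssrc -s subsystem_name
To display detailed status about an individual Topology Services subsystem, enter:
lssrc -l -s subsystem_name
In response, the system returns information that includes the running status of the subsystem, the number of defined and active nodes, the required number of active nodes for a quorum, the status of the group of nodes, and the IP addresses of the source node, the group leader, and the control workstation.
To display the status of all of the daemons under SRC control, enter:
lssrc -a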
Purpose
hb - Starts or restarts heartbeat services on a node or on the control workstation.
Syntax
hb [-spname syspar_name] [-splevel pssp_level]
{ [start | resume] | [stop | quiesce] | reset |
[query | qall | qsrc] | refresh | mksrc optional_flags | rmsrc restore |
[debug | debug off] | [trace on | trace off] }
Flags
Operands
Description
Use this command to control the operation of heartbeat services for a system partition (the hbd daemon) on the control workstation or on a node within a system partition.
The hb script is not normally executed from the command line. It is normally called by the hbctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The heartbeat server provides input to the host_responds function within a system partition for the System Monitor through the System Monitor hr daemons. It also provides input to the IBM Recoverable Virtual Shared Disk daemons, if that product is installed on the nodes. This involves the following daemons:
| Note: | The hrd daemon is controlled by the hr script. The hbd daemon is controlled with this script. |
The hbd daemon is initially started on the control workstation with the System Resource Controller (SRC), regardless of the level of the system partition. It is respawned automatically if the hbd daemon fails. The SP_NAME environment variable causes selection of the correct heartbeat daemon.
The hbd daemons communicate with their counterparts on other nodes over the SP Ethernet. The udp heartbeat entry in /etc/services on all nodes must specify the same port number.
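The requirement that every node use the same udp heartbeat port can be checked mechanically. The following sketch assumes the usual /etc/services line layout (service name, then port/protocol, then optional aliases and comments) and assumes that the file contents have already been collected from each node into a dictionary. The function names and the collection step are hypothetical.

```python
def service_port(services_text, name, proto):
    """Return the port for name/proto in /etc/services-style text, or None."""
    for line in services_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) >= 2 and fields[0] == name:
            port, _, line_proto = fields[1].partition("/")
            if line_proto == proto:
                return int(port)
    return None


def heartbeat_ports_agree(services_by_node):
    """True when every node defines the same udp heartbeat port."""
    ports = {service_port(text, "heartbeat", "udp")
             for text in services_by_node.values()}
    # Exactly one distinct value, and no node missing the entry.
    return len(ports) == 1 and None not in ports
```

A node whose file lacks the entry yields None, so the check fails for a missing entry as well as for a mismatched port.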
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
The "Starting Up and Shutting Down the SP System" chapter and "The System Data Repository" appendix in IBM Parallel System Support Programs for AIX: Administration Guide
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hb
Related Information
Commands: hbctrl, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
See the hbctrl command.
Purpose
hbctrl - A control script that starts the Heartbeat subsystem.
Syntax
hbctrl { -a | -s | -k | -d | -c | -t | -o | -r | -h }
Flags
Operands
None.
Description
The Heartbeat subsystem communicates with several PSSP subsystems as part of providing information about the state of the nodes and adapters on the IBM RS/6000 SP.
The hbctrl control script controls the operation of the Heartbeat subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hb. Associated with each subsystem is a daemon and a script that configures and starts the daemon.
An instance of the Heartbeat subsystem executes on the control workstation and on every node of a system partition. Because Heartbeat provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.
From an operational point of view, the Heartbeat subsystem group is organized as follows:
The hb subsystem is associated with the hbd daemon and the hb script. The hb script configures and starts the hbd daemon.
The subsystem name on the nodes is hb. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.
On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hb.sp_prod and hb.sp_test.
The hbd daemon provides the Heartbeat services. The hb script configures and starts the hbd daemon.
The hbctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The hbctrl script provides a variety of controls for operating the Heartbeat subsystem:
Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.
Except for the clean function, all functions are performed within the scope of the current system partition.
Adding the Subsystem
When the -a flag is specified, the control script uses the mkssys command to add the Heartbeat subsystem to the SRC. The control script operates as follows:
The service name that is entered in the /etc/services file is heartbeat.
Starting the Subsystem
When the -s flag is specified, the control script uses the hb command to start the Heartbeat subsystem, hb.
Stopping the Subsystem
When the -k flag is specified, the control script uses the hb command to stop the Heartbeat subsystem, hb.
Deleting the Subsystem
When the -d flag is specified, the control script uses the rmssys command to remove the Heartbeat subsystem from the SRC. The control script operates as follows:
Cleaning Up the Subsystems
When the -c flag is specified, the control script stops and removes the Heartbeat subsystems for all system partitions from the SRC. The control script operates as follows:
Turning Tracing On
When the -t flag is specified, the control script turns tracing on for the hbd daemon, using the traceson command.
Turning Tracing Off
When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hbd daemon, using the tracesoff command.
Refreshing the Subsystem
When the -r flag is specified, the control script refreshes the subsystem, using the hb refresh command.
Logging
While it is running, the Heartbeat daemon provides information about its operation and errors by writing entries in a log file. The hbd daemon in the system partition named syspar_name uses a log file called /var/ha/log/hb.syspar_name.
Files
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must be running with an effective user ID of root.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hbctrl
Related Information
Commands: hb, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
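To add the Heartbeat subsystem to the SRC in the current system partition, enter:
hbctrl -a
To start the Heartbeat subsystem in the current system partition, enter:
hbctrl -s
To stop the Heartbeat subsystem in the current system partition, enter:
hbctrl -k
To delete the Heartbeat subsystem from the SRC in the current system partition, enter:
hbctrl -d
To clean up the Heartbeat subsystems for all system partitions, enter:
hbctrl -c
To turn tracing on for the Heartbeat daemon in the current system partition, enter:
hbctrl -t
To turn tracing off for the Heartbeat daemon in the current system partition, enter:
hbctrl -o
To display the status of all of the subsystems in the Heartbeat SRC group, enter:
lssrc -g hb
To display the status of an individual Heartbeat subsystem, enter:
lssrc -s subsystem_name
To display detailed status about an individual Heartbeat subsystem, enter:
lssrc -l -s subsystem_name
In response, the system returns information that includes the running status of the subsystem, the number of defined and active nodes, the required number of active nodes for a quorum, the status of the group of nodes, the frequency and sensitivity values in use for the subsystem, and the IP addresses of the source node, the group leader, and the control workstation.
To display the status of all of the daemons under SRC control, enter:
lssrc -a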
Purpose
hc.vsd - Queries and controls the hc subsystem of IBM Recoverable Virtual Shared Disk.
Syntax
Flags
None.
Operands
The hc subsystem must be restarted for this operand to take effect.
Once debugging is turned on and the hc subsystem has been restarted, hc.vsd trace should be issued to turn on tracing.
Use this operand under the direction of your IBM service representative.
Note: The default when the node is booted is to route stdout and stderr to the console. If debugging is turned off, stdout and stderr are routed to /dev/null and all further trace messages are lost. To determine whether debug has been turned on, issue hc.vsd qsrc. If debug has been turned on, the return value is:
action = "2"
This operand is only meaningful after the debug operand has been used to send stdout and stderr to the console and the hc subsystem has been restarted.
Description
Use this command to display information about the hc subsystem and to change the status of the subsystem.
You can restart the hc subsystem with the VSD Perspective. Type spvsd and select actions for IBM VSD nodes.
Exit Values
| Note: | The query and qsrc subcommands have no exit values. |
Security
You must have root privilege to issue the debug, mksrc, reset, start, and stop commands.
Implementation Specifics
This command is part of the IBM Recoverable Virtual Shared Disk Licensed Program Product (LPP).
Prerequisite Information
See "Using the IBM Recoverable Virtual Shared Disk Software" in IBM Parallel System Support Programs for AIX: Managing Shared Disks
Location
/usr/lpp/csd/bin/hc.vsd
Related Information
Commands: ha_vsd, ha.vsd
Examples
To stop the hc subsystem and restart it, enter:
hc.vsd reset
The system returns the messages:
Waiting for the hc subsystem to exit.
hc subsystem exited successfully.
Starting hc subsystem.
hc subsystem started PID=xxx.
Purpose
hmadm - Administers the Hardware Monitor daemon.
Syntax
hmadm [ {-d debug_flag} ... ] operation
Flags
Operands
The operation must be one of the following:
This operation must be invoked by the administrator after the administrator modifies the ACL configuration file.
Description
The hmadm command is used to administer the Hardware Monitor daemon. The Hardware Monitor daemon executes on the control workstation and is used to monitor and control the SP hardware. Five administrative actions are supported, as specified by the operation operand.
Normally when the daemon exits, it is automatically restarted by the system. If frame configuration information is changed, the quit operation can be used to update the system.
The daemon writes debug information and certain error information to its log file. The log file is located in /var/adm/SPlogs/spmon and its name is of the form hmlogfile.nnn, where nnn is the Julian date of the day the log file was opened by the daemon. The clog operation causes the daemon to close its current log file and create a new one using the name hmlogfile.nnn, where nnn is the current Julian date. If this name already exists, a name of the form hmlogfile.nnn_m is used, where m is a number picked to create a unique file name.
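The log file naming rule can be illustrated with a short sketch. Two details are assumptions, because the text does not state them: that nnn is the day of the year zero-padded to three digits, and that the uniqueness suffix m counts up from 1. The function name is hypothetical.

```python
import datetime


def hardmon_log_name(day, existing):
    """Pick a log file name for the given date, avoiding names in existing."""
    nnn = f"{day.timetuple().tm_yday:03d}"   # assumed: zero-padded day of year
    name = f"hmlogfile.{nnn}"
    m = 1                                    # assumed: suffixes count up from 1
    while name in existing:
        name = f"hmlogfile.{nnn}_{m}"
        m += 1
    return name
```

With these assumptions, a log opened on 1 February is named hmlogfile.032, and a clog operation issued later the same day produces hmlogfile.032_1.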
There are 15 debug flags supported by the daemon:
This command uses the SP Hardware Monitor. Therefore, the user must be authorized to access the Hardware Monitor subsystem and must have administrative permission. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.
Files
Related Information
File: /spdata/sys1/spmon/hmacls
Purpose
hmcmds - Controls the state of the SP hardware.
Syntax
Flags
Operands
Description
Use this command to control the state of the SP hardware. Control is provided via the Virtual Front Operator Panel (VFOP). VFOP is a set of commands that can be sent to the hardware components contained in one or more SP frames. Each frame consists of 18 slots, numbered 0 through 17, where slot 0 represents the frame itself, slot 17 can contain a switch and slots 1 through 16 can contain thin or wide processing nodes. Wide nodes occupy two slots and are addressed by the odd slot number. In a switch only frame, slots 1 through 16 can contain switches; the switches occupy two slots and are addressed by the even slot number.
Normally, commands are only sent to the hardware components in the current system partition. A system partition only contains processing nodes. The switches and the frames themselves are not contained in any system partition. To send VFOP commands to hardware components not in the current system partition or to any frame or switch, use the -G flag.
The following list describes the VFOP command set. Commands that require the -G flag are marked by an asterisk (*). Commands marked by a double asterisk (**) are primarily used by the Eclock command and are not intended for general use since an in-depth knowledge of switch clock topology is required to execute these commands in the proper sequence.
Before issuing these commands, refer to the "Using a Switch" chapter in the IBM Parallel System Support Programs for AIX: Administration Guide for detailed descriptions.
High Performance Switch
SP Switch
Any Frame, Node, or Switch that Supports Microcode Download
| Note: | You must issue this command before issuing the microcode command. |
| Note: | You must issue the basecode command before issuing this command. |
Refer to the s1term command for information on making serial connections.
Any Node
Any Frame
Any Frame, Node, or Switch
Any Node or Switch
One of these commands must be specified using the command operand. The command is sent to the hardware specified by the slot_spec operands. However, the command is not sent to any hardware that is not in the current system partition unless the -G flag is specified. If the -G flag is not specified and the slot_spec operands specify no hardware in the current system partition, an error message is displayed.
The slot_spec operands are interpreted as slot ID specifications. A slot ID specification names one or more slots in one or more SP frames and it has either of two forms:
fidlist:sidlist or nodlist
where:
The first form specifies frame numbers and slot numbers. The second form specifies node numbers. A fval is a frame number or a range of frame numbers of the form a-b. A sval is a slot number from the set 0 through 17 or a range of slot numbers of the form a-b. An nval is a node number or a range of node numbers of the form a-b.
The relationship of node numbers to frame and slot numbers is shown in the following formula:
node_number = ((frame_number - 1) × 16) + slot_number
| Note: | Node numbers can only be used to specify slots 1 through 16 of any frame. |
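As a worked illustration of the formula and the accompanying note, the following sketch computes a node number and rejects slots 0 and 17, which have no node number. The function name is hypothetical.

```python
def node_number(frame_number, slot_number):
    """Apply node_number = ((frame_number - 1) x 16) + slot_number."""
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    if not 1 <= slot_number <= 16:
        # Slot 0 (the frame) and slot 17 (the switch) have no node number.
        raise ValueError("node numbers exist only for slots 1 through 16")
    return (frame_number - 1) * 16 + slot_number
```

Slot 1 of frame 2 is node 17 and slot 16 of frame 2 is node 32, which is why the node list 17-32 names every node slot in frame 2.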
The following are some examples of slot ID specifications.
To specify slot 1 in frames 1 through 10, enter:
1-10:1
To specify frames 2, 4, 5, 6, and 7, enter:
2,4-7:0
To specify slots 9 through 16 in frame 5, enter:
5:9-16
If frame 5 contains wide nodes, the even slot numbers are ignored.
To specify slots 1, 12, 13, 14, 15, and 16 in each of frames 3 and 4, enter:
3,4:1,12-16
To specify slot 17 in frame 4, enter:
4:17
To specify the nodes in slots 1 through 16 of frame 2, enter:
17-32
To specify the nodes in slot 1 of frame 1, slot 1 of frame 2 and slot 1 of frame 3, enter:
1,17,33
To specify the node in slot 6 of frame 1, enter:
6
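The examples above can be reproduced with a small parser. This is a sketch of the notation only, not the command's actual parsing code: the function names are hypothetical, and the sketch does not model wide nodes, system partitions, or hardware that is absent.

```python
def expand(values):
    """Expand a comma-separated list of numbers and a-b ranges."""
    result = []
    for item in values.split(","):
        first, _, last = item.partition("-")
        result.extend(range(int(first), int(last or first) + 1))
    return result


def parse_slot_spec(spec):
    """Return the (frame, slot) pairs named by one slot ID specification."""
    if ":" in spec:                        # first form: fidlist:sidlist
        fidlist, sidlist = spec.split(":", 1)
        slots = expand(sidlist)
        if any(not 0 <= s <= 17 for s in slots):
            raise ValueError("slot numbers range from 0 through 17")
        return [(f, s) for f in expand(fidlist) for s in slots]
    # Second form: nodlist. Invert node = ((frame - 1) x 16) + slot.
    return [((n - 1) // 16 + 1, (n - 1) % 16 + 1) for n in expand(spec)]
```

For instance, parse_slot_spec("1,17,33") yields slot 1 of frames 1, 2, and 3, and parse_slot_spec("2,4-7:0") yields slot 0 of frames 2, 4, 5, 6, and 7.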
Optionally, slot ID specifications can be provided in a file rather than as command operands. The file must contain one specification per line. The command requires that slot ID specifications be provided. If the command is to be sent to all SP hardware, the keyword all must be provided in lieu of the slot_spec operands. However, the all keyword can only be specified if the -G flag is specified and if the VFOP command is on or off, since on or off are the only commands common to all hardware components.
Commands sent to hardware for which they are not appropriate, or sent to hardware which does not exist, are silently ignored by the Hardware Monitor subsystem.
By default, and except for the reset, flash, and run_post commands, the hmcmds command does not terminate until the state of the hardware to which the command was sent matches the command or until 15 seconds have elapsed. If 15 seconds have elapsed, the hmcmds command terminates with a message stating the number of nodes whose state was expected to match the VFOP command sent and the number of nodes which actually are in that state. The state of hardware for which the VFOP command is inappropriate, or where the hardware does not exist, is ignored.
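The wait behavior described above amounts to polling with a deadline. The sketch below shows the general pattern under stated assumptions: read_states is a hypothetical callable that returns the current state of each targeted component, the 0.5-second polling interval is invented for illustration, and the clock and sleep parameters exist only so the loop can be exercised without real delays. It does not reflect how hmcmds is actually implemented.

```python
import time


def wait_for_state(read_states, expected, timeout=15.0, interval=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until every state equals expected or timeout seconds pass.

    Returns (matched, total) so a caller can report how many components
    reached the expected state.
    """
    deadline = clock() + timeout
    while True:
        states = read_states()
        matched = sum(1 for s in states if s == expected)
        if matched == len(states) or clock() >= deadline:
            return matched, len(states)
        sleep(interval)
```

Returning both counts mirrors the message described above, which states the number of nodes expected to match and the number that actually do.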
To execute the hmcmds command, the user must be authorized to access the Hardware Monitor subsystem and, for those frames specified to the command, the user must be granted VFOP permission. Commands sent to frames for which the user does not have VFOP permission are ignored. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.
Files
Related Information
Commands: hmmon, spsvrmgr
Examples
hmcmds -G off all
hmcmds secure 1-5:1-16
hmcmds -G extclk3 1-8:17
hmcmds normal 6 2,3:2
Purpose
hmmon - Monitors the state of the SP hardware.
Syntax
Flags
Operands
Description
Use this command to monitor the state of the SP hardware contained in one or more SP frames. Each frame consists of 18 slots, numbered 0 through 17, where slot 0 represents the frame itself, slot 17 can contain a switch and slots 1 through 16 can contain thin or wide processing nodes. Wide nodes occupy two slots and are addressed by the odd slot number. In a switch only frame, slots 1 through 16 can contain switches; the switches occupy two slots and are addressed by the even slot number.
With no flags and operands, the command prints to standard output descriptive text of all hardware state changes in the current system partition as they occur, from the time the command is invoked. The command does not terminate, unless the -Q flag or the -V flag is specified, and must be interrupted by the user. To monitor all of the hardware in the SP system, the -G flag must be specified. Note that the switches and the frames themselves are not contained in any system partition.
When one or more slot_spec operands are present, each operand is interpreted as a slot ID specification. A slot ID specification names one or more slots in one or more SP frames and it has either of two forms:
fidlist:[sidlist] or nodlist
where:
The first form specifies frame numbers and slot numbers. The second form specifies node numbers. A fval is a frame number or a range of frame numbers of the form a-b. A sval is a slot number from the set 0 through 17 or a range of slot numbers of the form a-b. An nval is a node number or a range of node numbers of the form a-b. If a sidlist is not specified, all hardware in the frames specified by the fidlist is monitored.
The relationship of node numbers to frame and slot numbers is given by the following formula:
node_number = ((frame_number - 1) × 16) + slot_number
| Note: | The node numbers can only be used to specify slots 1 through 16 of any frame. |
The following are some examples of slot ID specifications.
To specify all hardware in frames 1 through 10, enter:
1-10:
To specify frames 2, 4, 5, 6, and 7, enter:
2,4-7:0
To specify slots 9 through 16 in frame 5, enter:
5:9-16
If frame 5 contains wide nodes, the even slot numbers are ignored.
To specify slots 1, 12, 13, 14, 15, and 16 in each of frames 3 and 4, enter:
3,4:1,12-16
To specify slot 17 in frame 4, enter:
4:17
To specify the nodes in slots 1 through 16 of frame 2, enter:
17-32
To specify the nodes in slot 1 of frame 1, slot 1 of frame 2 and slot 1 of frame 3, enter:
1,17,33
To specify the node in slot 6 of frame 1, enter:
6
Optionally, slot ID specifications may be provided in a file rather than as command operands. The file must contain one specification per line. When slot ID specifications are provided to the command, only the hardware named by the specifications is monitored. Furthermore, of the hardware named by these specifications, only that which is located in the current system partition is monitored. To monitor hardware not contained in the current system partition, the -G flag must be specified. If the -G flag is not specified and the slot ID specifications name no hardware in the current system partition, an error message is displayed.
The default output displays hardware state information on a slot-by-slot basis. The state information for each slot is captioned by its frame ID and slot ID and consists of two columns. Each column contains state variable information, one variable per line. Each variable is displayed as descriptive text and a value. Boolean values are displayed as TRUE or FALSE. Integer values are displayed in hexadecimal.
The command provides two other output formats, raw and symbolic. Both write the information for one state variable per line. The raw format consists of four fields separated by white space as follows:
The symbolic format consists of six fields separated by white space as follows:
The alternative output formats are suitable for input to post-processing programs, such as awk or scripts.
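As one possible post-processing script, the following sketch reads raw format lines into records. It is an assumption-laden illustration: the field meanings (frame ID, slot ID, variable ID in hexadecimal, value as a decimal integer) are inferred from the raw example in this section, and the names RawRecord and parse_raw are hypothetical.

```python
from typing import List, NamedTuple


class RawRecord(NamedTuple):
    frame: int        # assumed: first field is the frame ID
    slot: int         # assumed: second field is the slot ID
    variable_id: int  # assumed: third field is the variable ID in hexadecimal
    value: int        # assumed: fourth field is a decimal integer value


def parse_raw(text: str) -> List[RawRecord]:
    """Parse lines of four white-space-separated fields."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and anything that is not four fields
        frame, slot, variable_id, value = fields
        records.append(
            RawRecord(int(frame), int(slot), int(variable_id, 16), int(value))
        )
    return records


if __name__ == "__main__":
    sample = "1 0 0x880f 32\n1 0 0x883a 17\n1 1 0x9097 16\n"
    for record in parse_raw(sample):
        print(record)
```

The symbolic format would need a different split, because its last field is descriptive text that itself contains white space.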
Output in any format can be limited to display only information from the specified hardware that corresponds to a list of state variables supplied to the command with the -v flag.
To execute the hmmon command, the user must be authorized to access the Hardware Monitor subsystem and, for those frames specified to the command, the user must be granted "Monitor" permission. State information is not returned for frames for which the user does not have "Monitor" permission. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site specific procedures may be used to obtain the tokens that are otherwise obtained by kinit.
The user can monitor nonexistent nodes in an existing frame in order to detect when a node is added while the system is up and running. No information is returned for nonexistent nodes when the -q or -Q flag is specified.
Files
Related Information
Command: hmcmds
Examples
The following is an example of default output from hmmon -G -Q 1:0,1. The command returns similar output, depending on your system configuration.
frame 001, slot 00:
node 01 I2C not responding FALSE
node 02 I2C not responding TRUE
node 03 I2C not responding FALSE
node 04 I2C not responding TRUE
switch I2C not responding FALSE
node 01 serial link open TRUE
node 02 serial link open FALSE
node 03 serial link open TRUE
frame LED 1 (green) 0x0001
frame LED 2 (green) 0x0001
frame LED 3 (yellow) 0x0000
frame LED 4 (yellow) 0x0000
AC-DC section A power off FALSE
AC-DC section B power off FALSE
AC-DC section C power off FALSE
AC-DC section D power off FALSE
supervisor timer ticks 0x88f2
+48 voltage 0x0078
temperature 0x0036
supervisor serial number 0x1234
supervisor type 0x0011
supervisor code version 0x5ff5
frame 001, slot 01:
serial 1 DTR asserted TRUE
-12 volt low warning TRUE
-12 volt low shutdown FALSE
-12 volt high warning TRUE
+4 volt low shutdown FALSE
+4 volt high warning TRUE
fan 1 shutdown FALSE
fan 2 warning TRUE
DC-DC power on > 10 secs TRUE
+5 DC-DC output good TRUE
7 segment display flashing FALSE
node/switch LED 1 (green) 0x0001
reset button depressed FALSE
serial link open TRUE
diagnosis return code 0x00dd
7 segment LED A 0x00ff
+5 I/O voltage 0x007f
+12 voltage 0x0096
The following is an example of raw output from hmmon -G -Q -r 1:0. The command returns similar output, depending on your system configuration.
1 0 0x880f 32
1 0 0x881c 0
1 0 0x881d 4
1 0 0x8834 54
1 0 0x8839 4660
1 0 0x883a 17
1 0 0x88a8 1
1 1 0x9097 16
1 1 0x9098 0
1 1 0x9047 1
1 1 0x909d 128
1 1 0x9023 221
1 1 0x90a1 255
1 1 0x90a2 127
1 1 0x903b 24565
The following is an example of symbolic output from hmmon -G -Q -s 1:0. The command returns similar output, depending on your system configuration.
1 0 nodefail1 FALSE 0x8802 node 01 I2C not responding
1 0 nodeLinkOpen1 TRUE 0x8813 node 01 serial link open
1 0 frACLED 1 0x8824 frame LED 1 (green)
1 0 frNodeComm 0 0x8827 frame LED 4 (yellow)
1 0 frPowerOff_B FALSE 0x882d AC-DC section B power off
1 0 timeTicks 34881 0x8830 supervisor timer ticks
1 0 voltP48 46.800 0x8831 +48 voltage
1 0 type 17 0x883a supervisor type
1 0 codeVersion 24565 0x883b supervisor code version
1 0 controllerResponds TRUE 0x88a8 Frame responding to polls
1 0 rs232DCD TRUE 0x88a9 RS232 link DCD active
1 0 rs232CTS TRUE 0x88aa RS232 link CTS active
1 1 fanfail2 FALSE 0x9050 fan 2 shutdown
1 1 nodePowerOn10Sec TRUE 0x904b DC-DC power on > 10 secs
1 1 P5DCok TRUE 0x9097 +5 DC-DC output good
1 1 powerLED 1 0x9047 node/switch LED 1 (green)
1 1 envLED 0 0x9048 node/switch LED 2 (yellow)
1 1 keyModeSwitch 0 0x909b key switch
1 1 serialLinkOpen TRUE 0x909d serial link open
1 1 LED7SegA 255 0x909f 7 segment LED A
1 1 voltP5i 4.978 0x90a2 +5 I/O voltage
The raw and symbolic formats output by the hmmon command contain the variable ID of each state variable. Refer to Appendix D in IBM Parallel System Support Programs for AIX: Administration Guide.
Purpose
hmreinit - Stops and starts the Hardware Monitor daemon and modifies the System Data Repository (SDR) as necessary.
Syntax
hmreinit
Flags
None.
Operands
None.
Description
Use this command to reinitialize the Hardware Monitor daemon when changes to the SP system occur. Specifically, hmreinit determines whether the switch configuration has changed (for example, because switches were added, deleted, or upgraded). The hmreinit command then calls SDR_config -u to update the switch information in the SDR and generates switch node numbers based on this change. If a switch configuration change is detected, hmreinit tests whether a single system partition exists. If only one system partition exists, hmreinit deletes the Syspar_map entries from the SDR and then calls create_par_map to generate the correct objects. If more than one system partition exists, hmreinit issues a message to that effect and exits.
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must have root privilege and a valid ticket to run this command.
Implementation Specifics
This command is part of the Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Location
/usr/lpp/ssp/install/bin/hmreinit
Related Information
Commands: spframe, SDR_config
For additional information, refer to the "Reconfiguring the IBM RS/6000 SP System" chapter in IBM Parallel System Support Programs for AIX: Installation and Migration Guide.
Purpose
hostlist - Lists SP host names to standard output based on criteria.
Syntax
Flags
Operands
None.
Description
The hostlist command writes SP host names to standard output. The arguments to the command indicate the host names to be written. More than one flag can be specified, in which case, the hosts indicated by all the flags are written.
If no arguments are specified, hostlist writes the contents of a file specified by the WCOLL environment variable. If the WCOLL environment variable does not exist, the MP_HOSTFILE environment variable is used as the name of a POE host file to use for input. Finally, ./host.list is tried. If none of these steps succeeds, an error occurs. The input file is in dsh-working-collective-file or POE-host-list-file format. Node pool specifications in POE host files are not supported.
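The lookup order described above can be sketched as follows. This is a minimal illustration, not the hostlist implementation: the function name choose_host_file is hypothetical, and the sketch only selects a file name. It does not read the file or report the error case.

```python
import os
from typing import Mapping


def choose_host_file(env: Mapping[str, str]) -> str:
    """Pick the input file name in the documented order."""
    # WCOLL is consulted first, then MP_HOSTFILE.
    for name in ("WCOLL", "MP_HOSTFILE"):
        if name in env:
            return env[name]
    # Neither variable exists, so ./host.list is tried last.
    return "./host.list"


if __name__ == "__main__":
    print(choose_host_file(os.environ))
```

Passing the environment as a parameter, rather than reading os.environ inside the function, keeps the sketch easy to exercise with sample values.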
Files
Related Information
Commands: dsh, sysctl
Examples
hostlist -av -e badhost > ./working
hostlist -s 1-4:1 | dsh -w - program
hostlist -n 1-16,33-35 -w otherone | dsh -w - program
export WCOLL=./wcoll
hostlist | sysctl -c - sysctl_app args
Purpose
hr - Controls the host_responds monitor daemon, hrd, on the control workstation.
Syntax
hr [-spname syspar_name]
{ [start | resume] | [stop | quiesce] | reset |
[query | qall | qsrc] | refresh | mksrc optional_flags | rmsrc | clean | restore |
[debug | debug off ] | [trace on | trace off ] }
Flags
Operands
Description
Use this command to control the operation of hrd, the host_responds daemon on the control workstation within a system partition. The heartbeat server provides input to the host_responds function within a system partition for the System Monitor through the hrd daemons.
The hr script is not normally executed from the command line. It is normally called by the hrctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The hrd daemon is initially started on the control workstation with the System Resource Controller (SRC). It is respawned automatically if the hrd daemon fails. The SP_NAME environment variable causes selection of the correct daemon.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
The "Starting Up and Shutting Down the SP System" chapter and "The System Data Repository" appendix in IBM Parallel System Support Programs for AIX: Administration Guide
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hr
Related Information
Commands: hb, hrctrl, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
See the hrctrl command.
Purpose
hrctrl - A script that controls the Host_Responds subsystem.
Syntax
hrctrl { -a | -s | -k | -d | -c | -t | -o | -r | -h }
Flags
Operands
None.
Description
The Host_Responds subsystem provides information about the state of the nodes on the IBM RS/6000 SP to other PSSP subsystems.
The hrctrl control script controls the operation of the Host_Responds subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hr. Associated with each subsystem is a daemon and a script that configures and starts the daemon.
An instance of the Host_Responds subsystem executes on the control workstation for every system partition. Because Host_Responds provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. The script should be issued on the control workstation. If it is issued on a node, it has no effect.
From an operational point of view, the Host_Responds subsystem group is organized as follows:
The hr subsystem is associated with the hrd daemon and the hr script. The hr script configures and starts the hrd daemon.
On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hr.sp_prod and hr.sp_test.
The subsystem does not run on the nodes.
The hrd daemon provides the Host_Responds services. The hr script configures and starts the hrd daemon.
The hrctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.
The hrctrl script provides a variety of controls for operating the Host_Responds subsystem:
Before performing any of these functions, the script obtains the node number (using the node_number command). If the node number is not zero, the control script is running on a node and it exits immediately. Otherwise, it is executing on the control workstation and it calls the hr script with an operand that specifies the action to be performed.
Adding the Subsystem
When the -a flag is specified, the control script uses the hr command with the mksrc operand to add the Host_Responds subsystem to the SRC.
Starting the Subsystem
When the -s flag is specified, the control script uses the hr command with the start operand to start the Host_Responds subsystem, hr.
Stopping the Subsystem
When the -k flag is specified, the control script uses the hr command with the stop operand to stop the Host_Responds subsystem, hr.
Deleting the Subsystem
When the -d flag is specified, the control script uses the hr command with the rmsrc operand to remove the Host_Responds subsystem from the SRC.
Cleaning up the Subsystems
When the -c flag is specified, the control script uses the hr command with the clean operand to stop and remove the Host_Responds subsystems for all system partitions from the SRC.
Turning Tracing On
When the -t flag is specified, the control script turns tracing on for the hrd daemon, using the hr command with the trace on operand.
Turning Tracing Off
When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hrd daemon, using the hr command with the trace off operand.
Refreshing the Subsystem
When the -r flag is specified, the control script refreshes the subsystem, using the hr refresh command.
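The flag-to-operand dispatch described in the preceding sections can be summarized in a short sketch. It is a model of the documented behavior, not the hrctrl script itself: the names HR_OPERANDS and hr_command are hypothetical, the optional flags that mksrc accepts are omitted, and the -h flag is left out because this section does not describe what it does.

```python
from typing import Dict, List, Optional

# hrctrl flag -> operand passed to the hr script, as described above.
HR_OPERANDS: Dict[str, List[str]] = {
    "-a": ["mksrc"],         # add the subsystem to the SRC
    "-s": ["start"],         # start the subsystem
    "-k": ["stop"],          # stop the subsystem
    "-d": ["rmsrc"],         # remove the subsystem from the SRC
    "-c": ["clean"],         # stop and remove for all system partitions
    "-t": ["trace", "on"],   # turn tracing on
    "-o": ["trace", "off"],  # turn tracing off
    "-r": ["refresh"],       # refresh the subsystem
}


def hr_command(flag: str, node_number: int) -> Optional[List[str]]:
    """Return the hr invocation, or None when running on a node."""
    if node_number != 0:
        # A nonzero node number means the script is on a node: do nothing.
        return None
    return ["hr"] + HR_OPERANDS[flag]


if __name__ == "__main__":
    print(hr_command("-t", 0))
```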
Standard Error
This command writes error messages (as necessary) to standard error.
Exit Values
Security
You must be running with an effective user ID of root.
Implementation Specifics
This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).
Prerequisite Information
AIX Version 4 Commands Reference
Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs
Location
/usr/lpp/ssp/bin/hrctrl
Related Information
Commands: hr, lssrc, startsrc, stopsrc, syspar_ctrl
Examples
hrctrl -a
hrctrl -s
hrctrl -k
hrctrl -d
hrctrl -c
hrctrl -t
hrctrl -o
lssrc -g hr
lssrc -s subsystem_name
lssrc -l -s subsystem_name
In response, the system returns information that includes the running status of the subsystem and the status of the nodes within the system partition.
lssrc -a
Purpose
hsdatalst - Displays data striping device (HSD) data for the IBM Virtual Shared Disks from the System Data Repository (SDR).
Syntax
hsdatalst [-G]
Flags
Operands
None.
Description
This command is used to display defined HSD information in the system.
You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:
smit list_vsd
and select the List Defined Hashed Shared Disk option.
Files
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: defhsd, undefhsd
Examples
To display SDR HSD data, enter:
hsdatalst
Purpose
hsdvts - Verifies that a data striping device (HSD) for the IBM Virtual Shared Disks has been correctly configured and works.
Syntax
hsdvts hsd_name
Flags
None.
Operands
Description
| Attention |
|---|
| Data on hsd_name will be overwritten and, therefore, destroyed. Use this command after you have defined your HSD, IBM Virtual Shared Disks, and logical volumes, but before you have loaded your application data onto any of them. |
This command writes /unix to hsd_name, reads it from hsd_name to a temporary file, and compares the temporary file to the original to make sure the I/O was successful. If the files compare exactly, the test was successful.
hsdvts writes to the raw hsd_name device /dev/rhsd_name. Since raw devices can only be written in multiples of 512-byte blocks, hsdvts determines the number of full 512-byte blocks in the /unix file and writes that number of blocks to hsd_name using the dd command. It makes a copy of /unix that contains this number of 512-byte blocks for comparison to the copy read from hsd_name. The dd command is used for all copy operations.
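The block arithmetic can be illustrated with a small sketch. The exact dd invocation that hsdvts uses is not given here, so the argument list below is an assumption built from the standard dd operands if=, of=, bs=, and count=. The function names and the HSD name hsd1 are hypothetical.

```python
from typing import List

BLOCK_SIZE = 512  # raw devices are written in multiples of 512-byte blocks


def full_blocks(size_in_bytes: int) -> int:
    """Number of complete 512-byte blocks; a trailing partial block is dropped."""
    return size_in_bytes // BLOCK_SIZE


def dd_args(source: str, target: str, size_in_bytes: int) -> List[str]:
    """Build an assumed dd argument list that copies only the full blocks."""
    return [
        "dd",
        f"if={source}",
        f"of={target}",
        f"bs={BLOCK_SIZE}",
        f"count={full_blocks(size_in_bytes)}",
    ]


if __name__ == "__main__":
    # Hypothetical 1300-byte file: two full blocks are copied, 276 bytes are not.
    print(dd_args("/unix", "/dev/rhsd1", 1300))
```

Truncating both copies to the same block count is what lets the final comparison succeed even when /unix is not an exact multiple of 512 bytes.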
Files
Prerequisite Information
IBM Parallel System Support Programs for AIX: Managing Shared Disks
Related Information
Commands: cfghsd, cfgvsd, dd, defhsd, startvsd
Purpose
/usr/lpp/ssp/css/ifconfig - Configures or displays network interface parameters for a network using TCP/IP.
Syntax
Flags
None.
Operands
Include a numeral after the abbreviation to identify the specific interface (for example, tr0).
The mask variable includes both the network part of the local address and the subnet part, which is taken from the host field of the address. The mask can be specified as a single hexadecimal number beginning with 0x, in standard Internet dotted decimal notation, or beginning with a name or alias that is listed in the /etc/networks file.
The mask contains 1's (ones) for the bit positions in the 32-bit address that are reserved for the network and subnet parts, and 0's (zeros) for the bit positions that specify the host. The mask should contain at least the standard network portion, and the subnet segment should be contiguous with the network segment.
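The mask rules above can be checked with a short sketch. It is an illustration under stated assumptions, not ifconfig code: the function names parse_mask and is_contiguous are hypothetical, and masks given as a name or alias from the /etc/networks file are not handled.

```python
def parse_mask(text: str) -> int:
    """Convert a 0x-prefixed hexadecimal or dotted decimal mask to a 32-bit integer."""
    if text.lower().startswith("0x"):
        value = int(text, 16)
    else:
        octets = [int(part) for part in text.split(".")]
        if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
            raise ValueError(f"not a dotted decimal mask: {text}")
        value = 0
        for octet in octets:
            value = (value << 8) | octet
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"mask does not fit in 32 bits: {text}")
    return value


def is_contiguous(mask: int) -> bool:
    """True when every 1 bit lies to the left of every 0 bit."""
    inverted = ~mask & 0xFFFFFFFF
    # The inverted mask must look like 0...01...1, so adding 1 clears all its bits.
    return (inverted & (inverted + 1)) == 0


if __name__ == "__main__":
    for text in ("0xffffff00", "255.255.255.0", "255.0.255.0"):
        print(text, is_contiguous(parse_mask(text)))
```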
| Note: | If ARP is enabled, offset is not used. |
Description
The ifconfig command has been modified to add support for the switch. This command is valid only on an SP system.
The ifconfig command can be used from the command line either to assign an address to a network interface, or to configure or display the current network interface configuration information. The ifconfig command must be used at system start up to define the network address of each interface present on a machine. It can also be used at a later time to redefine an interface's address or other operating parameters. The network interface configuration is held on the running system and must be reset at each system restart.
An interface can receive transmissions in differing protocols, each of which may require separate naming schemes. It is necessary to specify the address_family parameter, which can change the interpretation of the remaining parameters. The address families currently supported are inet and ns.
For the DARPA Internet family, inet, the address is either a host name present in the host name database, that is, the /etc/hosts file, or a DARPA Internet address expressed in the Internet standard dotted decimal notation.
For the Xerox Network Systems (XNS) family, ns, addresses have the form net:a.b.c.d.e.f, where net is the assigned network number (in decimal), and each of the six bytes of the host number, a through f, is specified in hexadecimal. The host number can be omitted on 10-Mbps Ethernet interfaces, which use the hardware physical address, and on interfaces other than the first interface.
While any user can query the status of a network interface, only a user who has administrative authority can modify the configuration of those interfaces.
Related Information
AIX Command: netstat
AIX Files: /etc/hosts, /etc/networks
Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on the SP Switch and the High Performance Switch.
Refer to AIX Version 4 System Management Guide: Communications and Networks for additional information on TCP/IP protocols.
Refer to AIX Version 4 General Programming Concepts: Writing and Debugging Programs for an overview on Xerox Network Systems (XNS).
Examples
The following are examples using the ifconfig command on a TCP/IP network and an XNS network, respectively:
Inet Examples
ifconfig sl1
In this example, the interface to be queried is sl1. The result of the command looks similar to the following:
sl1: flags=51<UP,POINTOPOINT,RUNNING>
inet 192.9.201.3 --> 192.9.354.7 netmask ffffff00
ifconfig lo0 inet 127.0.0.1 up
ifconfig tr0 inet down
In this example, the interface to be marked down is tr0.
| Note: | Only a user with root user authority can modify the configuration of a network interface. |
ifconfig css0 inet 127.0.0.1 netmask 255.255.255.0 alias
XNS Examples
ifconfig en0 ns 110:02.60.8c.2c.a4.98 up
In this example, ns is the XNS address family, 110 is the network number and 02.60.8c.2c.a4.98 is the host number, which is the Ethernet address unique to each individual interface. Specify the host number when there are multiple Ethernet hardware interfaces, as the default may not correspond to the proper interface. The Ethernet address can be obtained by the commands:
ifconfig en0
netstat -v
The XNS address can be represented by several means, as can be seen in the following examples:
123#9.89.3c.90.45.56
5-124#123-456-900-455-749
0x45:0x9893c9045569:90
0456:9893c9045569H
The first example is in decimal format, and the second example, using minus signs, is separated into groups of three digits each. The 0x and H examples are in hexadecimal format. Finally, the 0 in front of the last example indicates that the number is in octal format.
ifconfig et0 ns 120:02.60.8c.2c.a4.98 up
The en0 and et0 interfaces are considered as separate interfaces even though the same Ethernet adapter is used. Two separate networks can be defined and used at the same time as long as they have separate network numbers. Multiple Ethernet adapters are supported.
| Note: | The host number should correspond to the Ethernet address on the hardware adapter. A system can have multiple host numbers. |
ifconfig en0 inet 11.0.0.1 up
ifconfig en0 ns 110:02.60.8c.2c.a4.98 up
ifconfig en0 ns 130:02.60.8c.34.56.78 ipdst 11.0.0.10
The first command brings up the Internet with the inet address 11.0.0.1. The second command configures the en0 interface to be network 110 and host number 02.60.8c.2c.a4.98 in the ns address family. This defines the host number for use when the XNS packet is encapsulated within the Internet packet. The last command defines network 130, host number 02.60.8c.34.56.78, and destination Internet address 11.0.0.10. This last entry creates a new network interface, nsip. Use the netstat -i command for information about this interface.
Purpose
install_cw - Completes the installation of system support programs on the control workstation.
Syntax
install_cw
Flags
None.
Operands
None.
Description
Use this command at installation to perform the following tasks:
Purpose
install_hacws - Creates and configures a High Availability Control Workstation (HACWS) configuration from a regular control workstation configuration.
Syntax
install_hacws -p host_name -b host_name [-s]
Flags
Operands
None.
Description
Use this command to perform configuration and installation tasks on HACWS. This command is used instead of install_cw once the configuration has been made an HACWS configuration. This command is valid only when issued on the control workstation. When the command is executed and the calling process is not on a control workstation, an error occurs.
| Note: | The install_hacws command permanently alters a control workstation to an HACWS. The only way to go back to a single control workstation is to have a mksysb image of the primary control workstation before the install_hacws command is executed. |
Both the primary and backup control workstations must be running and capable of executing remote commands via the /usr/lpp/ssp/rcmd/bin/rsh command.
Exit Values
Standard output consists of messages indicating the progress of the command as it configures the control workstations.
Prerequisite Information
Refer to IBM Parallel System Support Programs for AIX: Administration Guide for information on the HACWS option.
Location
/usr/sbin/hacws/install_hacws
Related Information
SP Commands: install_cw, rsh, setup_logd
Examples
install_hacws -p primary_cw -b backup_cw -s
On the primary control workstation, enter:
install_hacws -p primary_cw -b backup_cw
After the preceding command completes on the primary control workstation, enter the following on the backup control workstation:
install_hacws -p primary_cw -b backup_cw
Purpose
jm_config - Reconfigures the Resource Manager.
Syntax
jm_config
Flags
None.
Operands
None.
Description
Use this command to reconfigure the Resource Manager (RM) servers.
This command must be executed by root on the control workstation. It reads the Resource Manager configuration data from the /etc/jmd_config.syspar_name file, where syspar_name represents the current system partition environment. This current working environment can be determined by issuing spget_syspar -n. The jm_config command then contacts the correct primary Resource Manager and sends a message telling the server to update its configuration data from the System Data Repository (SDR). The new configuration takes effect with the next client request.
| Note: | 604 High Nodes cannot be configured as part of a parallel pool. Therefore, the Resource Manager will not allocate these nodes for parallel jobs. |
The Resource Manager can also be reconfigured via the System Management Interface Tool (SMIT). To use SMIT, enter:
smit RM_options
and select the Reconfigure the Resource Manager option.
Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on configuring the Resource Manager and system partitioning.
Files
Related Information
Commands: jm_start, jm_status, jm_stop, locate_jm, spget_syspar
Examples
To reconfigure the R