Command and Technical Reference

IBM Parallel System Support Programs for AIX
Command and Technical Reference

Version 2 Release 4

GC23-3900-05

Program Number: 5765-529

Note

Before using this information and the product it supports, be sure to read the general information under "Notices".

Fifth Edition (February 1998)

This is a major revision of GC23-3900-04.

This edition applies to Version 2 Release 4 of the IBM Parallel System Support Programs for AIX (PSSP) Licensed Program, program number 5765-529, and to all subsequent releases and modifications until otherwise indicated in new editions. Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the change.

Order publications through your IBM representative or the IBM branch office serving your locality. Publications are not stocked at the address below.

IBM welcomes your comments. A form for readers' comments may be provided at the back of this publication, or you may address your comments to the following address:

International Business Machines Corporation
Department 55JA, Mail Station P384
522 South Road
Poughkeepsie, NY 12601-5400
United States of America
 
FAX (United States & Canada): 1+914+432-9405
FAX (Other Countries):
   Your International Access Code +1+914+432-9405
 
IBMLink (United States customers only): KGNVMC(MHVRCFS)
IBM Mail Exchange: USIB6TC9 at IBMMAIL
Internet e-mail: mhvrcfs@vnet.ibm.com
World Wide Web: http://www.rs6000.ibm.com

If you would like a reply, be sure to include your name, address, telephone number, or FAX number.

Make sure to include the following in your comment or note:

When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright International Business Machines Corporation 1995, 1998. All rights reserved.
Note to U.S. Government Users -- Documentation related to restricted rights -- Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule contract with IBM Corp.


Table of Contents

  • Notices
  • Trademarks
  • Publicly Available Software
  • About This Book
  • Who Should Use This Book
  • How This Book Is Organized
  • Command Format
  • Typographic Conventions
  • Accessing Online Information
  • Related Publications
  • Other IBM Publications
  • Non-IBM Publications
  • Manual Pages for Public Code

  • Command Reference

  • Commands
  • System Partitioning and Commands
  • add_principal
  • allnimres
  • arp
  • cfghsd
  • cfghsdvsd
  • cfgvsd
  • chgcss
  • chkp
  • cksumvsd
  • cmonacct
  • cprdaily
  • cptuning
  • create_krb_files
  • createhsd
  • createvsd
  • crunacct
  • cshutdown
  • CSS_test
  • cstartup
  • ctlhsd
  • ctlvsd
  • defhsd
  • defvsd
  • delnimclient
  • delnimmast
  • dsh
  • dshbak
  • Eannotator
  • Eclock
  • Eduration
  • Efence
  • emconditionctrl Script
  • emonctrl Script
  • Emonitor Daemon
  • enadmin
  • endefadapter
  • endefnode
  • enrmadapter
  • enrmnode
  • Eprimary
  • Equiesce
  • Estart
  • Etopology
  • Eunfence
  • Eunpartition
  • export_clients
  • ext_srvtab
  • fencevsd
  • get_vpd
  • ha_vsd
  • ha.vsd
  • hacws_verify
  • haemcfg
  • haemctrl Script
  • haemd Daemon
  • haemloadcfg
  • haemtrcoff
  • haemtrcon
  • haemunlkrm
  • hagsctrl Script
  • hagsd Daemon
  • hagsglsmd Daemon
  • hardmon Daemon
  • hats Script
  • hatsctrl Script
  • hb Script
  • hbctrl Script
  • hc.vsd
  • hmadm
  • hmcmds
  • hmmon
  • hmreinit
  • hostlist
  • hr Script
  • hrctrl Script
  • hsdatalst
  • hsdvts
  • ifconfig
  • install_cw
  • install_hacws
  • jm_config
  • jm_install_verify
  • jm_start
  • jm_status
  • jm_stop
  • jm_verify
  • jmcmi_accesscontrol
  • jmcmi_addpool
  • jmcmi_changepool
  • jmcmi_createjmdconfig
  • jmcmi_deletepool
  • jmcmi_servernodes
  • kadmin
  • kadmind Daemon
  • kdb_destroy
  • kdb_edit
  • kdb_init
  • kdb_util
  • kdestroy
  • kerberos Daemon
  • kinit
  • klist
  • kpasswd
  • kprop
  • kpropd Daemon
  • kshd Daemon
  • ksrvtgt
  • ksrvutil
  • kstash
  • locate_jm
  • lppdiff
  • lsfencevsd
  • lshacws
  • lshsd
  • lskp
  • lsvsd
  • mkamdent
  • mkautomap
  • mkconfig
  • mkinstall
  • mkkp
  • mknimclient
  • mknimint
  • mknimmast
  • mknimres
  • ngaddto
  • ngclean
  • ngcreate
  • ngdelete
  • ngdelfrom
  • ngfind
  • nglist
  • ngnew
  • ngresolve
  • nodecond
  • nrunacct
  • p_cat
  • pcp
  • pdf
  • penotify
  • perspectives
  • pexec
  • pexscr
  • pfck
  • pfind
  • pfps
  • pls
  • pmanctrl
  • pmandef
  • pmanquery
  • pmanrmdloadSDR
  • pmv
  • ppred
  • pps
  • preparevsd
  • prm
  • psyslclr
  • psyslrpt
  • rcmdtgt
  • rcp
  • removehsd
  • removevsd
  • resumevsd
  • rmkp
  • rsh
  • SDR_test
  • SDRAddSyspar
  • SDRArchive
  • SDRChangeAttrValues
  • SDRClearLock
  • SDRCreateAttrs
  • SDRCreateClass
  • SDRCreateFile
  • SDRCreateObjects
  • SDRCreateSystemClass
  • SDRCreateSystemFile
  • SDRDeleteFile
  • SDRDeleteObjects
  • SDRGetObjects
  • SDRListClasses
  • SDRListFiles
  • SDRMoveObjects
  • SDRRemoveSyspar
  • SDRReplaceFile
  • SDRRestore
  • SDRRetrieveFile
  • SDRWhoHasLock
  • seqfile
  • services_config
  • sethacws
  • setup_authent
  • setup_CWS
  • setup_logd
  • setup_server
  • sp_configd
  • sp_configdctrl Script
  • spacctnd
  • spacs_cntrl
  • spadaptrs
  • spapply_config
  • spbootins
  • spchuser
  • spcustomize_syspar
  • spcw_addevents
  • spcw_apps
  • spdeladap
  • spdelfram
  • spdelnode
  • spdisplay_config
  • spethernt
  • spevent
  • spframe
  • spget_syspar
  • spgetdesc
  • sphardware
  • sphostnam
  • sphrdwrad
  • splm
  • splogd Daemon
  • splst_syspars
  • splst_versions
  • splstadapters
  • splstdata
  • splstnodes
  • splsuser
  • spmgrd Daemon
  • spmkuser
  • spmon
  • spmon_ctest
  • spmon_itest
  • spperfmon
  • sprestore_config
  • sprmuser
  • spsitenv
  • spsvrmgr
  • spsyspar
  • spverify_config
  • spvsd
  • startvsd
  • statvsd
  • stopvsd
  • supfilesrv Daemon
  • supper
  • suspendvsd
  • sysctl
  • sysctld Daemon
  • SYSMAN_test
  • syspar_ctrl
  • sysparaid
  • s1term
  • ucfghsd
  • ucfghsdvsd
  • ucfgvsd
  • unallnimres
  • undefhsd
  • undefvsd
  • unfencevsd
  • updatehsd
  • updatevsdnode
  • updatevsdtab
  • verparvsd
  • vhostname
  • vsdatalst
  • vsdchgserver
  • vsddiag
  • vsdelnode
  • vsdelvg
  • vsdnode
  • vsdsklst
  • vsdvg
  • vsdvgts
  • vsdvts

  • Technical Reference

  • RS/6000 SP Files and Other Technical Information
  • auto.master File
  • haemloadlist File
  • hmacls File
  • .klogin File
  • Kerberos
  • krb.conf File
  • krb.realms File
  • SDR_dest_info File
  • sysctl.acl File
  • sysctl.conf File
  • tuning.commercial File
  • tuning.default File
  • tuning.development File
  • tuning.scientific File
  • SP Subroutines
  • getvhostname Subroutine
  • hacws_set Subroutine
  • hacws_stat Subroutine
  • LAPI_Address Subroutine
  • LAPI_Address_init Subroutine
  • LAPI_Amsend Subroutine
  • LAPI_Fence Subroutine
  • LAPI_Get Subroutine
  • LAPI_Getcntr Subroutine
  • LAPI_Gfence Subroutine
  • LAPI_Init Subroutine
  • LAPI_Msg_string Subroutine
  • LAPI_Probe Subroutine
  • LAPI_Put Subroutine
  • LAPI_Qenv Subroutine
  • LAPI_Rmw Subroutine
  • LAPI_Senv Subroutine
  • LAPI_Setcntr Subroutine
  • LAPI_Term Subroutine
  • LAPI_Waitcntr Subroutine
  • setvhostname Subroutine
  • swclockGetIncrement Subroutine
  • swclockInit Subroutine
  • swclockRead Subroutine
  • swclockReadSec Subroutine
  • swclockTerm Subroutine

  • Appendixes

  • Appendix A. Perspectives Colors and Fonts
  • Perspectives Colors with Red, Green, and Blue (RGB) Triplets
  • Perspectives Fonts
  • Glossary of Terms and Abbreviations

  • Index

    Notices

    References in this publication to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program, or service. Evaluation and verification of operation in conjunction with other products, except those expressly designated by IBM, are the user's responsibility.

    IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

    IBM Director of Licensing
    IBM Corporation
    500 Columbus Avenue
    Thornwood, NY 10594
    USA

    Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

    IBM Corporation
    Mail Station P300
    522 South Road
    Poughkeepsie, NY 12601-5400
    USA
    Attention: Information Request

    Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.


    Trademarks

    The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:

    AIX
     
    AIX/6000
     
    DATABASE 2
     
    ES/9000
     
    ESCON
     
    HACMP/6000
     
    IBM
     
    IBMLink
     
    LoadLeveler
     
    NQS/MVS
     
    POWERparallel
     
    RS/6000
     
    RS/6000 Scalable POWERparallel Systems
     
    Scalable POWERparallel Systems
     
    SP
     
    System/370
     
    System/390
     
    TURBOWAYS
     

    Microsoft, Windows, and the Windows 95 logo are trademarks or registered trademarks of Microsoft Corporation.

    UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.

    Other company, product, and service names, which may be denoted by a double asterisk (**), may be trademarks or service marks of others.


    Publicly Available Software

    This product includes software that is publicly available:

    expect
    Programmed dialogue with interactive programs

    Kerberos
    Provides authentication of the execution of remote commands

    NTP
    Network Time Protocol

    Perl
    Practical Extraction and Report Language

    SUP
    Software Update Protocol

    Tcl
    Tool Command Language

    TclX
    Tool Command Language Extended

    Tk
    Tcl-based Tool Kit for X-windows

    This book discusses the use of these products only as they apply specifically to the SP system. The distribution for these products includes the source code and associated documentation. (Kerberos does not ship source code.) /usr/lpp/ssp/public contains the compressed tar files of the publicly available software. (IBM has made minor modifications to the versions of Tcl and Tk used in the SP system to improve their security characteristics. Therefore, the IBM-supplied versions do not match exactly the versions you may build from the compressed tar files.) All copyright notices in the documentation must be respected. You can find version and distribution information for each of these products that are part of your selected install options in the /usr/lpp/ssp/README/ssp.public.README file.


    About This Book

    The IBM Parallel System Support Programs for AIX: Command and Technical Reference provides detailed syntax and parameter information for all commands you can use to install, customize, and maintain the IBM RS/6000 SP system.

    Other books that help you administer and use the SP system include:

    This book applies to PSSP Version 2 Release 4. To find out what version of PSSP is running on your control workstation, enter the following:

    
    SDRGetObjects SP code_version
    

    In response, the system displays something similar to:

    code_version
    PSSP-2.4
    

    If the response indicates PSSP-2.4, this book applies to the version of PSSP that is running on your system.

    To find out what version of PSSP is running on the nodes of your system, enter the following from your control workstation:

    splst_versions -G -t
    

    In response, the system displays something similar to:

    1 PSSP-2.4
    2 PSSP-2.4
    7 PSSP-2.3
    8 PSSP-2.3
    

    For each node whose response indicates PSSP-2.4, this book applies to the version of PSSP that is running on that node.

    If you are running mixed levels of PSSP, be sure to maintain and refer to the appropriate documentation for whatever versions of PSSP you are running.
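    One way to spot mixed levels is to filter the splst_versions output shown above. The following is a minimal sketch and not part of PSSP: the function name nodes_not_at_level is hypothetical, and it assumes the two-column "node_number code_version" layout of the sample output.

```shell
# nodes_not_at_level LEVEL
# Hypothetical helper (not a PSSP command). Reads lines of the form
# "<node_number> <code_version>" on standard input and prints the node
# numbers whose code_version differs from LEVEL.
nodes_not_at_level() {
    awk -v level="$1" 'NF >= 2 && $2 != level { print $1 }'
}

# Intended use on the control workstation (assumes the layout shown above):
#   splst_versions -G -t | nodes_not_at_level PSSP-2.4
```

    Fed the sample output above, the helper would report nodes 7 and 8, which are the nodes that need the PSSP 2.3 documentation.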


    Who Should Use This Book

    This book is intended for anyone not familiar with the syntax and use of the RS/6000 SP commands.


    How This Book Is Organized

    This book consists of three parts. Part 1 is the Command Reference; it contains the RS/6000 SP commands, organized alphabetically. Part 2 is the Technical Reference; it contains RS/6000 SP files, subroutines, and other technical information. Part 3 is the Appendix; it lists Perspectives colors and fonts.

    The back of the book includes a glossary and an index.

    Vertical bars (|) to the left of the text in this book indicate changes or additions.


    Command Format

    The commands in this book are in the following format:

    Purpose
    Provides the name of the command and a brief description of its purpose.

    Syntax
    Includes a diagram that summarizes the use of the command.

    Flags
    Lists and describes the options that control the behavior of the command.

    Operands
    Lists and describes the objects on which the command operates.

    Description
    Includes a complete description of the command.

    Files
    Lists any RS/6000 SP system files that are read, employed, referred to, or written to by the command, or that are otherwise relevant to its use.

    Standard Input
    Describes what this command reads from standard input.

    Standard Output
    Describes what this command writes to standard output.

    Standard Error
    Describes what and when this command writes to standard error.

    Exit Values
    Describes the values returned and the conditions that caused the values to be returned.

    Security
    Describes who can run this command and provides other security-related information.

    Restrictions
    Lists restrictions beyond the security restrictions described previously.

    Implementation Specifics
    Identifies the package of each individual command.

    Prerequisite Information
    Provides a pointer to other documents that would enhance the user's understanding of this command.

    Location
    Specifies the location of the command.

    Related Information
    Lists RS/6000 SP commands, functions, file formats, and special files that are employed by the command, that have a purpose which is related to that of the command, or that are otherwise of interest within the context of the command. Also listed are related RS/6000 SP documents, other related documents, and miscellaneous information related to the command.

    Examples
    Provides examples of ways in which the command is typically used.

    Typographic Conventions

    This book uses the following typographic conventions:
    Typographic
    Usage

    Bold
    • Bold words or characters represent system elements that you must use literally, such as commands, flags, and path names.
    • Bold words also indicate the first use of a term included in the glossary.

    Italic
    • Italic words or characters represent variable values that you must supply.
    • Italics are also used for book titles and for general emphasis in text.

    Constant width
    Examples and information that the system displays appear in constant width typeface.

    [ ]
    Brackets enclose optional items in format and syntax descriptions.

    { }
    Braces enclose a list from which you must choose an item in format and syntax descriptions.

    |
    A vertical bar separates items in a list of choices. (In other words, it means "or.")

    < >
    Angle brackets (less-than and greater-than) enclose the name of a key on the keyboard. For example, <Enter> refers to the key on your terminal or workstation that is labeled with the word Enter.

    ...
    An ellipsis indicates that you can repeat the preceding item one or more times.

    <Ctrl-x>
    The notation <Ctrl-x> indicates a control character sequence. For example, <Ctrl-c> means that you hold down the control key while pressing <c>.


    Accessing Online Information

    In order to use the PSSP man pages or access the PSSP online (HTML) publications, the ssp.docs file set must first be installed. To view the PSSP online publications, you also need access to an HTML document browser such as Netscape. An index to the HTML files that are provided with the ssp.docs file set is installed in the /usr/lpp/ssp/html directory.

    Obtaining Documentation

    You can view this book or download a PostScript version of it from the IBM RS/6000 web site at http://www.rs6000.ibm.com. At the time this manual was published, the full path was http://www.rs6000.ibm.com/resource/aix_resource/sp_books. However, the structure of the RS/6000 web site can change over time.


    Related Publications

    Here are some related publications.

    RS/6000 SP Publications

    Parallel System Support Programs for AIX Publications

    As an alternative to ordering the individual books, you can use SBOF-8587 to order the entire SP software library.

    IBM Virtual Shared Disk and IBM Recoverable Virtual Shared Disk Publication

    Performance Monitor Publication

    General Parallel File System for AIX Publication

    LoadLeveler Publications

    Parallel Environment for AIX Publications

    As an alternative to ordering the individual books, you can use SBOF-8588 to order the entire Parallel Environment for AIX library.

    Client Input Output/Sockets (CLIO/S) Publications

    Parallel I/O File System Publications

    Network Tape Access and Control System for AIX (NetTAPE) Publications


    Other IBM Publications

    Here are some other IBM publications that you may find helpful.

    International Technical Support Organization Publications (Red Books)

    AIX and RS/6000 Publications

    Service

    Network Queueing System/MVS (NQS/MVS)

    Network Connectivity

    RS/6000 SP Switch Router

    Order according to RS/6000 SP Switch Router model:

    You can order the RS/6000 SP Switch Router as the IBM 9077.

    Adapters


    Non-IBM Publications

    Here are some non-IBM publications that you may find helpful.

    Parallel Computing

    Tcl


    Manual Pages for Public Code

    The following manual pages for public code are available in this product:

    SUP
    /usr/lpp/ssp/man/man1/sup.1

    NTP
    /usr/lpp/ssp/man/man8/xntpd.8
    /usr/lpp/ssp/man/man8/xntpdc.8

    Perl (Version 4.036)
    /usr/lpp/ssp/perl/man/perl.man
    /usr/lpp/ssp/perl/man/h2ph.man
    /usr/lpp/ssp/perl/man/s2p.man
    /usr/lpp/ssp/perl/man/a2p.man

    Perl (Version 5.003)
    Man pages are in the /usr/lpp/ssp/perl5/man/man1 directory.

    Manual pages and other documentation for Tcl, TclX, Tk, and expect can be found in the compressed tar files located in /usr/lpp/ssp/public.


    Command Reference


    Commands

    This part of the book contains the RS/6000 SP commands.

    To access the RS/6000 SP online manual pages, set the MANPATH environment variable as follows:

    for ksh
    export MANPATH=$MANPATH:/usr/lpp/ssp/man
    
    for csh
    setenv MANPATH $MANPATH\:/usr/lpp/ssp/man
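
    If the ksh line is placed in a profile that can be sourced more than once, the directory is appended to MANPATH each time. The following is a minimal ksh-compatible sketch that appends it only when it is missing. The function name add_ssp_manpath is hypothetical and not part of PSSP; when MANPATH is unset, the result matches what the export line above produces.

```shell
# add_ssp_manpath
# Hypothetical helper: append the PSSP man directory to MANPATH only if it
# is not already one of the colon-separated entries.
add_ssp_manpath() {
    ssp_man=/usr/lpp/ssp/man
    case ":${MANPATH:-}:" in
        *":$ssp_man:"*) ;;                          # already present
        *) MANPATH="${MANPATH:-}:$ssp_man" ;;       # same result as the export line above
    esac
    export MANPATH
}
```

    Calling add_ssp_manpath a second time leaves MANPATH unchanged.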
    

    System Partitioning and Commands

    When you partition your system, you create one or more system partitions which, for most tasks, function as separate and distinct logical RS/6000 SP systems. Most commands function within the boundary of the system partition in which they are executed. A number of commands, however, continue to treat the RS/6000 SP as a single entity and do not respect system partition boundaries. That is, in their normal function they may affect a node or other entity outside of the current system partition. In addition, some commands which normally function only within the current system partition have been given a new parameter which, when used, allows the scope of that command to exceed the boundaries of the current system partition.

    On the control workstation, the administrator is in an environment for one system partition at a time. The SP_NAME environment variable identifies the system partition to subsystems. (If this environment variable is not set, the system partition is defined by the primary: stanza in the /etc/SDR_dest_info file.) Most tasks performed on the control workstation that get information from the System Data Repository (SDR) will get the information for that particular system partition.
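
    The lookup order described above (SP_NAME first, then the primary: stanza) can be sketched as a small shell function. This is an illustration only: the function name current_syspar is hypothetical, and the sketch assumes that the stanza is a single line of the form "primary:value". On a real system, the spget_syspar command is the supported way to get this information.

```shell
# current_syspar [FILE]
# Hypothetical helper (spget_syspar is the supported command).
# Prints $SP_NAME when it is set and non-empty; otherwise prints the value
# of the first "primary:" entry in FILE (default /etc/SDR_dest_info).
# Assumption: the entry is a single line of the form "primary:<value>".
current_syspar() {
    if [ -n "${SP_NAME:-}" ]; then
        echo "$SP_NAME"
    else
        awk '/^primary:/ { sub(/^primary:/, ""); print; exit }' \
            "${1:-/etc/SDR_dest_info}"
    fi
}
```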

    In managing multiple system partitions, it is helpful to open a window for each system partition. You can set and export the SP_NAME environment variable in each window and set up the window title bar or shell prompt with the system partition name. The following script is an example:

    sysparenv:
    #!/bin/ksh
    # Open one aixterm window per system partition, with SP_NAME set in each.
      for i in `splst_syspars`
      do
         # Short host name of the partition (text before the first ".")
         syspar=`host $i | cut -f 1 -d"."`
         echo "Opening the $syspar partition environment"
         sleep 2
         export SP_NAME=$syspar
         aixterm -T "Work Environment for CWS `hostname -s` - View: $syspar" -ls -sb &
      done
      exit
     
    .profile addition:
    # Added for syspar environment setup
    # Show the partition name in the prompt when SP_NAME is set.
      if [ "`env | grep SP_NAME | cut -d= -f1`" = SP_NAME ]
         then
            PS1="[`hostname -s`][$SP_NAME]["'$PWD]> '
         else
            PS1="[`hostname -s`]["'$PWD]< '
      fi
      export ENV
    

    As a user, you can check what system partition you're in with the command:

    spget_syspar -n
    

    The following table summarizes those commands which can exceed the boundary of the current system partition. Unless otherwise stated, commands not listed in this table have as their scope the current system partition.
    Command
    Effect

    arp
    Can reference any node (by its host name) in any system partition.

    Automounter commands
    Host names need not be in the current system partition.

    crunacct
    Merges accounting data from all nodes regardless of system partition boundaries.

    cshutdown -G
    The -G flag allows specification of target nodes outside of the current system partition.

    cstartup -G
    The -G flag allows specification of target nodes outside of the current system partition.

    dsh, dsh -w{hostname | -}
    Hosts added to the working collective by host name need not be in the current system partition.

    dsh -aG
    The -G flag modifies the -a flag (all nodes in the current system partition) by expanding the scope to all nodes in the entire physical SP system.

    Eclock
    There is a single switch clock for the SP regardless of the number of system partitions.

    Efence -G
    The -G flag allows specification of nodes outside of the current system partition.

    emonctrl -c
    The system partition-sensitive control script for the emon subsystem supports the -c option, which crosses system partitions.

    Eunfence -G
    The -G flag allows specification of nodes outside of the current system partition.

    haemctrl -c, haemctrl -u
    The system partition-sensitive control script for the haem subsystem supports the -c and -u options, which cross system partitions.

    hagsctrl -c, hagsctrl -u
    The system partition-sensitive control script for the hags subsystem supports the -c and -u options, which cross system partitions.

    hatsctrl -c, hatsctrl -u
    The system partition-sensitive control script for the hats subsystem supports the -c and -u options, which cross system partitions.

    hbctrl -c
    The system partition-sensitive control script for the hb subsystem supports the -c option, which crosses system partitions.

    hmcmds -G
    The -G flag allows the hmcmds commands to be sent to any hardware on the SP system.

    hmmon -G
    The -G flag allows for the specification of hardware outside of the current system partition.

    hostlist, hostlist -f filename, hostlist -w hostname
    Host names need not be in the current system partition.

    hostlist -aG | -nG | -sG
    The -G flag modifies the -a, -n, or -s flag by expanding the scope to the entire physical SP system.

    hrctrl -c
    The system partition-sensitive control script for the hr subsystem supports the -c option, which crosses system partitions.

    hsdatalst -G
    The -G flag causes the display of HSD information to be for all system partitions.

    lppdiff -aG
    The -G flag modifies the -a flag (all nodes in the current system partition) by expanding the scope to all nodes in the entire physical SP system.

    nodecond -G
    The -G flag allows specification of a node outside of the current system partition.

    psyslrpt -w hostnames
    The host names supplied with the -w flag can be in any system partition (the -a flag will select all nodes in the current system partition).

    psyslclr -w hostnames
    The host names supplied with the -w flag can be in any system partition (the -a flag will select all nodes in the current system partition).

    penotify -w hostnames
    The host names supplied with the -w flag can be in any system partition (the -a flag will select all nodes in the current system partition).

    pmanctrl -c
    The system partition-sensitive control script for the pman subsystem supports the -c option, which crosses system partitions.

    Parallel commands (p_cat, pcp, pdf, pfck, pexec, pexscr, pfind, pfps, pls, pmv, ppred, pps, prm)
    Parallel commands can take the following options and will behave accordingly:
    • -w: Host names specified with -w need not be in the current system partition.
    • noderange: Nodes specified by noderange must be in the current system partition.
    • hostlist_args: Host names specified with hostlist options -w or -G need not be in the current system partition (any other hostlist options operate within the current system partition).

    SDRArchive, SDRRestore
    Archives/restores the SDR representing the entire SP.

    SDRGetObjects -G
    The -G flag allows for retrieval of partitioned class objects from partitions other than the current system partition. Without the -G, objects which are in a partitioned class are retrieved from the current system partition only.

    SDRMoveObjects
    Moves objects from one system partition to another.

    Other SDR commands
    SDR commands that create, change, or delete values work within the system partition. Note, though, that system classes (Frame, for example) are shared among all system partitions. Changes to system classes will affect other system partitions.

    Security commands (ext_srvtab, kadmin, kdb_destroy, kdb_edit, kdb_init, kdb_util, kdestroy, kinit, klist, kpasswd, kprop, ksrvtgt, ksrvutil, kstash, rcmdtgt, rcp, rsh, setup_authent)
    The function of these security commands is unchanged under system partitioning. That is, if they previously affected the entire SP, they continue to do so even if the system has been partitioned. If they previously had the ability to affect a remote node (rsh, rcp, for example), that function is unchanged in a system partitioned environment.

    spapply_config
    Applies a system partition configuration to the entire SP.

    spbootins
    If a boot server outside of the current system partition is specified, that node is prepared appropriately.

    sp_configdctrl -c
    The system partition-sensitive control script for the sp_configd subsystem supports the -c option, which crosses system partitions.

    spframe
    Configures data for one or more frames across the entire SP.

    splm
    The target nodes defined in the input table can include nodes from any system partition.

    splst_versions -G
    The -G flag allows for retrieval of PSSP version information from nodes outside the current system partition.

    splstdata -G
    The -G flag allows display of information on nodes and adapters outside of the current system partition.

    splstadapters -G
    The -G flag lists information about target adapters outside of the current system partition.

    splstnodes -G
    The -G flag lists information about target nodes outside of the current system partition.

    spmon -G
    The -G flag allows specification of nodes outside of the current system partition. The -G flag is required when performing operations on any frame or switch.

    sprestore_config
    Restores the entire SP SDR from a previously made archive.

    spsitenv
    Site environment variables are specified for the SP system as a whole. The specification of acct_master= can be any node in the SP regardless of system partition. The specification of install_image= may cause boot server nodes outside of the current system partition to refresh the default installation image they will serve to their nodes.

    spverify_config
    Verifies the configuration of all system partitions in the SP system.

    supper
    File collections are implemented and managed without respect to system partition boundaries.

    sysctl
    The Sysctl client can send requests to any node in the SP.

    syspar_ctrl -c -G
    The -c and -G flags allow for the crossing of system partitions in providing a single interface to the control scripts for the system partition-sensitive subsystems.

    s1term -G
    The -G flag allows specification of a node outside of the current system partition.

    vsdatalst -G
    The -G flag causes the display of IBM Virtual Shared Disk information to be for all system partitions.

    vsdsklst -G
    The -G flag specifies the display of information for disks outside the current system partition.

    add_principal

    Purpose

    add_principal - Creates principals in the authentication database.

    Syntax

    add_principal [-r realm_name] [-v] file_name

    Flags

    -r
    Adds principals to a realm other than the local realm.

    -v
    Specifies verbose mode. A message is written to standard output for each principal added to the authentication database.

    Operands

    file_name
    Specifies the file containing principal names and passwords to add to the authentication database.

    Description

    This command provides an interface to the authentication database to add an entry for a user or service instance, supplying the password used to generate the encrypted private key. The add_principal command is suitable for mass addition of users or multiple instances of servers (for example, SP nodes).

    This command operates noninteractively if you have a valid ticket-granting-ticket (TGT) for your admin instance in the applicable realm. A TGT can be obtained using the kinit command. If you do not have a TGT for the admin instance for the realm in which you are adding principals, or if the add_principal command cannot obtain a service ticket for changing passwords using the admin TGT, the user is prompted for the password for the user's admin instance.

    Administrators use the add_principal command to register new users and service instances in the authentication database. An administrator must have a principal ID with an instance of admin. Also, user_name.admin must appear in the admin_acl.add Access Control List (ACL).

    The add_principal program communicates over the network with the kadmind program, which runs on the machine housing the primary authentication database. The kadmind program creates new entries in the database using data provided by this command.

    When using the add_principal command, the principal's expiration date and maximum ticket lifetime are set to the default values. To override the defaults, the root user must use the kdb_edit command to modify those attributes.

    Input to the program is read from the file specified by the file_name argument. It contains one line of information for each principal to be added, in the following format:

    name[.instance][@realm] password
    
    Note: The @realm cannot be different from the local realm or the realm argument if the -r option is specified.

    For user entries with a NULL instance, this format matches that of the log file created by the spmkuser command. Any form of white space can surround the two fields. Blank lines are ignored. Any line whose first non-white-space character is a # is treated as a comment.

    Since the input file contains principal identifiers and their passwords, ensure that access to the file is controlled. You should remove the input file containing the unencrypted passwords after using it, or delete the passwords from it.
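The input-file rules described above can be sketched in shell. This is a minimal illustration and not part of PSSP: the function names write_sample_input and count_principal_entries, the principal names, and the passwords are all hypothetical. The sketch writes a sample file under a restrictive umask so that only the owner can read it, then counts the lines that would be read as principal entries (blank lines and # comment lines are skipped).

```shell
# Hypothetical helper: write a sample add_principal input file that only
# its owner can read or write (umask 077 yields mode 600 for a new file).
write_sample_input() {
    (
        umask 077
        printf '%s\n' \
            '# name[.instance][@realm] password' \
            'alice        Secret-1' \
            'bob.admin    Secret-2' \
            '' \
            '   # a comment after leading white space' \
            'root.node3   Secret-3' > "$1"
    )
}

# Hypothetical helper: count the lines that are principal entries, that is,
# lines that are neither blank nor comments (first non-white-space char '#').
count_principal_entries() {
    awk '!/^[ \t]*(#|$)/ { n++ } END { print n + 0 }' "$1"
}
```

After add_principal has processed such a file, remove the file or delete the passwords from it, as recommended above.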

    The add_principal command does not add principals to an AFS authentication database. If authentication services are provided through AFS, use the AFS kas command to add principals to the database. Refer to the chapter on security in IBM Parallel System Support Programs for AIX: Administration Guide for an overview.

    Files

    /var/kerberos/database/admin_acl.add
    Access Control List file.

    Exit Values

    0
    Indicates success. It does not mean that all IDs were added. Individual messages indicate what was added.

    nonzero
    Indicates a failure with an appropriate message.

    Related Information

    Commands: kadmin, kinit, kpasswd, ksrvutil

    Refer to Chapter 2, "RS/6000 SP Files and Other Technical Information" section of IBM Parallel System Support Programs for AIX: Command and Technical Reference for additional Kerberos information.

    allnimres

    Purpose

    allnimres - Allocates Network Installation Management (NIM) resources from a NIM master to a NIM client.

    Syntax

    allnimres -h | -l node_list

    Flags

    -h
    Displays usage information. If the command is issued with the -h flag, the syntax description is displayed to standard output and no other action is taken (even if other valid flags are entered along with the -h flag).

    -l node_list
    Indicates by node_list the SP nodes to which to allocate installation resources. The node_list is a comma-separated list of node numbers.

    Operands

    None.

    Description

    Use this command to allocate all necessary NIM resources to a client based on the client's bootp_response in the System Data Repository (SDR). This includes executing the bos_inst command for allocation of the boot resource and nimscript resource. At the end of this command, nodes are ready to netboot to run installation, diagnostics, or maintenance. If the node's bootp_response is "disk", all NIM resources are deallocated from the node.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have root privilege to run this command.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/allnimres

    Related Information

    Commands: setup_server, unallnimres

    Examples

    To allocate boot/installation resources to boot/install client nodes 1, 3, and 5 from their respective boot/install servers, enter:

    allnimres -l 1,3,5
    

    arp

    Purpose

    /usr/lpp/ssp/css/arp - Displays and modifies address resolution.

    Syntax

    arp {host_name | -a [/dev/kmem]} | -d host_name |
        -s type host_name adapter_address [route] [temp] [pub] |
        -f file_name [type]

    Parameters

    -a
    Displays all of the current Address Resolution Protocol (ARP) entries. Use the crash command to look at KMEM or UMUnix variables. Specify the -a /dev/kmem flag to display ARP information for kernel memory.

    -d host_name
    Deletes an ARP entry for the host specified by the host_name variable if the user has root user authority.

    -f file_name
    Causes the file specified by the file_name variable to be read and multiple entries to be set in the ARP tables. Entries in the file should be in the form:
    type host_name adapter_address [route] [temp] [pub]
    

    -s type host_name adapter_address [route] [temp] [pub]
    Creates an ARP entry for the host specified by the host_name variable with the adapter address specified by the adapter_address variable. The adapter address is given as 6 hexadecimal bytes separated by colons. The line must be in the following format:
    type host_name adapter_address [route] [temp] [pub]
    

    where:

    type
    Specifies the type of hardware address as follows:
    ether
    An Ethernet interface
    802.3
    An 802.3 interface
    switch
    A Scalable POWERparallel Switch (SP Switch) or a High Performance Switch
    fddi
    A Fiber Distributed Data Interface
    802.5
    A token-ring interface

    host_name
    Specifies the host_name for which to create an entry.

    adapter_address
    Specifies the physical address (switch node number) for the switch adapters.

    route
    Specifies the route for a token-ring interface or Fiber Distributed Data Interface (FDDI) as defined in the token-ring or FDDI header.

    temp
    Specifies that this ARP table entry is temporary. The table entry is permanent if this argument is omitted.

    pub
    Specifies that this table entry is to be published, and that this system acts as an ARP server responding to requests for host_name, even though the host address is not its own.

    Description

    The arp command has been modified to add support for the switch. This command is valid only on an SP system.

    The arp command displays and modifies the Internet-to-adapter address translation tables used by ARP. The arp command displays the current ARP entry for the host specified by the host_name variable. The host can be specified by name or number, using Internet dotted decimal notation.

    Related Information

    SP Command: ifconfig

    AIX Commands: crash, netstat

    AIX Daemon: inetd

    Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on the SP Switch and the High Performance Switch.

    Refer to "TCP/IP Protocols" in AIX Version 4.1 System Management Guide: Communications and Networks.

    Examples

    1. To add a single entry to the arp mapping tables until the next time the system is restarted, enter:
      arp -s switch host2 1
      

    2. To delete a map table entry for the specified host with the arp command, enter:
      arp -d host1
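Before loading many entries with the -f flag, it can help to check each line of the file. The following is a minimal sketch, not part of the arp command: the function names are hypothetical, and the address check covers only the colon-separated form of 6 hexadecimal bytes described for the -s flag (for switch entries, the address is a switch node number instead).

```shell
# Hypothetical helper: succeed only for the hardware types listed under "type".
is_valid_arp_type() {
    case "$1" in
        ether|802.3|switch|fddi|802.5) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if the argument looks like 6 hexadecimal
# bytes separated by colons (one or two hex digits per byte).
is_colon_hex_address() {
    printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f]{1,2}(:[0-9A-Fa-f]{1,2}){5}$'
}
```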
      

    cfghsd

    Purpose

    cfghsd - Configures a data striping device (HSD) for an IBM Virtual Shared Disk.

    Syntax

    cfghsd -a | hsd_name ...

    Flags

    -a
    Specifies all the data striping devices that have been defined.

    Operands

    hsd_name
    Specifies a defined HSD. All underlying IBM Virtual Shared Disks in the HSD must be configured before using this command.

    Description

    This command configures the already defined HSDs and makes them available. The command extracts information from the System Data Repository (SDR).

    Files

    /usr/lpp/csd/bin/cfghsd
    Specifies the command file.

    Security

    You must have root privilege to run this command.

    Restrictions

    If you have the IBM Recoverable Virtual Shared Disk product installed and operational, do not use this command. The results may be unpredictable.

    See IBM Parallel System Support Programs for AIX: Managing Shared Disks.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: defhsd, hsdatalst, lshsd, ucfghsd

    Examples

    To make the HSD hsd1 available, enter:

    cfghsd hsd1
    

    cfghsdvsd

    Purpose

    cfghsdvsd - Configures a data striping device (HSD) and the IBM Virtual Shared Disks that comprise it on one node.

    Syntax

    cfghsdvsd -a | {hsd_name...}

    Flags

    -a
    Specifies that all the data striping devices defined on this system or system partition are to be configured (made available).

    Operands

    hsd_name
    Specifies the names of defined HSDs that are to be configured. This command configures the underlying IBM Virtual Shared Disks as well.

    Description

    Use this command to configure already-defined HSDs and their underlying IBM Virtual Shared Disks and make them available. Note that all of the underlying IBM Virtual Shared Disks go to the active state.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit hsd_mgmt
    
    and select the Configure an HSD and its underlying IBM Virtual Shared Disks option.

    Files

    /usr/lpp/csd/bin/cfghsdvsd
    Specifies the command file.

    Security

    You must have sysctl and sysctl.vsd access and authorization from your system administrator to run this command.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: cfghsd, cfgvsd, ucfghsdvsd

    Examples

    To configure the data striping device hsd1 and the IBM Virtual Shared Disks that comprise it, enter:

    cfghsdvsd hsd1
    

    cfgvsd

    Purpose

    cfgvsd - Configures an IBM Virtual Shared Disk.

    Syntax

    cfgvsd -a | vsd_name ...

    Flags

    -a
    Specifies all IBM Virtual Shared Disks that have been defined.

    Operands

    vsd_name
    Specifies a defined IBM Virtual Shared Disk.

    Description

    Use this command to configure the already defined IBM Virtual Shared Disks and bring them to the stopped state. It does not make the IBM Virtual Shared Disk available. The command extracts information from the System Data Repository (SDR).

    You can use the System Management Interface Tool (SMIT) to run the cfgvsd command. To use SMIT, enter:

    smit vsd_mgmt
    
    and select the Configure an IBM Virtual Shared Disk option.

    Files

    /usr/lpp/csd/bin/cfgvsd
    Specifies the command file.

    Security

    You must have root privilege to run this command.

    Restrictions

    If you have the IBM Recoverable Virtual Shared Disk product installed and operational, do not use this command. The results may be unpredictable.

    See IBM Parallel System Support Programs for AIX: Managing Shared Disks.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: ctlvsd, lsvsd, preparevsd, resumevsd, startvsd, stopvsd, suspendvsd, ucfgvsd

    Examples

    To bring the IBM Virtual Shared Disk vsd1vg1n1 from the defined state to the stopped state, enter:

    cfgvsd vsd1vg1n1
    

    chgcss

    Purpose

    chgcss - Applies configuration changes to a Scalable POWERparallel Switch (SP Switch) Communications Adapter (Type 6-9) or a High Performance Switch Communications Adapter Type 2.

    Implementation Note

    Configuration changes are later applied to the device when it is configured at system reboot.

    Syntax

    chgcss -l name -a attr=value [-a attr=value]

    Flags

    -l name
    Specifies the device logical name in the Customized Devices object class whose attribute values should be changed.

    -a attr=value
    Identifies an attribute to be changed and the value to which it should be changed

    where:

    attr
    Specifies the IP buffer pool for the switch device driver as follows:

    rpoolsize
    IP receive buffer pool

    spoolsize
    IP send buffer pool

    value
    Specifies the IP buffer pool size in bytes.

    Operands

    None.

    Description

    Use this command to apply configuration changes to an SP Switch Communications Adapter (Type 6-9) or a High Performance Switch Communications Adapter Type 2.

    Security

    You must have root privilege to run this command.

    Prerequisite Information

    For additional information on values for the rpoolsize and spoolsize attributes, refer to the "Tuning the SP System" chapter in IBM Parallel System Support Programs for AIX: Administration Guide.

    Related Information

    AIX Command: lsattr

    Examples

    1. To change the size of the IP receive buffer to 1024K, enter:
      chgcss -l css0 -a rpoolsize=0x100000
      

    2. To change the size of the IP send and receive buffers to 1024K, enter:
      chgcss -l css0 -a rpoolsize=1048576 -a spoolsize=1048576
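The two examples above give the same pool size in different notations: 0x100000 in hexadecimal and 1048576 in decimal are both 1024K expressed in bytes. The following sketch shows the arithmetic; the helper names kb_to_bytes and hex_to_dec are hypothetical and are not part of PSSP.

```shell
# Hypothetical helper: convert a size in kilobytes to bytes.
kb_to_bytes() {
    echo $(( $1 * 1024 ))
}

# Hypothetical helper: print a hexadecimal value (for example 0x100000)
# as a decimal number.
hex_to_dec() {
    printf '%d\n' "$1"
}

kb_to_bytes 1024      # prints 1048576
hex_to_dec 0x100000   # prints 1048576
```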
      

    chkp

    Purpose

    chkp - Changes Kerberos principals.

    Syntax

    chkp -h

    chkp [-e expiration] [-l lifetime] name[.instance] ...

    Flags

    -h
    Displays usage information.

    -e expiration
    Specifies a new expiration date for the principals. The date must be entered in the format yyyy-mm-dd, and the year must be a value from 1970 to 2037. The time of expiration is set to 11:59 PM local time on the date specified.

    -l lifetime
    Specifies the new maximum ticket lifetime for the principals. The lifetime must be specified as a decimal number from 0 to 255. These values correspond to a range of time intervals from five minutes to 30 days. Refer to IBM Parallel System Support Programs for AIX: Administration Guide for a complete list of the possible ticket lifetime values you can enter and the corresponding durations in days, hours, and minutes. The following list shows a representative sample with approximate durations:
    lifetime operand - Approximate duration
          141                1 day
          151                2 days
          170                1 week
          180                2 weeks
          191                1 month
    

    At least one flag must be specified.

    Operands

    name[.instance] ...
    Identifies the principals to change.

    Description

    Use this command to change principals in the local Kerberos database. It allows the current expiration date and maximum ticket lifetime to be redefined. It cannot be used to change the principal's password. To do that, the administrator must use the kpasswd, kadmin, or kdb_edit commands. The chkp command should normally be run only on the primary server. If there are secondary authentication servers, the push-kprop command is invoked to propagate the change to the other servers. The command can be used to update a secondary server's database, but the changes may be negated by a subsequent update from the primary.

    Files

    /var/kerberos/database/admin_acl.mod

    /var/kerberos/database/principal.*
    Kerberos database files.

    Exit Values

    0
    Indicates the successful completion of the command. Specified principals that exist were changed. If any principal that you specify does not exist in the database, a message is written to standard error and processing continues with any remaining principals.

    1
    Indicates that an error occurred and no principal was changed. One of the following conditions was detected:

    Security

    The chkp command can be run by the root user logged in on a Kerberos server host. It can be invoked indirectly as a Sysctl procedure by a Kerberos database administrator who has a valid ticket and is listed in the admin_acl.mod file.

    Location

    /usr/kerberos/etc/chkp

    Related Information

    Commands: kadmin, kdb_edit, lskp, mkkp, rmkp, sysctl

    Examples

    1. To set the default maximum ticket lifetime for new principals to (approximately) one week, enter:
      chkp -l 171 default
      

    2. To set the maximum ticket lifetime to approximately three weeks and the expiration date to 30 June 2003 for several principals, enter:
      chkp -l 181 -e 2003-06-30 franklin jtjones root.admin susan
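The constraints on the -e and -l values can be checked before chkp is invoked. The following is a minimal sketch with hypothetical function names; it checks only the yyyy-mm-dd shape and the 1970 to 2037 year range for the expiration date, and the 0 to 255 range for the lifetime. It does not validate the month or day, and it does not map lifetime values to durations.

```shell
# Hypothetical helper: succeed if the argument has the form yyyy-mm-dd
# and the year is from 1970 to 2037. Month and day are not range-checked.
is_valid_chkp_expiration() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    year=${1%%-*}
    [ "$year" -ge 1970 ] && [ "$year" -le 2037 ]
}

# Hypothetical helper: succeed if the argument is a decimal number
# from 0 to 255.
is_valid_chkp_lifetime() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -le 255 ]
}
```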
      

    cksumvsd

    Purpose

    cksumvsd - Views and manipulates the IBM Virtual Shared Disk checksum parameters.

    Syntax

    cksumvsd [-s] [-R] [-i | -I]

    Flags

    -s
    Shows IP checksum counters only.

    -R
    Resets IP checksum counters.

    -i
    Calculates IP checksum on all IBM Virtual Shared Disk remote messages.

    -I
    Indicates not to calculate IP checksum on all IBM Virtual Shared Disk remote messages.

    If no flags are specified, the current settings of all IBM Virtual Shared Disk checksum parameters and counters are displayed.

    Operands

    None.

    Description

    The IBM Virtual Shared Disk IP device driver can calculate and send checksums on remote packets it sends. It also can calculate and verify checksums on remote packets it receives. The cksumvsd command is used to tell the device driver whether to perform checksum processing. The default is no checksumming.

    Issuing cksumvsd -i turns on checksumming on the node on which it is run. cksumvsd -i must be issued on all IBM Virtual Shared Disk nodes in the system partition, or the IBM Virtual Shared Disk software will stop working properly on the system partition. If node A has cksumvsd -i (checksumming turned on) and node B has cksumvsd -I (checksumming turned off, the default), then A will reject all messages from B (both requests and replies), since A's checksum verification will fail on all B's messages. The safe way to run cksumvsd -i is to make sure that all IBM Virtual Shared Disks on all nodes are in the STOPPED or SUSPENDED states, issue cksumvsd -i on all nodes, then resume the needed IBM Virtual Shared Disks on all nodes.

    In checksumming mode, the IBM Virtual Shared Disk IP device driver keeps a counter of the number of packets received with good checksums, and the number received with bad checksums. cksumvsd and statvsd both display these values (statvsd calls cksumvsd -s).

    cksumvsd dynamically responds to the configuration of the IBM Virtual Shared Disk IP device driver loaded in the kernel. Its output and function may change if the IBM Virtual Shared Disk IP device driver configuration changes.

    Files

    /dev/kmem
    cksumvsd reads and writes /dev/kmem to get information to and from the IBM Virtual Shared Disk IP device driver in the kernel.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Command: cfgvsd

    Examples

    1. To display the IBM Virtual Shared Disk checksum settings and counter values, enter:
      cksumvsd
      

      You should receive output similar to the following:

      VSD cksum: current values:
      do_ip_checksum: 0
      ipcksum_cntr:   350 good,       0 bad,  0 % bad.
      

      IBM Virtual Shared Disk checksumming is currently turned off on the node. Prior to this, checksumming was turned on and 350 IBM Virtual Shared Disk remote messages were received, all with good checksums.

    2. To turn IBM Virtual Shared Disk checksumming on and display counters, enter:
      cksumvsd -i
      

      You should receive output similar to the following:

      VSD cksum: current values:
      do_ip_checksum: 0
      ipcksum_cntr:   350 good,       0 bad,  0 % bad.
      VSD cksum: new values:
      do_ip_checksum: 1
      ipcksum_cntr:   350 good,       0 bad,  0 % bad.
      

      The command displays old and new values. As before, the node has received 350 IBM Virtual Shared Disk remote messages with good checksums.

    3. To display only the IBM Virtual Shared Disk checksum counters, enter:
      cksumvsd -s
      

      You should receive output similar to the following:

      ipcksum_cntr:   350 good,       0 bad,  0 % bad.
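The counter line shown in these examples can be picked apart in a script, for instance to raise an alert when bad checksums appear. The following is a minimal sketch that assumes the output keeps the layout shown above; the function name parse_cksum_counters is hypothetical. It reads command output on standard input and prints the good and bad counts from the last ipcksum_cntr line.

```shell
# Hypothetical helper: read cksumvsd-style output on standard input and
# print "<good> <bad>" taken from the last ipcksum_cntr line.
parse_cksum_counters() {
    awk '/^[ \t]*ipcksum_cntr:/ { gsub(/,/, ""); good = $2; bad = $4 }
         END { print good, bad }'
}
```

On a node, the function could be fed with the output of cksumvsd -s.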
      

    cmonacct

    Purpose

    cmonacct - Performs monthly or periodic SP accounting.

    Syntax

    cmonacct [number]

    Flags

    None.

    Operands

    number
    Specifies which month or other accounting period to process. The default is the current month.

    Description

    The cmonacct command performs monthly or periodic SP system accounting. The intervals are set in the crontab file. You can set the cron daemon to run the cmonacct command once each month or at some other specified time period. By default, if accounting is enabled for at least one node, cmonacct executes on the first day of every month.

    The cmonacct command creates summary files under the /var/adm/cacct/fiscal directory and restarts summary files under the /var/adm/cacct/sum directory, the cumulative summary to which daily reports are appended.

    cprdaily

    Purpose

    cprdaily - Creates an ASCII report of the previous day's accounting data.

    Syntax

    cprdaily [-c] [[-l] [yyyymmdd]]

    Flags

    -c
    Reports exceptional resource usage by command. This flag may be used only on the current day's accounting data.

    -l
    Reports exceptional usage by login ID for the date specified in the yyyymmdd operand, if reporting for a day other than the current day is desired. (This is lowercase l, as in list.)

    Operands

    yyyymmdd
    Specifies the date for exceptional usage report if other than the current date.

    Description

    This command is called by the crunacct command to format an ASCII report of the previous day's accounting data for all nodes. The report resides in the /var/adm/cacct/sum/rprtyyyymmdd file, where yyyymmdd specifies the year, month, and day of the report.
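The report file name is the fixed prefix rprt followed by the report date. As a minimal sketch (the function name report_path is hypothetical), the path for a given date can be built as follows:

```shell
# Hypothetical helper: print the report path for a date given as yyyymmdd.
report_path() {
    printf '/var/adm/cacct/sum/rprt%s\n' "$1"
}

report_path 19980215   # prints /var/adm/cacct/sum/rprt19980215
```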

    cptuning

    Purpose

    cptuning - Copies a file to /tftpboot/tuning.cust.

    Syntax

    cptuning -h | file_name

    Flags

    -h
    Displays usage information for this command (syntax message). If the command is issued with the -h flag, the syntax description is displayed to standard output and no other action is taken (even if other valid flags are entered along with the -h flag).

    Operands

    file_name
    Specifies the name of a file to copy to /tftpboot/tuning.cust. If the file_name begins with a slash (/), the name is considered to be a fully qualified file name. Otherwise, the file name is considered to be in the /usr/lpp/ssp/install/config directory.
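The rule for interpreting file_name can be sketched as a small shell function. This is an illustration of the rule as stated above, not the actual cptuning implementation; the function name resolve_tuning_file is hypothetical.

```shell
# Hypothetical helper: print the path that a cptuning file_name operand
# refers to. A name beginning with a slash is used as is; any other name
# is looked up in /usr/lpp/ssp/install/config.
resolve_tuning_file() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "/usr/lpp/ssp/install/config/$1" ;;
    esac
}
```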

    Description

    Use this command to copy the specified file to the /tftpboot/tuning.cust file. IBM ships the following four predefined tuning parameter files in /usr/lpp/ssp/install/config:

    tuning.development
    Contains initial performance tuning parameters for a typical development system.

    tuning.scientific
    Contains initial performance tuning parameters for a typical scientific system.

    tuning.commercial
    Contains initial performance tuning parameters for a typical commercial system.

    tuning.default
    Contains initial performance tuning parameters for a general SP system.

    This command is intended for use in copying one of these files to /tftpboot/tuning.cust on the control workstation for propagation to the nodes in the SP. It can also be used on an individual node to copy one of these files to /tftpboot/tuning.cust.

    Standard Output

    When the command completes successfully, a message to that effect is written to standard output.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Output Files

    Upon successful completion, the /tftpboot/tuning.cust file is updated.

    Consequences of Errors

    If the command does not run successfully, it terminates with an error message and a nonzero return code.

    Security

    Use of this command by other than the root user is not restricted. However, this command will fail if the user does not have read permission to the specified file and write permission to the /tftpboot directory.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/cptuning

    Related Information

    SP Files: tuning.commercial, tuning.default, tuning.development, tuning.scientific

    IBM Parallel System Support Programs for AIX: Installation and Migration Guide

    Examples

    1. To copy the /tmp/my-tuning-file file to the /tftpboot/tuning.cust file, enter:
      cptuning /tmp/my-tuning-file
      

    2. To copy the /usr/lpp/ssp/install/config/tuning.commercial file to the /tftpboot/tuning.cust file, enter:
      cptuning tuning.commercial
      

    create_krb_files

    Purpose

    create_krb_files - Creates the necessary krb_srvtab and tftp access files on the Network Installation Management (NIM) master.

    Syntax

    create_krb_files [-h]

    Flags

    -h
    Displays usage information. If the command is issued with the -h flag, the syntax description is displayed to standard output and no other action is taken.

    Operands

    None.

    Description

    Use this command on a boot/install server (including the control workstation). On the server, it creates the Kerberos krb_srvtab file for each boot/install client of that server and also updates the /etc/tftpaccess.ctl file on the server.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have root privilege to run this command.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/create_krb_files

    Related Information

    Commands: setup_server

    Examples

    To create or update the krb_srvtab and tftp access files on a boot/install server, enter the following command on that server:

    create_krb_files
    

    createhsd

    Purpose

    createhsd - Creates one data striping device (HSD) that encompasses two or more IBM Virtual Shared Disks.

    Syntax

    createhsd
    -n {node_list | ALL} -s size_in_MB -g vg_name
     
    -t stripe_size_in_KB [{-c vsds_per_node | -L} [-A]]
     
    [-m mirror_cnt] [-d hsd_name] [-l lv_name_prefix] [-S]
     
    [-o cache | nocache] [-T lp_size_in_MB] [-x]

    Flags

    Note: Some examples shown in this list do not contain enough flags to be executable. They are shown in an incomplete form to illustrate specific flags.

    -n
    Specifies the numbers of the nodes on which you are creating a data striping device (HSD). The backup node for the underlying IBM Virtual Shared Disks cannot be the same as the primary node.

    node_list
    Given in the format P/S:hdisk_list1+hdisk_list2/, where P is the primary node, S, if specified, is the backup (secondary) node, hdisk_list1 is the list of local physical disks in the logical volume on the primary, and hdisk_list1+hdisk_list2 is the list of local physical disks in the volume group on the primary, if you want to have more disks in the volume group than are needed for the logical volume.

    The sequence in which nodes are listed determines the names given to the IBM Virtual Shared Disks; for example:

    createhsd -n 1,6,4 -d DATA
    
    (with the hsd_prefix DATA) creates IBM Virtual Shared Disks DATA1n1 on node 1, DATA2n6 on node 6, and DATA3n4 on node 4, which make up a single HSD named DATA.

    To create volume groups that span specified disks on nodes 1, 2, and 3 of a system with backup on nodes 4, 5, and 6 of the same system, and that make up a single HSD, enter:
    createhsd -n 1/4:hdisk1,hdisk2,hdisk3/,2/5:hdisk5,hdisk6, \
    hdisk7/,3/6:hdisk2,hdisk4,hdisk6/ -d DATA -s 12 -g OLD -t 4096
    
    This command is shown on two lines here, but you must enter it without any spaces between the items in node_list.

    The command creates:

    If a volume group is already created and the combined physical hdisk lists contain disks that are not needed for the logical volume, those hdisks are added to the volume group. If the volume group has not already been created, createhsd creates a volume group that spans hdisk_list1+hdisk_list2.

    Backup nodes cannot use the same physical disk as the primary does to serve IBM Virtual Shared Disks.

    ALL
    Specifies that you are creating HSDs on all nodes in the system or system partition. If you use ALL, you cannot assign backup nodes for the disks.

    -s
    Specifies the total usable size of the HSD in MB. Unless -S is specified, createhsd adds at least a stripe size to each IBM Virtual Shared Disk's size for each HSD.

    -g
    Specifies the Logical Volume Manager (LVM) volume group name, or local volume group name. This name is concatenated with the node number to form the global volume group name (VSD_GVG). For example:
    createhsd -n 6 -g VSDVG
    
    creates a new volume group with the local AIX volume group name VSDVG and the IBM Virtual Shared Disk global volume group name VSDVGn6. The node number is added to the local volume group name to create a unique global volume group name within a system partition to avoid name conflicts with the name used for volume groups on other nodes. If a backup node exists, the global volume group name will be created by concatenating the backup node number as well as the primary node number to the local volume group name. For example:
    createhsd -n 6/3/ -g VSDVG
    
    creates VSDVGn6b3, where the primary node is node 6 and the backup node for this global volume group is node 3. The local AIX volume group name will still be VSDVG. You can specify a local volume group that already exists. You do not need to use the -T flag if you specify a volume group name that already exists.

    -c
    Specifies the number of IBM Virtual Shared Disks to be created on each node. If vsds_per_node is not specified, one IBM Virtual Shared Disk is created for each node specified on createhsd. If more than one IBM Virtual Shared Disk is to be created for each node, the names will be allocated cyclically. For example:
    createhsd -n 1,6 -c 2 -d DATA
    
    creates IBM Virtual Shared Disks DATA1n1 on node 1, DATA2n6 on node 6, DATA3n1 on node 1, and DATA4n6 on node 6 and uses them to make up the HSD DATA.

    -L
    Allows you to create one IBM Virtual Shared Disk on each node without using sequential numbers for locally-accessed IBM Virtual Shared Disks.

    -A
    Specifies that IBM Virtual Shared Disk names will be allocated to each node in turn. For example:
    createhsd -n 1,6 -c 2 -A -d DATA
    
    creates DATA1n1 and DATA2n1 on node 1, and DATA3n6 and DATA4n6 on node 6.

    -o
    Specifies either the cache or nocache option for the underlying IBM Virtual Shared Disks. The default is nocache.

    -m
    Specifies the LVM mirroring count. The mirroring count sets the number of physical partitions allocated to each logical partition. The range is from 1 to 3. If -m is not specified, the count is set to 1.

    -t
    Specifies the stripe size in kilobytes that an HSD will use. The stripe size must be a multiple of 4KB and less than or equal to 1GB.

    -d
    Specifies the name assigned to the created HSD. It is used as the IBM Virtual Shared Disk prefix name (the -v in createvsd). If an HSD name is not specified, a default name, xHsD, is used, where x denotes a sequence number.

    The command:

    createhsd -n 1,2 -d DATA
    
    creates two IBM Virtual Shared Disks, DATA1n1 and DATA2n2. These IBM Virtual Shared Disks make up one HSD named DATA.

    -l
    Overrides the prefix lvx that is given by default to a logical volume by the createvsd command, where x is the IBM Virtual Shared Disk name prefix (for createhsd, the HSD name specified with the -d flag) or the default (vsd). For example:
    createhsd -n 1 -d DATA

    creates one IBM Virtual Shared Disk on node 1 named DATA1n1 with an underlying logical volume lvDATA1n1. If the command
    createhsd -n 1 -d DATA -l new

    is used, the IBM Virtual Shared Disk on node 1 is still named DATA1n1, but the underlying logical volume is named lvnew1n1.

    It is usually more helpful not to specify -l, so that your lists of IBM Virtual Shared Disk names and logical volume names are easy to associate with each other and you avoid naming conflicts.

    -S
    Specifies that the HSD overrides the default skip option and does not skip the first stripe to protect the first LVM Control Block (LVCB).

    -T
    Specifies the size of the physical partition in the Logical Volume Manager logical volume group and also the logical partition size (they will be the same) in megabytes. You must select a power of 2 in the range 2--256. The default is 4MB.

    The Logical Volume Manager limits the number of physical partitions to 1016 per disk. If a disk is greater than 4 gigabytes in size, the physical partition size must be greater than 4MB to keep the number of partitions under the limit.
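
    The partition limit above implies a minimum partition size for a given disk. The following Python sketch works through that arithmetic. The function name min_partition_size_mb is hypothetical and not part of PSSP, and the disk size is assumed to be given in megabytes.

```python
MAX_PARTITIONS_PER_DISK = 1016  # LVM limit quoted above
ALLOWED_SIZES_MB = [2, 4, 8, 16, 32, 64, 128, 256]  # powers of 2 accepted by -T


def min_partition_size_mb(disk_size_mb):
    """Return the smallest -T value that keeps one disk within the limit.

    Hypothetical helper for illustration; not a PSSP interface.
    """
    for size in ALLOWED_SIZES_MB:
        # Ceiling division: a partly used partition still counts as one.
        partitions = -(-disk_size_mb // size)
        if partitions <= MAX_PARTITIONS_PER_DISK:
            return size
    raise ValueError("disk is too large even for a 256MB partition size")
```

    For a 4096MB disk the sketch returns 8, because 4096 / 4 = 1024 partitions would exceed the limit of 1016.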

    -x
    Specifies that the steps required to synchronize the underlying IBM Virtual Shared Disks on the primary and secondary nodes should not be performed; that is, the sequence:

    is not done as part of the createvsd processing that underlies the createhsd command. This speeds the operation of the command and avoids unnecessary processing in the case where several IBM Virtual Shared Disks are being created on the same primary/secondary nodes. In that case, however, you need to explicitly issue the volume group commands listed previously.

    Operands

    None.

    Description

    This command utilizes the sysctl facility.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit createhsd_dialog
    
    or
    smit vsd_data
    
    and select the Create an HSD option when using the vsd_data fastpath.

    Files

    /usr/lpp/csd/bin/createhsd
    Specifies the command file.

    Standard Output

    For the following command:

    createhsd -n 1/:hdisk2,hdisk3/ -g twinVG -s 1600 -t 8 -S -l \
    twinLV -d twinHSD -c 4
    
    The messages returned to standard output are:
    OK:0:vsdvg -g twinVGn1 twinVG 1
    OK:0:defvsd twinLV1n1 twinVGn1 twinHSD1n1 nocache
    OK:0:defvsd twinLV2n1 twinVGn1 twinHSD2n1 nocache
    OK:0:defvsd twinLV3n1 twinVGn1 twinHSD3n1 nocache
    OK:0:defvsd twinLV4n1 twinVGn1 twinHSD4n1 nocache
     
    OK:createvsd { -n 1/:hdisk2,hdisk3/ -s 401 -T 4 -g twinVG
    -c 4 -v twinHSD -l twinLV -o cache -K }
     
    OK:0:defhsd twinHSD not_protect_lvcb 8192 twinHSD1n1 twinHSD2n1
    twinHSD3n1 twinHSD4n1
    

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have sysctl and sysctl.vsd access and authorization from your system administrator to run this command.

    Restrictions

    1. The backup node cannot be the same as the primary node.

    2. The last character of hsd_name cannot be numeric.

    3. The vsd_name_prefix cannot contain the character '.'. See the createvsd -v option for details.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: createvsd, defhsd, vsdvg

    Examples

    To create six 4MB IBM Virtual Shared Disks and their underlying logical volumes with a prefix of TEMP, as well as an HSD comprising those IBM Virtual Shared Disks (24MB overall) with a stripe size of 32KB, enter the following (assuming that no previous IBM Virtual Shared Disks are defined with the TEMP prefix):

    createhsd -n 3,4,7/8/ -c 2 -s 24 -g vsdvg -d TEMP -t 32
    

    This creates the following IBM Virtual Shared Disks:

    and the HSD:

    Note: TEMP does not write to the first 32KB of each of its IBM Virtual Shared Disks.

    createvsd

    Purpose

    createvsd - Creates a set of IBM Virtual Shared Disks, with their associated logical volumes, and puts information about them into the System Data Repository (SDR).

    Syntax

    Note: Some examples shown in this list do not contain enough flags to be executable. They are shown in an incomplete form to illustrate specific flags.

    createvsd
    -n {node_list | ALL}
     
    -s size_in_MB -g volume_group_name
     
    [{-c number_of_vsds_per_node | -L}]
     
    [-o cache | nocache] [-m mirror_count]
     
    [-p lvm_stripe_size] [-v vsd_name_prefix]
     
    [-l lv_name_prefix] [-T lp_size_in_MB] [-x]

    Flags

    -n
    Specifies the nodes on which you are creating IBM Virtual Shared Disks. The backup node cannot be the same as the primary node.

    node_list
    Given in the format P/S:hdisk_list1+hdisk_list2/, where P is the primary node, S, if specified, is the backup (secondary) node, hdisk_list1 is the list of local physical disks in the logical volume on the primary, and hdisk_list1+hdisk_list2 is the list of local physical disks in the volume group on the primary, if you want to have more disks in the volume group than are needed for the logical volume. The sequence in which nodes are listed determines the names given to the IBM Virtual Shared Disks. For example:
    createvsd -n 1,6,4 -v PRE
    
    (with the vsd_prefix PRE) creates IBM Virtual Shared Disks PRE1n1 on node 1, PRE2n6 on node 6, and PRE3n4 on node 4.

    To create a volume group that spans hdisk2, hdisk3, and hdisk4 on node 1, with a backup on node 3, enter:

    createvsd -n 1/3:hdisk2,hdisk3,hdisk4/ -v DATA
    
    This command creates:

    To create volume groups just like that one on nodes 1, 2, and 3 of a system with backup on nodes 4, 5, and 6 of the same system, enter:

    createvsd -n 1/4:hdisk1,hdisk2,hdisk3/,2/5:hdisk5,hdisk6, \
    hdisk7/,3/6:hdisk2,hdisk4,hdisk6/ -v DATA
    

    This command is shown on two lines here, but you must enter it without any spaces between the items in node_list.

    The command creates:

    To create an IBM Virtual Shared Disk where the logical volume spans only two of the physical disks in the volume group, enter:

    createvsd -n 1/3:hdisk1,hdisk2+hdisk3/ -v DATA
    
    This command creates the IBM Virtual Shared Disk DATA1n1 with logical volume lvDATA1n1 spanning hdisk1 and hdisk2 in the volume group DATA, which includes hdisk1, hdisk2, and hdisk3. It exports the volume group DATA to node 3.

    If a volume group is already created and the combined physical hdisk lists contain disks that are not needed for the logical volume, those hdisks are added to the volume group. If the volume group has not already been created, createvsd creates a volume group that spans hdisk_list1+hdisk_list2.

    Backup nodes cannot use the same physical disk as the primary does to serve IBM Virtual Shared Disks.
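
    The node_list format can be easier to follow when it is taken apart mechanically. The following Python sketch parses a node_list string into its primary node, backup node, and disk lists. It is an illustration under stated assumptions, not the parser createvsd uses: the names NodeSpec and parse_node_list are hypothetical, the ALL keyword is not handled, and node identifiers are assumed to be decimal numbers.

```python
import re
from typing import List, NamedTuple, Optional


class NodeSpec(NamedTuple):
    primary: int
    backup: Optional[int]
    lv_disks: List[str]  # hdisk_list1: disks used by the logical volume
    vg_disks: List[str]  # hdisk_list1 + hdisk_list2: disks in the volume group


# One entry: P, optionally followed by /S:disks/ where S and :disks may be empty.
_ENTRY = re.compile(r"(\d+)(?:/(\d*)(?::([^/]*))?/)?")


def parse_node_list(text):
    """Split a node_list such as 3,4/:hdisk1,hdisk2+hdisk3/,7/8/ into entries."""
    specs = []
    pos = 0
    while pos < len(text):
        match = _ENTRY.match(text, pos)
        if match is None:
            raise ValueError("cannot parse node_list at offset %d" % pos)
        primary = int(match.group(1))
        backup = int(match.group(2)) if match.group(2) else None
        lv_part, _, extra_part = (match.group(3) or "").partition("+")
        lv_disks = [d for d in lv_part.split(",") if d]
        vg_disks = lv_disks + [d for d in extra_part.split(",") if d]
        specs.append(NodeSpec(primary, backup, lv_disks, vg_disks))
        pos = match.end()
        if pos < len(text):
            # Entries are separated by a single comma with no spaces.
            if text[pos] != ",":
                raise ValueError("expected ',' at offset %d" % pos)
            pos += 1
    return specs
```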

    ALL
    Specifies that you are creating IBM Virtual Shared Disks on all nodes in the system or system partition. No backup nodes are assigned if you use this operand. The IBM Virtual Shared Disks will be created on all the physical disks attached to the nodes in node_list (you cannot specify which physical disks to use.)

    -s
    Specifies the size in megabytes of each IBM Virtual Shared Disk.

    -g
    Specifies the Logical Volume Manager (LVM) volume group name. This name is concatenated with the node number to produce the global volume group name. For example:
    createvsd -n 6 -g VSDVG
    
    creates a volume group with the local volume group name VSDVG and the global volume group name VSDVGn6 on node 6. The node number is added to the prefix to avoid name conflicts when a backup node takes over a volume group. If a backup node exists, the global volume group name will be concatenated with the backup node number as well as the primary. For example:
    createvsd -n 6/3/ -g VSDVG
    
    creates a volume group with the local volume group name VSDVG and the global volume group name VSDVGn6b3. The primary node is node 6 and the backup node for this volume group is node 3.

    -c
    Specifies the number of IBM Virtual Shared Disks to be created on each node. If number_of_vsds_per_node is not specified, one IBM Virtual Shared Disk is created for each node specified on createvsd. If more than one IBM Virtual Shared Disk is to be created for each node, the names will be allocated alternately. For example:
    createvsd -n 1,6 -c 2 -v DATA
    
    creates IBM Virtual Shared Disks DATA1n1 on node 1, DATA2n6 on node 6, DATA3n1 on node 1, and DATA4n6 on node 6.

    -L
    Allows you to create one IBM Virtual Shared Disk on each node without using sequential numbers for locally-accessed IBM Virtual Shared Disks.

    -A
    Specifies that IBM Virtual Shared Disk names will be allocated to each node in turn, for example:
    createvsd -n 1,6 -c 2 -A -v DATA
    
    creates DATA1n1 and DATA2n1 on node 1, and DATA3n6 and DATA4n6 on node 6.
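
    The difference between the default allocation order (described under -c) and the -A order can be sketched in a few lines of Python. The function vsd_names and its parameters are hypothetical names used only for illustration. The sketch covers the prefix, sequence number, and node number, and leaves out the backup node suffix.

```python
def vsd_names(prefix, nodes, per_node=1, node_in_turn=False, first_number=1):
    """Return IBM Virtual Shared Disk names in creation order.

    node_in_turn=False models the default: numbers cycle across the nodes.
    node_in_turn=True models -A: each node receives all of its names in turn.
    """
    if node_in_turn:
        order = [node for node in nodes for _ in range(per_node)]
    else:
        order = [node for _ in range(per_node) for node in nodes]
    return ["%s%dn%d" % (prefix, first_number + i, node)
            for i, node in enumerate(order)]
```

    With nodes 1 and 6 and two disks per node, the default order yields DATA1n1, DATA2n6, DATA3n1, DATA4n6, and the -A order yields DATA1n1, DATA2n1, DATA3n6, DATA4n6, matching the two examples above.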

    -o
    Specifies either the cache or the nocache option. The default is nocache.

    -m
    Specifies the LVM mirroring count. The mirroring count sets the number of physical partitions allocated to each logical partition. The range is from 1 to 3 and the default value is 1.

    -p
    Specifies the LVM stripe size. If this flag is not specified, the logical volumes are not striped. In order to use striping, the node on which the IBM Virtual Shared Disks are defined must have more than one physical disk.

    -v
    Specifies a prefix to be given to the names of the created IBM Virtual Shared Disks. This prefix will be concatenated with the IBM Virtual Shared Disk number, node number, and backup node number, if a backup disk is specified. For example, if the prefix PRE is given to an IBM Virtual Shared Disk created on node 1 and there are already two IBM Virtual Shared Disks with this prefix across the partition, the new IBM Virtual Shared Disk name will be PRE3n1. The name given to the underlying logical volume will be lvPRE3n1, unless the -l flag is used. The createvsd command continues to sequence IBM Virtual Shared Disk names from the last PRE-prefixed IBM Virtual Shared Disk.

    If -v is not specified, the prefix vsd is used.
    Note: The last character of the vsd_name_prefix cannot be a digit. Otherwise, the 11th IBM Virtual Shared Disk with the prefix PRE would have the same name as the first IBM Virtual Shared Disk with the prefix PRE1. Nor can the vsd_name_prefix contain the character '.', because '.' matches any character in regular expressions.
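
    The name collision described in the note can be reproduced directly. The following Python sketch builds names from a prefix, a sequence number, and a node number, and checks a prefix against the two rules. Both function names are hypothetical, and the sketch is not how createvsd validates its input.

```python
def vsd_name(prefix, number, node):
    """Concatenate prefix, sequence number, and node number (no backup suffix)."""
    return "%s%dn%d" % (prefix, number, node)


def check_vsd_prefix(prefix):
    """Raise ValueError if the prefix breaks either rule from the note."""
    if prefix[-1:].isdigit():
        raise ValueError("last character of the prefix cannot be a digit")
    if "." in prefix:
        raise ValueError("prefix cannot contain the character '.'")
    return prefix
```

    Here vsd_name("PRE", 11, 1) and vsd_name("PRE1", 1, 1) both produce PRE11n1, which is the ambiguity the first rule prevents.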

    -l
    Overrides the prefix lvx that is given by default to a logical volume by the createvsd command, where x is the IBM Virtual Shared Disk name prefix specified by vsd_name_prefix or the default (vsd). For example:
    createvsd -n 1 -v DATA
    
    creates one IBM Virtual Shared Disk on node 1 named DATA1n1 with an underlying logical volume lvDATA1n1. If the command
    createvsd -n 1 -v DATA -l new
    
    is used, the IBM Virtual Shared Disk on node 1 is still named DATA1n1, but the underlying logical volume is named lvnew1n1.

    It is usually more helpful not to specify -l, so that your lists of IBM Virtual Shared Disk names and logical volume names are easy to associate with each other and you avoid naming conflicts.

    -T
    Specifies the size of the physical partition in the Logical Volume Manager logical volume group and also the logical partition size (they will be the same) in megabytes. You must select a power of 2 in the range 2--256. The default is 4MB.

    The Logical Volume Manager limits the number of physical partitions to 1016 per disk. If a disk is greater than 4 gigabytes in size, the physical partition size must be greater than 4MB to keep the number of partitions under the limit.

    -x
    Specifies that the steps required to synchronize the IBM Virtual Shared Disks on the primary and secondary nodes should not be performed; that is, the sequence:

    is not done as part of the createvsd processing. This speeds the operation of the command and avoids unnecessary processing in the case where several IBM Virtual Shared Disks are being created on the same primary/secondary nodes. In this case, however, you should either not specify -x on the last createvsd in the sequence or issue the volume group commands listed above explicitly.

    Operands

    None.

    Description

    Use this command to create a volume group with the specified name (if one does not already exist) and to create a logical volume of size s within that volume group.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit vsd_data
    
    and select the Create an IBM Virtual Shared Disk option.

    Files

    /usr/lpp/csd/bin/createvsd
    Specifies the command file.

    Standard Output

    For the following command:

    createvsd -n 1/:hdisk1/ -g testvg -s 16 -T 8 -l lvtest -v test -c 4
    
    The messages returned to standard output are:
    OK:0:vsdvg -g testvgn1 testvg 1
    OK:0:defvsd lvtest1n1 testvgn1 test1n1 nocache
    OK:0:defvsd lvtest2n1 testvgn1 test2n1 nocache
    OK:0:defvsd lvtest3n1 testvgn1 test3n1 nocache
    OK:0:defvsd lvtest4n1 testvgn1 test4n1 nocache
    

    When the same command is issued a second time:

    createvsd -n 1/:hdisk1/ -g testvg -s 16 -T 8 -l lvtest -v test -c 4
    
    The messages returned to standard output are:
    OK:0:defvsd lvtest5n1 testvgn1 test5n1 nocache
    OK:0:defvsd lvtest6n1 testvgn1 test6n1 nocache
    OK:0:defvsd lvtest7n1 testvgn1 test7n1 nocache
    OK:0:defvsd lvtest8n1 testvgn1 test8n1 nocache
    

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have sysctl and sysctl.vsd access and authorization from your system administrator to run this command.

    Restrictions

    1. The backup node cannot be the same as the primary node.

    2. The last character of vsd_name_prefix cannot be numeric.

    3. The vsd_name_prefix cannot contain the character '.'.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: defvsd, vsdvg

    Examples

    To create two 4MB IBM Virtual Shared Disks on each of three primary nodes, one of which has a backup, enter:

    createvsd -n 3,4,7/8/ -c 2 -s 4 -g vsdvg -v TEMP
    
    This command creates the following IBM Virtual Shared Disks:

    To create three IBM Virtual Shared Disks, where the logical volume created on node 4 spans fewer disks than the volume group does, enter:

    createvsd -n 3,4/:hdisk1,hdisk2+hdisk3/,7/8/ -s 4 -g datavg -v USER
    
    This command creates:

    crunacct

    Purpose

    crunacct - Runs on the acct_master node to produce daily summary accounting reports and to accumulate accounting data for the fiscal period using merged accounting data from each node.

    Syntax

    crunacct
    yyyymmdd
     
    [SETUP | DELNODEDATA | MERGETACCT | CMS | USEREXIT | CLEANUP]

    Flags

    SETUP
    Copies the files produced by nrunacct on each node to the acct_master node. For each node named by the string node, these files:
    /var/adm/acct/nite/lineuseYYYYMMDD
    /var/adm/acct/nite/rebootsYYYYMMDD
    /var/adm/acct/nite/daytacctYYYYMMDD
    /var/adm/acct/sum/daycmsYYYYMMDD
    /var/adm/acct/sum/loginlogYYYYMMDD

    are copied to the acct_master node to the following files:

    /var/adm/cacct/node/nite/lineuseYYYYMMDD
    /var/adm/cacct/node/nite/rebootsYYYYMMDD
    /var/adm/cacct/node/nite/daytacctYYYYMMDD
    /var/adm/cacct/node/sum/daycmsYYYYMMDD
    /var/adm/cacct/node/sum/loginlogYYYYMMDD

    for all YYYYMMDD prior to or equal to the YYYYMMDD being processed.
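
    The SETUP mapping from node files to acct_master files follows a regular pattern, sketched below in Python. The function setup_copies is a hypothetical name. It only builds the path pairs listed above for one node and one date, and does not copy anything.

```python
import posixpath

# (subdirectory, file name stem) pairs taken from the lists above.
NODE_FILES = [
    ("nite", "lineuse"),
    ("nite", "reboots"),
    ("nite", "daytacct"),
    ("sum", "daycms"),
    ("sum", "loginlog"),
]


def setup_copies(node, yyyymmdd):
    """Return (path on the node, path on acct_master) pairs for one date."""
    pairs = []
    for subdir, stem in NODE_FILES:
        name = stem + yyyymmdd
        source = posixpath.join("/var/adm/acct", subdir, name)
        target = posixpath.join("/var/adm/cacct", node, subdir, name)
        pairs.append((source, target))
    return pairs
```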

    DELNODEDATA
    Deletes files that have been copied to the acct_master node in the SETUP step, as well as the associated /var/adm/acct/statefileYYYYMMDD files.

    MERGETACCT
    Produces a daily total accounting file and merges this daily file into the total accounting file for the fiscal period, for each accounting class. If there are no defined accounting classes, the output of this step represents data for the entire SP system.

    CMS
    Produces a daily command summary file and merges this daily file into the total command summary file for the fiscal period, for each accounting class. If there are no defined accounting classes, the output of this step represents data for the entire SP system.

    It also creates an SP system version of the loginlog file, in which each line consists of a date, a user login name and a list of node names. The date is the date of the last accounting cycle during which the user, indicated by the associated login name, had at least one connect session in the SP system. The associated list of node names indicates the nodes on which the user had a login session during that accounting cycle.

    USEREXIT
    If the /var/adm/csiteacct shell file exists, calls it to perform site specific accounting procedures that are applicable to the acct_master node.

    CLEANUP
    Prints a daily report of accounting activity and removes files that are no longer needed.

    Operands

    yyyymmdd
    Specifies the date for accounting to be rerun.

    Description

    In order for SP accounting to succeed each day, the nrunacct command must complete successfully on each node for which accounting is enabled and then the crunacct command must complete successfully on the acct_master node. However, this may not always be true. In particular, the following scenarios must be taken into account:

    1. The nrunacct command does not complete successfully on some nodes for the current accounting cycle. This can be the result of an error during the execution of nrunacct, nrunacct not being executed at the proper time by cron or the node being down when nrunacct was scheduled to run.

    2. The acct_master node is down or the crunacct command cannot be executed.

    From the point of view of the crunacct command, the first scenario results in no accounting data being available from a node. The second scenario results in more than one day's accounting data being available from a node. If it is the case that no accounting data is available from a node, the policy of crunacct is that the error condition is reported and processing continues with data from the other nodes. If data cannot be obtained from at least X percent of nodes, then processing is terminated. "X" is referred to as the spacct_actnode_thresh attribute and can be set via a SMIT panel.
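
    The threshold rule amounts to a single comparison. The following Python sketch states it explicitly. The function name is hypothetical, and the sketch assumes the percentage is measured against the number of nodes for which accounting is enabled.

```python
def enough_nodes_reported(nodes_with_data, enabled_nodes, thresh_percent):
    """Return True if data arrived from at least thresh_percent of the nodes.

    thresh_percent plays the role of spacct_actnode_thresh.
    """
    if enabled_nodes <= 0:
        raise ValueError("enabled_nodes must be positive")
    # Integer arithmetic avoids rounding problems with percentages.
    return nodes_with_data * 100 >= thresh_percent * enabled_nodes
```

    With a threshold of 80, data from 8 of 10 nodes allows processing to continue, while data from 7 of 10 nodes does not.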

    If node data for accounting cycle N is not available when crunacct executes and then becomes available to crunacct during accounting cycle N+1, the node data for both the N and N+1 accounting cycles is merged by crunacct. In general, crunacct merges all data from a node that has not yet been reported into the current accounting cycle, except as in the following case.

    If crunacct has not run for more than one accounting cycle, such that there are several days' data on each node, then the policy of crunacct is that it processes each accounting cycle's data to produce the normal output for each accounting cycle. For example, if crunacct has not executed for accounting cycles N and N+1, and it is now accounting cycle N+2, then crunacct first executes for accounting cycle N, then executes for accounting cycle N+1 and finally executes for accounting cycle N+2.

    However, if the several accounting cycles span from the previous fiscal period to the current fiscal period, then only the accounting cycles that are part of the previous fiscal period are processed. The accounting cycles that are part of the current fiscal period are processed during the next night's execution of crunacct. Appropriate messages are provided in the /var/adm/cacct/active file so that the administrator can execute cmonacct prior to the next night's execution of crunacct.
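
    The catch-up policy in the preceding paragraphs can be sketched as a selection over pending accounting cycles. The following Python function is an illustration only. Its name and arguments are hypothetical, and it assumes that fiscal periods are identified by their start dates, as the fiscal_periods file is described under Files.

```python
from bisect import bisect_right
from datetime import date


def cycles_to_process(pending, period_starts):
    """Return the pending cycles handled in one run, oldest first.

    pending: dates of accounting cycles not yet processed.
    period_starts: start dates of the fiscal periods.
    Only cycles in the same fiscal period as the oldest pending cycle are
    returned; later ones wait for the next run.
    """
    if not pending:
        return []
    starts = sorted(period_starts)
    ordered = sorted(pending)
    # bisect_right gives the same index for every date within one period.
    oldest_period = bisect_right(starts, ordered[0])
    return [d for d in ordered if bisect_right(starts, d) == oldest_period]
```

    For example, with fiscal periods starting on May 1 and June 1, 1994, pending cycles for May 30, May 31, and June 1 yield only the two May cycles; the June 1 cycle is left for the next night.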

    Restart Procedure

    To restart the crunacct command after a failure, first check the /var/adm/cacct/activeYYYYMMDD file for diagnostic messages, and take appropriate actions. For example, if the log indicates that data was unavailable from a majority of nodes, and their corresponding nrunacct state files indicate a state other than complete, check their /var/adm/acct/nite/activeYYYYMMDD files for diagnostic messages and then fix any damaged data files, such as pacct or wtmp.

    Remove the lock files and the lastdate file (all in the /var/adm/cacct directory) before restarting the crunacct command. You must specify the YYYYMMDD parameter if you are restarting the crunacct command; it specifies the date for which the crunacct command is to rerun accounting. The crunacct procedure determines the entry point for processing by reading the /var/adm/cacct/statefileYYYYMMDD file. To override this default action, specify the desired state on the crunacct command line.
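
    The choice of entry point on restart can be sketched as follows. This Python example rests on two assumptions that are not stated in this section: that the state file holds one of the state names listed under Flags, and that the states run in the order in which they are listed there. The function names are hypothetical.

```python
# Assumption: states run in the order they appear under Flags.
STATES = ["SETUP", "DELNODEDATA", "MERGETACCT", "CMS", "USEREXIT", "CLEANUP"]


def entry_state(saved_state, requested_state=None):
    """Pick the state to resume from.

    saved_state: the state read from the state file.
    requested_state: a state named on the command line, which overrides it.
    """
    state = requested_state if requested_state is not None else saved_state
    if state not in STATES:
        raise ValueError("unknown state: %r" % (state,))
    return state


def states_from(state):
    """Return the given state and every state listed after it."""
    return STATES[STATES.index(state):]
```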

    Files

    /var/adm/cacct/activeYYYYMMDD
    The crunacct message file.

    /var/adm/cacct/fiscal_periods
    Customer-defined file indicating start date of each fiscal period.

    /var/adm/cacct/lastcycle
    Contains the last cycle that crunacct completed successfully.

    /var/adm/cacct/lock*
    Prevents simultaneous invocation of crunacct.

    /var/adm/cacct/lastdate
    Contains last date crunacct was run.

    /var/adm/cacct/nite/statefileYYYYMMDD
    Contains current state to process.

    Security

    Access Control: This command should grant execute (x) access only to members of the adm group.

    Prerequisite Information

    For more information about the Accounting System, the preparation of daily and monthly reports, and the accounting files, see IBM Parallel System Support Programs for AIX: Administration Guide.

    Related Information

    Commands: acctcms, acctcom, acctcon1, acctcon2, acctmerg, acctprc1, acctprc2, accton, crontab, fwtmp, nrunacct

    Daemon: cron

    The System Accounting information found in AIX Version 4.1 System Management Guide

    Examples

    1. To restart the SP system accounting procedures for a specific date, enter a command similar to the following:
      nohup /usr/lpp/ssp/bin/crunacct 19940601 2>> \
      /var/adm/cacct/nite/accterr &
      
      This example restarts crunacct for the day of June 1 (0601), 1994. The crunacct command reads the file /var/adm/cacct/statefile19940601 to find out the state with which to begin. The crunacct command runs in the background (&), ignoring all INTERRUPT and QUIT signals (nohup). Standard error output (2) is added to the end (>>) of the /var/adm/cacct/nite/accterr file.

    2. To restart the SP system accounting procedures for a particular date at a specific state, enter a command similar to the following:
      nohup /usr/lpp/ssp/bin/crunacct 19940601 CMS 2>> \
      /var/adm/cacct/nite/accterr &
      
      This example restarts the crunacct command for the day of June 1 (0601), 1994, starting with the CMS state. The crunacct command runs in the background (&), ignoring all INTERRUPT and QUIT signals (the nohup command). Standard error output (2) is added to the end (>>) of the /var/adm/cacct/nite/accterr file.

    cshutdown

    Purpose

    cshutdown - Specifies the SP system Shutdown command.

    Syntax

    cshutdown
    [-G] [-E] [-N | -P | -R | -s | -g] [-W seconds ] [-X] [-Y]
     
    [-F] [-h | -k | -K | -m | -r [ -C  cstartup_options]]
     
    [-T time [-M message_string]]
     
    target_nodes

    Flags

    -C cstartup_options
    Tells cshutdown to pass the cstartup_options to cstartup when the cstartup command is invoked after the target_nodes are halted. This flag is valid only when the -r (reboot) option is also specified. Any blanks in cstartup_options must be escaped or quoted.

    -E
    Terminates processing if any nodes are found that are powered on, but not running (host_responds in the System Data Repository (SDR) shows a value of 0--node shows red on the system monitor). This includes nodes that may have been placed in maintenance (single-user) mode. Refer to the "Description" section for additional information.

    If you specify -E, you cannot specify -X.

    -G
    Allows the specification of nodes to include one or more nodes outside the current system partition. If ALL is specified with -G, all nodes in the SP are shut down. If ALL is specified without -G, all nodes in the current system partition are shut down. If -G is specified with a list of nodes, all listed nodes are shut down regardless of the system partition in which they reside (subject to the restrictions of the sequence file). If -G is not specified and some of the specified target nodes are outside of the current system partition or some of the specified target nodes depend on nodes outside of the current system partition, none of the specified nodes are shut down.

    -g
    Indicates that the target_nodes are specified as a named node group. If -G is supplied, a global node group is used. Otherwise, a partition-bound node group is used.

    -F
    Tells the cshutdown command to start the shutdown immediately, without issuing warning messages to users.

    -h
    Halts the target nodes. This is the default, unless overridden by the -k, -m, or -r flags.

    -k
    Verifies the shutdown sequence file without shutting any node down. Special subsystems are not affected. There is no effect on a nonrunning target node. You can use cshutdown -kF ALL to test your /etc/cshutSeq file without actually shutting down any nodes and without sending messages to users.

    -K
    Limits the number of concurrent processes created to rsh to the nodes. This is relevant to large systems. The default value is 64.

    -m
    Handles the request similar to a halt except that the last step, after syncing and unmounting file systems, is to bring the node to single user mode. There is no effect on a nonrunning target node.

    -N
    Indicates that the target_nodes are specified as node numbers, not en0 host names. The node numbers can be specified as ranges, for example, 3-7 indicates nodes 3, 4, 5, 6, and 7.
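
    As a minimal sketch of the range notation, the following Python function expands one token such as 3-7 into node numbers. The function name expand_node_range is hypothetical. How cshutdown itself parses ranges, or separates several of them, is not described here.

```python
def expand_node_range(token):
    """Expand '3-7' to [3, 4, 5, 6, 7]; a single number yields one node."""
    first, dash, last = token.partition("-")
    if not dash:
        return [int(first)]
    start, end = int(first), int(last)
    if start > end:
        raise ValueError("range start exceeds range end: %s" % token)
    return list(range(start, end + 1))
```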

    -P
    Powers off the nodes after the shutdown command completes. This is the default action except when the -m option (single user mode) is chosen.

    -r
    Handles the request as a reboot. It performs the same operations as -h. Then it restarts the target nodes with cstartup. It does not power on a target node that was powered off at the time the cshutdown command was issued (it differs from the cstartup command, which powers on all specified nodes).

    -R
    Indicates that target_nodes is a file that contains host identifiers. If you also use the -N flag, the file contains node numbers; otherwise, the file contains node names, specified as en0 host names.

    -s
    Kills nonroot processes in the node order specified in /etc/cshutSeq. The default is to kill the nonroot processes in parallel.

    [-T time [-M message_string]]
    The -T flag specifies a time to start cshutdown, either as a number of minutes from now (-T minutes) or at the time in 24-hour format (-T hh:mm). If the -T flag is specified, then you can use -M message_string to specify a message for users on the target nodes. Any blanks in message_string must be escaped or quoted.

    -W seconds
    Provides a time-out value for shutting down a leading node. In normal processing, cshutdown waits for a leading node to be completely halted before starting to shut down trailing nodes. If one or more leading nodes fails to shut down, the cshutdown command waits indefinitely. The -W flag tells cshutdown to wait only the specified number of seconds after starting to halt a leading node; after that time, cshutdown starts the halt process for the trailing nodes.

    Notes:

    1. Be careful to use time-out values large enough to allow a node to complete shutdown processing. Your time-out value should be at least several minutes long; shorter values may be transparently modified to a higher value.

    2. If shutdown processing for a node does not complete within the time-out limit and cshutdown halts trailing nodes, the system may not function correctly.

    If there are special subsystems, the same waiting procedure applies to subsystem sequencing in the subsystem phase.

    -X
    Tells cshutdown that the state of nontarget nodes should not affect the result of the command. Use the -X flag to force cshutdown to shut down the target nodes if nontarget nodes listed in /etc/cshutSeq are gating the shutdown.
    Note: If some critical nodes, but not the entire system, are forced to halt or reboot, the system may not function correctly.

    -Y
    Tells cshutdown to ignore any failure codes from the special subsystem interfaces. Without this flag, if a special subsystem interface exits with a failure code, you receive a prompt allowing you to continue the operation, to quit, or to enter a subshell to investigate the failure. On return from the subshell, you are prompted with the same choices.

    Operands

    target_nodes
    Designates the target nodes to be operated on. It is the operand of the command, and must be the last token on the command line. In the absence of the -R, -N, or -g flags, target_nodes are specified as host names on the en0 Ethernet. Use ALL to designate the entire system. You must identify one or more target_nodes.

    Description

    Use this command to halt or reboot the entire system or any number of nodes in the system. The SP cshutdown command is analogous to the workstation shutdown command. Refer to the shutdown man page for a description of the shutdown command. The cshutdown command always powers off the nodes except while in Maintenance mode.
    Note: If you bring a node down to maintenance mode, you must ensure file system integrity before rebooting the node.

    In this case, the cshutdown command, which runs from the control workstation, cannot rsh to the node to perform the node shutdown phase processing. This includes the synchronization of the file systems. Therefore, you should issue the sync command three times in succession from the node console before running the cshutdown command. This is especially important if any files were created while the node was in maintenance mode.

    To determine which nodes may be affected, issue the spmon -d command and look for a combination of power on and host_responds no.

    The cshutdown command has these advantages over using the shutdown command to shut down each node of an SP:

    Shutdown processing has these phases:

    1. Notifying all users of the impending shutdown, then terminating all nonroot processes on the target nodes. Nonroot processes are sent a SIGTERM followed, 30 seconds later, by a SIGKILL. This gives user processes that handle SIGTERM a chance to do whatever cleanup is necessary.

    2. Invoking any special subsystems, so they can perform any necessary shutdown activities. This phase follows the sequencing rules in /etc/subsysSeq. See IBM Parallel System Support Programs for AIX: Administration Guide for the format of the /etc/subsysSeq file.

    3. Starting node phase shutdown. The node phase includes syncing and unmounting file systems and halting the nodes, following the sequencing rules in /etc/cshutSeq. See IBM Parallel System Support Programs for AIX: Administration Guide for the format of the /etc/cshutSeq file.

    4. Rebooting the system, if requested by the -r flag.
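
    Phase 1 gives processes a grace period between SIGTERM and SIGKILL. The following Python sketch shows that general pattern for a single child process. It is not how cshutdown is implemented, and stop_process and grace_seconds are hypothetical names. Unlike the description above, the sketch skips SIGKILL when the process has already exited within the grace period.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=30):
    """Ask a child process to stop, then force it if it does not comply.

    Returns "terminated" if the process exited within the grace period,
    or "killed" if it had to be forced.
    """
    proc.terminate()  # sends SIGTERM on POSIX systems
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX systems
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Start a child that would otherwise sleep for a minute, then stop it.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child, grace_seconds=5))
```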

    Files

    The following files reside on the control workstation:

    /usr/lpp/ssp/bin/cshutdown
    The cshutdown command.

    /etc/cshutSeq
    Describes the sequence in which the nodes should be shut down. Nodes not listed in the file are shut down concurrently with listed nodes. If the file is empty, all nodes are shut down concurrently. If the file does not exist, cshutdown uses the output of seqfile as a temporary sequencing default.

    /etc/subsysSeq
    Describes groups of special subsystems that need to be invoked in the subsystem phase of cshutdown. Also shows the sequence of invocation. Subsystems are represented by their invocation commands. If this file does not exist or is empty, no subsystem invocation is performed.

    /var/adm/SPlogs/cs/cshut.MMDDhhmmss.pid
    Road map of cshutdown command progress.

    Restrictions

    The cshutdown command can only be issued on the control workstation by root or members of the shutdown group. The root user must issue the kinit command, specifying a principal name for which there is an entry in the hardmon ACLs file with control authorization for the frames to shut down. The hardmon and System Data Repository (SDR) must be running.

    Results

    The cshutdown command may be gated by the failure of some subsystems or nodes to complete shutdown. In this case, look in the file created: /var/adm/SPlogs/cs/cshut.MMDDhhmmss.pid

    MMDDhhmmss
    Time stamp.

    pid
    The process ID of the cshutdown command.

    If a file with the same name already exists (from a previous year), the cshutdown command overwrites the existing file.
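
    The following sketch shows how a name of the form cshut.MMDDhhmmss.pid is composed and why two runs in different years can map to the same file. The helper name cshut_log_path is hypothetical, and the sketch assumes that hh is a 24-hour value.

```python
from datetime import datetime


def cshut_log_path(when, pid):
    """Build /var/adm/SPlogs/cs/cshut.MMDDhhmmss.pid for a given time and process ID."""
    stamp = when.strftime("%m%d%H%M%S")    # month, day, hour, minute, second; no year
    return f"/var/adm/SPlogs/cs/cshut.{stamp}.{pid}"


# Same month, day, time, and pid in two different years give the same path.
first = cshut_log_path(datetime(1997, 2, 3, 14, 5, 9), 12345)
second = cshut_log_path(datetime(1998, 2, 3, 14, 5, 9), 12345)
print(first)
print(first == second)
```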

    Related Information

    Commands: cstartup, init, seqfile, shutdown

    Examples

    1. For these examples, assume that /etc/cshutSeq contains the following lines:
      Group1 > Group2 > Group3
      Group1: A
      Group2: B
      Group3: C
      

      This defines three groups, Group1 through Group3, each containing a single node. The node names are A, B, and C. The sequence line Group1 > Group2 > Group3 means that Group3 (node C) is shut down first. When Group3 is down, Group2 (node B) is shut down. When Group2 is down, then Group1 (node A) is shut down.

      Table 1 shows that the result of a cshutdown command depends on the flags specified on the command line, the initial state of each node, and the sequencing rules in /etc/cshutSeq. The shorthand notation Aup indicates that node A is up and running; Adn indicates that node A is down.

      Table 1. Examples of the cshutdown Command
      The subscript up means the node is powered up and running; the subscript dn means the node is not running.

      Initial State  Command Issued   Final State  Explanation
      Aup Bup Cup    cshutdown A B C  Adn Bdn Cdn  The command succeeds; the nodes are all down.
      Aup Bup Cdn    cshutdown B      Aup Bdn Cdn  The command succeeds because C is already not running.
      Aup Bup Cdn    cshutdown A      Unchanged    The command fails because B is still running.
      Aup Bup Cdn    cshutdown -X A   Adn Bup Cdn  The command succeeds because -X considers the sequencing of only the target nodes.

    2. To shut down all the nodes in the SP system regardless of system partitions and the sequence file, enter:
      cshutdown -GXY ALL
      

    3. To shut down nodes 1, 9, and 16-20 regardless of system partitions and subject to the restrictions of the sequence file, enter:
      cshutdown -G -N 1 9 16-20
      

      The command may fail if any node in the list depends on a node that is not in the list and that has not been shut down.

    4. To shut down all the nodes in the current system partition, enter:
      cshutdown ALL
      

      The command may fail if any node in the current system partition depends on nodes outside of the current system partition.

    5. To shut down nodes 1, 5, and 6 in the current system partition, enter:
      cshutdown -N 1 5 6
      

      The command may fail if any node in the list is not in the current system partition or depends on nodes outside of the current system partition.

    6. Specify the -X flag to ignore the sequence file and force nodes 1, 5, and 6 to be shut down. The following command is successful even if node 5 is gated by a node that is not shut down or is outside the current system partition:
      cshutdown -X -N 1 5 6
      

    7. To do a fast shut down on node 5 without sending a warning message to the user, enter:
      cshutdown -F -N 5
      

    8. To verify the sequence file without shutting down any node, enter the -k flag as follows. If both the -k and -F flags are specified, the sequence file can be tested without actually shutting down any nodes and without issuing a warning message to the user.
      cshutdown -kF ALL
      

    9. Specify the -r flag to halt the target nodes and restart them with cstartup. If necessary, specify the -C flag to provide cstartup_options. For example, to halt and restart nodes 12-16 with a time-out value of 300 seconds for the purpose of starting a leading node, enter:
      cshutdown -rN -C'-W 300' 12-16
      

    10. To reboot all the nodes in the partition node group sleepy_nodes, enter:
      cshutdown -rg sleepy_nodes
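
    The four rows of Table 1 can be reproduced with a small model. This is only a sketch under stated assumptions: the function name cshutdown_model, the "up"/"dn" state strings, and the rule that a target is blocked by any running nontarget node later in the sequence are illustrative choices that match the table, not the command's actual logic.

```python
def cshutdown_model(sequence, state, targets, targets_only=False):
    """Toy model of cshutdown sequencing.

    sequence     -- node names in /etc/cshutSeq order (rightmost shuts down first)
    state        -- dict mapping node name to "up" or "dn"
    targets      -- nodes named on the command line
    targets_only -- models the -X flag (ignore nontarget nodes)
    Returns the final state, or None if the command would fail.
    """
    if not targets_only:
        for target in targets:
            # Nodes to the right of the target must shut down before it does.
            trailing = sequence[sequence.index(target) + 1:]
            for node in trailing:
                if state[node] == "up" and node not in targets:
                    return None        # a running nontarget node gates the target
    final = dict(state)
    for node in targets:
        final[node] = "dn"
    return final


seq = ["A", "B", "C"]
print(cshutdown_model(seq, {"A": "up", "B": "up", "C": "dn"}, ["A"]))
print(cshutdown_model(seq, {"A": "up", "B": "up", "C": "dn"}, ["A"], targets_only=True))
```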
      

    CSS_test

    Purpose

    CSS_test - Verifies that the installation and configuration of the Communications Subsystem of the SP system completed successfully.

    Syntax

    CSS_test

    Flags

    None.

    Operands

    None.

    Description

    Use this command to verify that the Communications Subsystem component ssp.css of the SP system was correctly installed. CSS_test runs on the system partition set in SP_NAME.

    A return code of 0 indicates that the test completed without a failure, but unexpected results may be noted on standard output and in the companion log file /var/adm/SPlogs/CSS_test.log. A return code of 1 indicates that a failure occurred.
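
    A calling script might interpret the return code as sketched below. The wrapper name run_verification is hypothetical; the command path is the one listed under Files, and the sketch only distinguishes a return code of 0 from any other value.

```python
import subprocess


def run_verification(command=("/usr/lpp/ssp/bin/CSS_test",)):
    """Run a verification command and report (passed, standard_output).

    A return code of 0 means the test completed without a failure; the
    output may still contain unexpected results worth reviewing.
    """
    result = subprocess.run(list(command), capture_output=True, text=True)
    passed = result.returncode == 0
    return passed, result.stdout
```

    Even when the first element of the result is True, review the returned output and /var/adm/SPlogs/CSS_test.log for unexpected results.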

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit SP_verify
    

    Files

    /usr/lpp/ssp/bin/CSS_test
    Path name of this command

    /var/adm/SPlogs/CSS_test.log
    Default log file

    Related Information

    Commands: jm_install_verify, jm_verify, SDR_test, SYSMAN_test, spmon_ctest, spmon_itest

    Examples

    To verify the Communication Subsystem following installation, enter:

    CSS_test
    

    cstartup

    Purpose

    cstartup - Specifies the SP system Startup command.

    Syntax

    cstartup
    [-E] [-G] [-k] [-N | -R | -g] [-S] [-W seconds] [-X] [-Z] [-z]
    {target_nodes | [ALL]}

    Flags

    -E
    Starts up all nodes concurrently. Ignores the /etc/cstartSeq file, if one exists.

    -G
    Allows the specification of nodes to include one or more nodes outside of the current system partition. If ALL is specified with -G, all nodes in the SP start up. If ALL is specified without -G, all nodes in the current system partition start up. If -G is specified with a list of nodes, all listed nodes start up regardless of the system partition in which they reside (subject to the restrictions of the sequence file). If -G is not specified and some of the specified target nodes are outside of the current system partition or some of the specified target nodes depend on nodes outside of the current system partition, none of the specified nodes are started up.

    -g
    Indicates that the target_nodes are specified as a named node group. If -G is supplied, a global node group is used. Otherwise, a partition-bound node group is used.

    -k
    Checks the sequence data file; does not start up any nodes. If circular sequencing is detected, cstartup issues warning messages. You can use cstartup -k ALL to test your /etc/cstartSeq file without starting or resetting any nodes.

    -N
    Indicates that the target_nodes are specified as node numbers, not en0 host names. The node numbers can be specified as ranges; for example, 3-7 is interpreted as nodes 3, 4, 5, 6, and 7.

    -R
    Indicates that target_nodes is a file that contains the node identifiers.

    -S
    Tells cstartup to ignore existing sequencing violations, that is, cases in which some trailing target_nodes are already up and running. The target_nodes that are already up are left alone. The other target_nodes are started in sequence. This operation may cause the nodes involved to not interface properly with their dependent nodes. If you omit the -S flag and any target_node is already running before its leading node, cstartup fails without modifying the state of the system.

    -W seconds
    Provides a timeout value for starting up a leading node. In normal processing, cstartup waits for a leading node to be completely started before initiating the startup of trailing nodes. If one or more target_nodes fail to come up, cstartup waits indefinitely. The -W flag tells cstartup to wait the specified amount of time after initiating the startup of a node; the command continues to start other nodes, preserving the sequence in /etc/cstartSeq. The value you specify as seconds is added to a 3 minute (180 second) default wait period. Your value is a minimum; internal processing may cause the actual wait time to be slightly longer.
    Note: Your system may still be usable if one or more nodes fails to complete startup, because the sequencing rules are preserved.

    -X
    Starts up only the nodes listed on the command line even if there are nontarget nodes gating the system startup. If you do not specify the -X flag and there are sequence violations involving nontarget nodes, cstartup fails without modifying the state of the system.
    Note: If some nodes but not the entire system are forced to start up this way, they may not function properly because of possible resource problems.

    -Z
    If a target_node is already running at the time the cstartup command is issued, this flag tells cstartup to reset the node. This operation is disruptive to any processes running on the node. If you omit the -Z flag and any target_node is already running, cstartup fails without modifying the state of the system.

    -z
    If a target_node is already running at the time the cstartup command is issued, this flag tells cstartup to reset the node if the node is dependent on a node that is down when cstartup is issued, but leave the node alone if the node is to be started up ahead of any down node. This operation is disruptive to any processes running on the node being reset. This operation correctly resets the node-startup sequencing with minimum disruption to the system. If you omit the -z flag and any target_node is already running, cstartup fails without modifying the state of the system.
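
    The wait period that results from the -W flag is simple arithmetic: the value given as seconds is added to the 180-second default. A minimal sketch, with a hypothetical helper name:

```python
DEFAULT_WAIT_SECONDS = 180   # the 3-minute default wait period


def minimum_wait(seconds):
    """Return the minimum wait, in seconds, for a given -W value."""
    return DEFAULT_WAIT_SECONDS + seconds


print(minimum_wait(300))   # -W 300 gives a minimum wait of 480 seconds
```

    The actual wait may be slightly longer because of internal processing.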

    Operands

    target_nodes
    Designates the target nodes to be operated on. It is the operand of the command, and must be the last token on the command line. In the absence of the -R, -N, or -g flags, target_nodes are specified as host names on the en0 Ethernet. The string ALL can be used to designate all nodes in the SP system. You must identify one or more target_nodes.
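
    When the -N flag is used, target_nodes can mix single node numbers and ranges. The following sketch expands such a list the way the -N description states (3-7 means nodes 3, 4, 5, 6, and 7). The function name expand_node_numbers is hypothetical, and the sketch does no validation beyond integer conversion.

```python
def expand_node_numbers(tokens):
    """Expand tokens such as ["1", "9", "16-20"] into a list of node numbers."""
    nodes = []
    for token in tokens:
        if "-" in token:
            first, last = token.split("-", 1)
            nodes.extend(range(int(first), int(last) + 1))   # ranges are inclusive
        else:
            nodes.append(int(token))
    return nodes


print(expand_node_numbers(["1", "9", "16-20"]))
```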

    Description

    Caution!

    The cstartup command attempts to power on nodes that are powered off. This has safety implications if someone is working on the nodes. Proper precautions should be taken when using this command.

    The cstartup command starts up the entire system or any number of nodes in the system. If a node is not powered on, startup means powering on the node. If the node is already powered on and not running, startup means resetting the node.

    The /etc/cstartSeq file specifies the sequence in which the nodes are started up. See IBM Parallel System Support Programs for AIX: Administration Guide for the format of the /etc/cstartSeq file.

    You can use the -SXZ flags to violate the cstartup sequence intentionally. See Table 2 for examples of the effect of these flags.

    Files

    The following files reside on the control workstation:

    /usr/lpp/ssp/bin/cstartup
    The cstartup command.

    /etc/cstartSeq
    Describes the sequence in which the nodes should be started. Nodes not listed in the file are started up concurrently with listed nodes. If the file is empty, all nodes are started up concurrently. If the file does not exist, cstartup uses the output of seqfile as a temporary sequencing default.

    /var/adm/SPlogs/cs/cstart.MMDDhhmmss.pid
    Road map of cstartup command progress.

    Restrictions

    The cstartup command can only be issued on the control workstation by root or members of the shutdown group. The root user must issue the kinit command, specifying a principal name for which there is an entry in the hardmon ACLs file with control authorization for the frames to start up. The hardmon and System Data Repository (SDR) must be running.

    Results

    The /var/adm/SPlogs/cs/cstart.MMDDhhmmss.pid file contains the results of cstartup.

    MMDDhhmmss
    The time stamp.

    pid
    The process ID of the cstartup command.

    If the command fails, examine this file to see which steps were completed. If a file with the same name already exists (from a previous year), the cstartup command overwrites the existing file.

    Related Information

    Commands: cshutdown, init, seqfile

    Examples

    1. For these examples, assume that /etc/cstartSeq specifies the following startup sequence:
      Group1 > Group2 > Group3 > Group4 > Group5
                Group1: A
                Group2: B
                Group3: C
                Group4: D
                Group5: E
      

      This defines five groups, Group1 through Group5, each containing a single node. The node names are A, B, C, D, and E. The sequence line Group1 > Group2 > Group3 > Group4 > Group5 means that Group1 (node A) is started first. When Group1 is up, Group2 (node B) is started. When Group2 is up, then Group3 (node C) is started, and so on.

      Table 2 shows that the result of a cstartup command depends on the flags specified on the command line, the initial state of each node, and the sequencing rules in /etc/cstartSeq. The shorthand notation Aup indicates that A is powered up and running; Adn indicates that A is not running.

      Table 2. Examples of the cstartup Command
      The subscript up means the node is up; the subscript dn means the node is down.

      Initial State        Command Issued         Final State          Explanation
      Adn Bdn Cdn Ddn Edn  cstartup A B C D E     Aup Bup Cup Dup Eup  The command succeeds; the nodes are all up.
      Aup Bup Cdn Ddn Edn  cstartup A B C D E     Aup Bup Cup Dup Eup  The command succeeds; C, D, and E are started up.
      Aup Bup Cdn Dup Edn  cstartup A B C D E     Unchanged            The command fails because D was already up before C.
      Aup Bup Cdn Dup Edn  cstartup -S A B C D E  Aup Bup Cup Dup Eup  The command succeeds because -S ignores sequencing violations.
      Aup Bup Cdn Dup Edn  cstartup -Z A B C D E  Aup Bup Cup Dup Eup  The command succeeds because -Z resets running nodes.
      Aup Bup Cdn Dup Edn  cstartup C E           Unchanged            The command fails because node D was already up before node C.
      Aup Bup Cdn Dup Edn  cstartup -S C E        Aup Bup Cup Dup Eup  The command succeeds because -S ignores sequencing violations.
      Aup Bup Cdn Dup Edn  cstartup -X C E        Aup Bup Cup Dup Eup  The command succeeds because -X considers the sequencing of only the target nodes.
      Aup Bup Cdn Dup Edn  cstartup -Z C E        Unchanged            The command fails because resetting C or E does not correct the sequence violation.
      Aup Bup Cdn Ddn Edn  cstartup C E           Unchanged            The command fails because D is gating E. Node C is not started either.
      Aup Bup Cdn Ddn Edn  cstartup -S C E        Unchanged            The command fails because D is gating E. Node C is not started either.
      Aup Bup Cdn Ddn Edn  cstartup -X C E        Aup Bup Cup Ddn Eup  The command succeeds and starts up only the explicit targets, C and E.
      Aup Bup Cdn Ddn Edn  cstartup -Z C E        Unchanged            The command fails because D is gating E. Node C is not started either.

    2. To start up all the nodes in the SP system regardless of system partitions and the sequence file, enter:
      cstartup -GXZ ALL
      

    3. To start up nodes 1, 9, and 16-20 regardless of system partitions and subject to the restrictions of the sequence file, enter:
      cstartup -G -N 1 9 16-20
      

      The command may fail if any node in the list depends on any node that is not on the list and that node is not started up.

    4. To start up all the nodes in the current system partition, enter:
      cstartup ALL
      

      The command may fail if any node in the current system partition depends on nodes outside of the current system partition.

    5. To start up nodes 1, 5, and 6 in the current system partition, enter:
      cstartup -N 1 5 6
      

      The command may fail if any node in the list is not in the current system partition or depends on nodes outside of the current system partition.

    6. Specify the -X flag to ignore the sequence file and force nodes 1, 5, and 6 to be started up. The following command is successful even if node 5 is gated by a node that is not started up or is outside the current system partition:
      cstartup -X -N 1 5 6
      

    7. To verify the sequence file without actually starting up or resetting any nodes, enter the -k flag as follows:
      cstartup -k ALL
      

    8. To ignore the sequence file and start up all the target nodes concurrently, use the -E flag. For example, to start up all the nodes in the current system partition concurrently, enter:
      cstartup -E ALL
      

    9. To start up all nodes in the system node group sleepy_nodes, enter:
      cstartup -Gg sleepy_nodes
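
    The interaction of the -S, -X, and -Z flags shown in Table 2 can be explored with a toy model. The sketch below is an assumption-laden illustration that reproduces the rows of Table 2; it is not the command's actual algorithm. The function name cstartup_model, the keyword names, and the "up"/"dn" state strings are all hypothetical.

```python
def cstartup_model(sequence, state, targets,
                   ignore_violations=False,   # models -S
                   targets_only=False,        # models -X
                   reset_running=False):      # models -Z
    """Return the final node states, or None if the command would fail."""
    targets = set(targets)
    # With -X, only the target nodes take part in sequence checking.
    considered = [n for n in sequence if not targets_only or n in targets]
    for index, node in enumerate(considered):
        down_leaders = [n for n in considered[:index] if state[n] == "dn"]
        # Gating: a target waits on a leading node that is down and is
        # not itself going to be started by this command.
        if node in targets and any(n not in targets for n in down_leaders):
            return None
        # Sequence violation: the node is already up although a leading
        # node is still down. -S ignores it; -Z can reset a target node.
        if state[node] == "up" and down_leaders and not ignore_violations:
            if not (reset_running and node in targets):
                return None
    final = dict(state)
    for node in targets:
        final[node] = "up"
    return final


seq = ["A", "B", "C", "D", "E"]
start = {"A": "up", "B": "up", "C": "dn", "D": "dn", "E": "dn"}
print(cstartup_model(seq, start, ["C", "E"]))                      # fails: D gates E
print(cstartup_model(seq, start, ["C", "E"], targets_only=True))   # C and E start
```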
      

    ctlhsd

    Purpose

    ctlhsd - Sets the operational parameters or resets the statistics of the data striping device (HSD) for IBM Virtual Shared Disks.

    Syntax

    ctlhsd [-p parallel_level | -v hsd_name ... | -C | -V]

    Flags

    no option
    Displays the current parallelism level, the number of reworked requests, and the number of requests that are not at a page boundary.

    -p parallel_level
    Sets the HSD device driver's parallelism level to the value specified by parallel_level.

    -v hsd_name ...
    Resets the read and write request statistics for the specified HSDs.

    -C
    Resets the HSD device driver's counters for the number of reworked requests and the number of read/write requests that are not at a page boundary.

    -V
    Resets the read and write request statistics for all configured HSDs.

    Operands

    None.

    Description

    Use this command to set the parallelism level and to reset the statistics of the data striping device (HSD) for the IBM Virtual Shared Disk. When specified with no arguments, it displays the current parallelism level, the number of reworked requests, and the number of requests that were not at a page boundary. When ctlhsd is used to reset the statistics of the device driver, of a particular device, or of all the configured data striping devices on the system, it does not suspend the underlying IBM Virtual Shared Disks. You should therefore make sure that there is no I/O activity on the IBM Virtual Shared Disks before you reset the statistics.

    Use lshsd -s to display the statistics on the number of read and write requests at the underlying IBM Virtual Shared Disks in an HSD or all HSDs. Use the -v or -V flag to reset these counters.

    Files

    /usr/lpp/csd/bin/ctlhsd
    Specifies the command file.

    Security

    You must have root privilege to run this command.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: cfghsd, lshsd, lsvsd, resumevsd, suspendvsd, ucfghsd

    Examples

    To display the current parallelism level and counters, enter:

    ctlhsd
    
    The system displays a message similar to the following:
    The current parallelism level is 9.
    The number of READ requests not at page boundary is 0.
    The number of WRITE requests not at page boundary is 0.
    

    ctlvsd

    Purpose

    ctlvsd - Sets IBM Virtual Shared Disk operational parameters.

    Syntax

    ctlvsd
    [-t] | [-T] | [-c NewCacheSize | -r node_number... | -R |
    -k node_number... | -K | -p 1-9 | -M max_IP_msg_size |
    -v vsd_name ... | -C | -V]

    Flags

    -t
    Lists the current routing table and mbuf headers cached by the IBM Virtual Shared Disk driver.

    -T
    Clears or releases all cached routes.

    -c
    Sets the cache size to the new value. Only increasing the cache size up to the maximum value is supported. The initial value of the cache size is the init_cache_buffer_count from the SDR Node object for the node.

    -r
    Resets the outgoing and expected sequence numbers for the specified nodes on the node on which the command is run. Use this flag when another node has been rebooted or cast out, or when all IBM Virtual Shared Disks have been reconfigured on that node. The specified nodes are also cast in.

    -R
    Resets the outgoing and expected sequence number for all nodes on the node on which the command is run. Use this flag after rebooting the node. All nodes in the IBM Virtual Shared Disk network will be cast in.

    -k
    Casts out the node numbers specified on the local node. The local node ignores requests from cast out nodes. Use -r to cast nodes back in.
    Note: Before using this flag, refer to the "Restrictions" section that follows.

    -K
    Casts out all nodes on the local node. Local requests are still honored.
    Note: Before using this flag, refer to the "Restrictions" section that follows.

    -p
    Sets the level of IBM Virtual Shared Disk parallelism to the number specified. The valid range is 1 to 9. The default is 9. A larger value can potentially give better response time to large requests. (Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for more information regarding tuning IBM Virtual Shared Disk performance.)

    This value is the buf_cnt parameter on the uphysio call the IBM Virtual Shared Disk IP device driver makes in the kernel. Use statvsd to display the current value on the node on which the command is run.

    -M
    Sets the IBM Virtual Shared Disk max_IP_msg_size. This is the largest block of data that the IBM Virtual Shared Disk sends over the network for an I/O request. This limit also affects the local IBM Virtual Shared Disk I/O block size. The value must be a multiple of 512 and between 512 and 65024 (64KB minus 512 bytes). IBM suggests using 65024 for the switch, and 24576 (24KB) for token-ring or Ethernet networks. (Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for more information regarding tuning IBM Virtual Shared Disk performance.) Use statvsd to display the current value on the node on which the command is run. Set the same value on all nodes.

    -v vsd_name ...
    Resets the read and write request statistics for the specified IBM Virtual Shared Disks.

    -C
    Resets the IBM Virtual Shared Disk device driver counters displayed by the statvsd command. Exceptions are the outgoing and expected request sequence numbers among the client and server nodes.

    -V
    Resets the read and write request statistics for all configured IBM Virtual Shared Disks.
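
    The constraints stated for the -p and -M flags can be checked before the command is issued. The following sketch encodes only the limits given above; the function names are hypothetical and the command itself may apply further checks.

```python
def valid_parallelism(level):
    """-p accepts a level from 1 to 9."""
    return 1 <= level <= 9


def valid_max_ip_msg_size(size):
    """-M accepts a multiple of 512 between 512 and 65024 bytes."""
    return 512 <= size <= 65024 and size % 512 == 0


for size in (65024, 24576, 65536, 1000):
    print(size, valid_max_ip_msg_size(size))
```

    Both suggested values, 65024 and 24576, satisfy the check; 65536 exceeds the upper limit and 1000 is not a multiple of 512.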

    Operands

    None.

    Description

    The ctlvsd command changes some parameters of the IBM Virtual Shared Disk. When called with no arguments it displays the current and maximum cache buffer count, the request block count, the pbuf count, the minimum buddy buffer size, the maximum buddy buffer size as well as the overall size of the buddy buffer.

    Use statvsd to display the outgoing and expected sequence numbers and the cast-out status of other nodes as viewed by the node on which the command is run. It is best to run suspendvsd -a on all nodes whose sequence numbers are being reset before actually resetting the sequence numbers. After resetting the sequence numbers, be sure to use resumevsd on all IBM Virtual Shared Disks that were suspended.

    Initially, all sequence numbers are set to 0 when the first IBM Virtual Shared Disk is configured and the IBM Virtual Shared Disk device driver is loaded. Thereafter, sequence numbers are incremented as requests are sent to (outgoing) and received from (expected) other nodes, and reset via ctlvsd -R | -r commands.

    Reloading the IBM Virtual Shared Disk device driver by suspendvsd -a, stopvsd -a, or ucfgvsd -a followed by cfgvsd also resets all sequence numbers to 0.

    Initially, all nodes in the IBM Virtual Shared Disk network are cast in. The ctlvsd -k command casts a node out. The local node ignores requests from cast out nodes. The ctlvsd -r command casts nodes back in.

    Files

    /usr/lpp/csd/bin/ctlvsd
    Specifies the command file.

    Security

    You must have root privilege to run this command.

    Restrictions

    If you have the IBM Recoverable Virtual Shared Disk product installed and operational, do not use the -k and -K options. The results may be unpredictable.

    See IBM Parallel System Support Programs for AIX: Managing Shared Disks.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: cfgvsd, lsvsd, preparevsd, resumevsd, startvsd, statvsd, stopvsd, suspendvsd, ucfgvsd

    Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information on tuning IBM Virtual Shared Disk performance and sequence numbers.

    Examples

    To display the current parameters, enter:

    ctlvsd
    
    The system displays a message similar to the following:
    The current cache buffer count is 64.
    The maximum cache buffer count is 256.
    The request block count is 256.
    The pbuf's count is 48.
    The minimum buddy buffer size is 4096.
    The maximum buddy buffer size is 65536.
    The total buddy buffer size is 4 max buffers, 262144 bytes.
    

    To display the mbuf headers and current routing table, enter:

    ctlvsd -t
    
    The system displays the following information:
    Mbuf Cache Stats:
                    Header
       Cached        1
          Hit     1023
         Miss        1
    Route cache information:
     destination  interface  ref  status  direct/gateway   min managed mbuf
         1         css0        2    Up         Direct               256
    

    defhsd

    Purpose

    defhsd - Defines a data striping device (HSD).

    Syntax

    defhsd protect_LVCB | not_protect_LVCB hsd_name stripe_size vsd_name...

    Flags

    None.

    Operands

    protect_lvcb | not_protect_lvcb
    Protects the logical volume control block information that is stored at the first block of a logical volume. If protect_lvcb is specified, the data striping device (HSD) will skip the first stripe on each underlying IBM Virtual Shared Disk in an HSD. In this case, you should define each logical volume one stripe larger than necessary. If the IBM Virtual Shared Disk and Logical Volume Manager (LVM) disk mirroring are used, the logical volume control block information is critical.

    hsd_name
    Specifies a unique name for the new HSD. This name must be unique across the system partition and should be unique across the SP to avoid any naming conflicts during future system partitioning operations. The length of the name must be less than or equal to 31 characters.

    stripe_size
    Specifies the maximum size of data stored on an IBM Virtual Shared Disk at one time. The smallest stripe size is 4096 bytes. The stripe size must be a multiple of 4096 and less than or equal to 1GB.

    vsd_name
    Specifies the IBM Virtual Shared Disks that compose the HSD. All underlying IBM Virtual Shared Disks in the HSD must be defined before using this command.
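
    The stripe_size rules and the extra stripe needed with protect_lvcb can be expressed as a short check. This is a sketch with hypothetical function names; it assumes that 1GB means 2**30 bytes and that sizes are given in bytes.

```python
MIN_STRIPE = 4096
MAX_STRIPE = 2 ** 30   # assumption: "1GB" is taken as 2**30 bytes


def stripe_size_is_valid(stripe_size):
    """A stripe size must be a multiple of 4096, at least 4096, at most 1GB."""
    return MIN_STRIPE <= stripe_size <= MAX_STRIPE and stripe_size % MIN_STRIPE == 0


def logical_volume_bytes_needed(data_bytes, stripe_size, protect_lvcb):
    """With protect_lvcb the first stripe is skipped, so add one stripe."""
    return data_bytes + stripe_size if protect_lvcb else data_bytes


print(stripe_size_is_valid(32768))
print(logical_volume_bytes_needed(1048576, 32768, protect_lvcb=True))
```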

    Description

    The defhsd command is used to specify the hsd_name, stripe size and underlying IBM Virtual Shared Disks for the new data striping device (HSD).

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit vsd_data
    
    and select the Define a Hashed Shared Disk option.

    Files

    /usr/lpp/csd/bin/defhsd
    Specifies the command file.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: hsdatalst, undefhsd

    Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information on tuning IBM Virtual Shared Disk performance and sequence numbers.

    Examples

    The following example adds SDR information that defines an HSD named hsd1 with a stripe size of 32768, composed of the IBM Virtual Shared Disks vsd.vsdn101 and vsd.vsdn201.

    defhsd hsd1 32768 vsd.vsdn101 vsd.vsdn201
    

    defvsd

    Purpose

    defvsd - Defines an IBM Virtual Shared Disk.

    Syntax

    defvsd logical_volume_name global_group_name vsd_name [nocache | cache]

    Flags

    None.

    Operands

    logical_volume_name
    Is the name of the logical volume you want to specify as an IBM Virtual Shared Disk. This logical volume must reside on the global volume group indicated. The length of the name must be less than or equal to 15 characters.

    global_group_name
    Is the name of the globally-accessible volume group previously defined by the vsdvg command where you want to specify an IBM Virtual Shared Disk. The length of the name must be less than or equal to 31 characters.

    vsd_name
    Specifies a unique name for the new IBM Virtual Shared Disk. This name must be unique across the system partition and should be unique across the SP, to avoid any naming conflicts during future system partitioning operations. The suggested naming convention is vsdnngvg_name. The length of the name must be less than or equal to 31 characters.
    Note: If you choose a vsd_name that is already the name of another device, the cfgvsd command will fail for that IBM Virtual Shared Disk. This failure ensures that the special device files created for the name do not overlay and destroy files of the same name representing some other device type (such as a logical volume).

    nocache | cache
    Affects how requests are processed at the server node. nocache is the default. cache tells the IBM Virtual Shared Disk software on the server node to use the cache for all 4KB requests on 4KB boundaries. Otherwise, the cache is not used.

    The cache option should only be used if the using application gains performance by avoiding a 4KB read immediately after a 4KB write. Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for additional information on IBM Virtual Shared Disk tuning.
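
    The length limits stated for the operands can be checked before defvsd is run. The sketch below uses a hypothetical helper, check_defvsd_operands, and covers only the limits given above; it does not verify that the names exist or are unique.

```python
LIMITS = {"logical_volume_name": 15, "global_group_name": 31, "vsd_name": 31}


def check_defvsd_operands(logical_volume_name, global_group_name, vsd_name,
                          cache_option="nocache"):
    """Return a list of problems found in the operands (empty if none)."""
    values = {
        "logical_volume_name": logical_volume_name,
        "global_group_name": global_group_name,
        "vsd_name": vsd_name,
    }
    problems = []
    for operand, limit in LIMITS.items():
        if len(values[operand]) > limit:
            problems.append(f"{operand} is longer than {limit} characters")
    if cache_option not in ("nocache", "cache"):
        problems.append("cache option must be nocache or cache")
    return problems


print(check_defvsd_operands("lv1vg1n1", "vg1n1", "vsd1vg1n1"))
```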

    Description

    This command is run to specify logical volumes residing on globally accessible volume groups to be used as IBM Virtual Shared Disks.

    You can use the System Management Interface Tool (SMIT) to run the defvsd command. To use SMIT, enter:

    smit vsd_data
    
    and select the Define a Virtual Shared Disk option.

    Security

    You must have root privilege to run this command.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: vsdatalst, vsdvg, undefvsd

    Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information regarding IBM Virtual Shared Disk performance enhancements.

    Examples

    1. The following example adds SDR information indicating that on globally accessible volume group vg1n1, the logical volume known as lv1vg1n1 is used as a noncached IBM Virtual Shared Disk named vsd1vg1n1.
      defvsd lv1vg1n1 vg1n1 vsd1vg1n1
      

    2. The following example defines the cacheable IBM Virtual Shared Disk vsd1vg2n1 on the lv2vg1n1 logical volume in the vg1n1 globally accessible volume group.
      defvsd lv2vg1n1 vg1n1 vsd1vg2n1 cache
      

    delnimclient

    Purpose

    delnimclient - Deletes a Network Installation Management (NIM) client definition from a NIM master.

    Syntax

    delnimclient -h | -l node_list | -s server_node_list

    Flags

    -h
    Displays usage information. If the command is issued with the -h flag, the syntax description is displayed to standard output and no other action is taken (even if other valid flags are entered along with the -h flag).

    -l node_list
    Indicates by node_list the SP nodes to be unconfigured as NIM clients of their boot/install servers. The node_list is a comma-separated list of node numbers.

    -s server_node_list
    Indicates by server_node_list the SP boot/install server nodes on which to delete all NIM clients that are no longer defined as boot/install clients in the System Data Repository (SDR). Server node 0 (zero) signifies the control workstation.
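
    Both list arguments are comma-separated node numbers. A script that prepares such lists might parse them as sketched below; the helper names are hypothetical, and the sketch assumes that server_node_list uses the same comma-separated form as node_list.

```python
def parse_node_list(node_list):
    """Turn a string such as "1,3,5" into a list of node numbers."""
    return [int(item) for item in node_list.split(",")]


def describe_server(node_number):
    """Server node 0 (zero) signifies the control workstation."""
    return "control workstation" if node_number == 0 else f"node {node_number}"


print(parse_node_list("1,3,5"))
print([describe_server(n) for n in parse_node_list("0,17")])
```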

    Operands

    None.

    Description

    Use this command to undefine a node as a NIM client. This is accomplished by determining the node's boot/install server and unconfiguring that client node as a NIM client on that server. When complete, the entry for the specified client is deleted from the NIM configuration database on the server. This command does not change the boot/install attributes for the nodes in the System Data Repository.
    Note: This command results in no processing on the client node.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have root privilege to run this command.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/delnimclient

    Related Information

    Commands: mknimclient, setup_server

    Examples

    To delete the NIM client definition for nodes 1, 3, and 5 from the NIM database on their respective boot/install servers, enter:

    delnimclient -l 1,3,5
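
    Because node_list is a comma-separated list, node numbers kept one per line in a file need to be joined before they are passed to the -l flag. The following is a minimal sketch; node_list_arg is a hypothetical helper name and is not part of PSSP.

```shell
# node_list_arg (hypothetical helper): read node numbers, one per line,
# from standard input and print them as one comma-separated list.
node_list_arg() {
    paste -s -d, -
}

# Prints 1,3,5. The result could then be passed to the -l flag,
# for example: delnimclient -l "$(node_list_arg < nodes.txt)"
# (nodes.txt is an assumed file name).
printf '%s\n' 1 3 5 | node_list_arg
```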
    

    delnimmast

    Purpose

    delnimmast - Unconfigures a node as a Network Installation Management (NIM) master.

    Syntax

    delnimmast -h | -l node_list

    Flags

    -h
    Displays usage information. If the command is issued with the -h flag, the syntax description is displayed to standard output and no other action is taken (even if other valid flags are entered along with the -h flag).

    -l node_list
    Indicates by node_list the SP nodes to be unconfigured as NIM masters. The node_list is a comma-separated list of node numbers. Node number 0 (zero) signifies the control workstation.

    Operands

    None.

    Description

    Use this command to undefine a node as a NIM master. This command does not change the boot/install attributes for the nodes in the System Data Repository.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have root privilege to run this command.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/delnimmast

    Related Information

    Commands: mknimmast, setup_server

    Examples

    To unconfigure nodes 1, 3, and 5 as NIM masters and delete the NIM file sets, enter:

    delnimmast -l 1,3,5
    

    dsh

    Purpose

    dsh - Issues commands to a group of hosts in parallel.

    Syntax

    dsh
    [-q]

    dsh
    [-h]

    dsh
    [-i] [-v] [-c] [-a] [-G] [-l login_name] [-N node_group,node_group, ...]
     
    [-w {host_names | -}] [-f fanout_value] [command]

    Flags

    -q
    Displays a list of hosts in the current working collective file. The WCOLL environment variable is examined to find the name of the file containing the host names in a working collective, and host names from that file are displayed. In addition, the value of the FANOUT environment variable is displayed.

    -h
    Displays usage information.

    -i
    Displays information about the working collective and commands. If this flag is set, the working collective and the command are displayed as each command is issued.

    -v
    Verifies hosts before adding to the working collective. If this flag is set, each host to be added to the working collective is checked before it is added to the collective. If a host is not responding, it is not included in the working collective.

    -c
    Indicates that dsh continues to send commands to hosts for which previous rsh's have returned a nonzero return code. If this flag is not set, the host is removed from the working collective for the duration of this dsh command.

    -a
    Specifies that the System Data Repository initial_hostname field for all nodes in the current system partition be added to the working collective. If -G is specified, all nodes in the SP system are included.

    -G
    Changes the scope of the -a and -N arguments from the current system partition to the SP system.

    -l
    Specifies a remote user name under which to execute the commands. If -l is not used, the remote user name is the same as your local user name. (This is lowercase l, as in list.)

    -w
    Specifies a list of host names, separated by commas, to include in the working collective. Both this flag and the -a flag can be included on the same command line. If "-" is specified, host names are read from standard input. If -w - is used, commands cannot be read from standard input. Duplicate host names are only included once in the working collective.

    -f
    Specifies a fanout value, which is the maximum number of concurrent rsh's to execute. Sequential execution can be specified by indicating a fanout value of 1. If the -f flag is not specified, the fanout value is taken from the FANOUT environment variable; if FANOUT is not set either, the default value of 64 is used.

    -N
    Specifies a list of node groups. Each node group is resolved into nodes and these nodes are added to the working collective. If -G is supplied, a global node group is used. Otherwise, a partitioned-bound node group is used.

    If the -a, -w, or -N flags are not specified, the WCOLL environment variable contains the name of a file containing host names for the working collective.

    Operands

    command
    Specifies a command to execute on the working collective. It is passed to rsh. This command is specified in rsh syntax (see the SP rsh command).

    Description

    The dsh command executes commands against all or any subset of the hosts in a network. It reads lines from the command line or standard input and executes each as a command on a set of network-connected hosts. These commands are in rsh syntax. Alternatively, a single command in rsh syntax can be specified on the dsh command line.

    As each command is read, it is interpreted by passing it to each host in a group called the working collective via the SP rsh command.

    The working collective is obtained from the first existence of one of the following:

    1. A list of host names specified on the command line and the members of the cluster as listed in the System Data Repository.

    2. The contents of a file named by the WCOLL environment variable.

    If neither of these exists, an error occurs and no commands are issued.

    The working collective file should have one host name per line. Blank lines and comment lines beginning with # are ignored.
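
    The file format can be illustrated with a small filter that prints only the lines that count as host names. This is a sketch, not the parsing that dsh itself performs; list_collective is a hypothetical helper, and it assumes that a comment line has # as its very first character.

```shell
# list_collective FILE (hypothetical helper, not part of PSSP)
# Print the host names in a working collective file: one host per line,
# skipping blank lines and comment lines that begin with '#'.
list_collective() {
    # Drop comment lines, then drop lines that are empty or only whitespace.
    grep -v '^#' "$1" | grep -v '^[[:space:]]*$'
}

# Build a small sample file and list the hosts it defines.
sample=${TMPDIR:-/tmp}/wchosts.$$
printf '%s\n' '# compute nodes' 'host1' '' 'host2' '# spare' 'host3' > "$sample"
list_collective "$sample"
rm -f "$sample"
```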

    The path used when resolving the dsh command on the target nodes is the path set by the user with the DSHPATH environment variable. If DSHPATH is not set, the path used is the rsh default path, /usr/ucb:/bin:/usr/bin:. The DSHPATH environment variable only works when the user's remote login shell is the Bourne or Korn shell. An example would be to set DSHPATH to the path set on the source machine (for example, DSHPATH=$PATH).

    The maximum number of concurrent rsh's can be specified with the fanout (-f) flag or via the FANOUT environment variable. If desired, sequential execution can be obtained by specifying a fanout value of 1. Results are displayed as remote commands complete. All rsh's in a fanout must complete before the next set of rsh's is started. If fanout is not specified via FANOUT or the -f flag, rsh's to 64 hosts are issued concurrently. Each rsh that dsh runs requires a reserved TCP/IP port and only 512 such ports are available per host. With large fanouts, it is possible to exhaust all the ports on a host, causing commands that use these ports, such as the SP rlogin and the SP rsh commands, to fail.
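
    Because each set of rsh's must complete before the next set starts, the number of sets needed for a working collective is the host count divided by the fanout, rounded up. The following sketch works through that arithmetic; fanout_batches is a hypothetical helper, not a PSSP command.

```shell
# fanout_batches HOSTS [FANOUT] (hypothetical helper)
# Print how many successive sets of rsh's are needed when at most FANOUT
# run at once and each set must finish before the next one starts.
fanout_batches() {
    hosts=$1
    # Explicit value, else the FANOUT environment variable, else 64.
    fanout=${2:-${FANOUT:-64}}
    echo $(( (hosts + fanout - 1) / fanout ))
}

fanout_batches 200 64   # 200 hosts in sets of up to 64
fanout_batches 200 1    # sequential execution: one host per set
```

    With 200 hosts and a fanout of 64, the sets hold 64, 64, 64, and 8 hosts.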

    Exit values for the rsh commands are displayed in messages from dsh if nonzero. (A nonzero return code from rsh indicates that the rsh has failed; it has nothing to do with the exit code of the remotely executed command.) If an rsh fails, that host is removed from the current working collective (not the current working collective file), unless the -c flag was set.

    The dsh exit value is 0 if no errors occurred in the dsh command and all rsh's finished with exit codes of 0. The dsh exit value is more than 0 if internal errors occur or the rsh's fail. The exit value is increased by 1 for each rsh failure.

    No particular error recovery for command failure on remote hosts is provided. The application or user can examine the command results in dsh's standard error and standard output, and take appropriate action.

    The dsh command waits until results are in for each command for all hosts and displays those results before reading more input commands.

    The dsh command does not work with interactive commands, including those that read from standard input.

    The dsh command output consists of the output (standard error and standard output) of the remotely executed commands. The dsh standard output is the standard output of the remote command. The dsh standard error is the standard error of the remote command. Each line is prefixed with the host name of the host from which that output came. The host name is followed by ":" and a line of the command output.

    For example, let's say that a command was issued to a working collective of host1, host2, and host3. When the command was issued on each of the hosts, the following lines were written by the remote commands:

    For host1 stdout:
    h1out1
    h1out2
     
    For host2 stdout:
    h2out1
    h2out2
     
    For host3 stdout:
    h3out1
     
    For host3 stderr:
    h3err1
    h3err2
     
    dsh stdout will be
    host1: h1out1
    host1: h1out2
    host2: h2out1
    host2: h2out2
    host3: h3out1
     
    dsh stderr will be
    host3: h3err1
    host3: h3err2
    

    A filter to display the output lines by the host is provided separately. See the dshbak command.
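
    The host-name prefix also makes it straightforward to pull out the lines for a single host with standard tools. The following is a minimal sketch; lines_from_host is a hypothetical helper, and it assumes the "host_name: " prefix (a colon followed by one space) shown in the sample output above.

```shell
# lines_from_host HOST (hypothetical helper)
# Read dsh-style output on standard input and print only the lines that
# came from HOST, with the "HOST: " prefix removed.
# Note: HOST is used as a regular expression, so a '.' in a host name
# matches any character.
lines_from_host() {
    sed -n "s/^$1: //p"
}

printf '%s\n' 'host1: h1out1' 'host2: h2out1' 'host1: h1out2' |
    lines_from_host host1
```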

    If a host is detected as down (for example, an rsh returns a nonzero return code), subsequent commands are not sent to it on this invocation of dsh, unless the -c (continue) option is specified on the command line.

    An exclamation point at the beginning of a command line causes the command to be passed directly to the local host in the current environment. The command is not sent to the working collective.

    Signals 2 (INT), 3 (QUIT), and 15 (TERM) are propagated to the remote commands.

    Signals 19 (CONT), 17 (STOP), and 18 (TSTP) are defaulted. This means that the dsh command responds normally to these signals, but they do not have an effect on the remotely running commands. Other signals are caught by dsh and have their default effects on the dsh command. In the case of these other signals, all current children, and via propagation their remotely running commands, are terminated (SIGTERM).

    Security considerations are the same as for the SP rsh command.

    Files

    /usr/sbin/dsh
    The dsh command.

    /usr/sbin/dshbak
    The supplied backend formatting filter.

    working collective file
    A file containing host names, one per line, that defines a working collective.

    Related Information

    Command: dshbak

    SP Commands: rsh, sysctl

    Examples

    1. To issue the ps command on each host listed in the wchosts file, enter:
      WCOLL=./wchosts dsh ps
      

    2. To list the current working collective file as specified by the WCOLL environment variable, enter:
      dsh -q
      

    3. To set the working collective to three hosts and start reading commands from standard input, enter:
      dsh -w otherhost1,otherhost2,otherhost3
      

    4. To set the current working collective to three hosts, plus the members of the cluster, and issue a command on those hosts formatting the output, enter:
      dsh -w host1,host2,host3 -a cat /etc/passwd | dshbak
      

    5. To append the file remotefile on otherhost to otherremotefile, which is on otherhost, enter:
      dsh -w otherhost cat remotefile '>>' otherremotefile
      

    6. To run a file of commands sequentially on all the members of the current system partition and save the results in a file, including the working collective and the command as each command is issued, enter:
      dsh -if 1 -a < commands_file > results 2>&1
      

    7. To run the ps command on the working collective and filter results locally, enter:
      dsh ps -ef | grep root
      

    8. To run the ps command and filter results on the working collective hosts (this can improve performance considerably), enter:
      dsh 'ps -ef | grep root'
      

      or

      dsh ps -ef "|" grep root
      

    9. To cat a file from host1 to the local system stripping off the preceding host name to preserve the file, enter:
      dsh -w host1 cat /etc/passwd | cut -d: -f2- | cut -c2- > myetcpasswd
      

    10. To run the ps command on each node in the node group my_nodes, enter:
      dsh -N my_nodes ps
      

    dshbak

    Purpose

    dshbak - Presents formatted output from the dsh and sysctl commands.

    Syntax

    dshbak [-c]

    Flags

    -c
    Collapses identical output from more than one host so that it is displayed only once.

    Operands

    None.

    Description

    The dshbak command takes lines in the following format:

    host_name: line of output from remote command
    

    The dshbak command formats them as follows and writes them to standard output. Assume that the output from host_name3 and host_name4 is identical and the -c option was specified:

    HOSTS -----------------------------------------------------------------
    host_name1
    -----------------------------------------------------------------------
    .
    .
    lines from dsh or sysctl with host_names stripped off
    .
    .
    HOSTS -----------------------------------------------------------------
    host_name2
    -----------------------------------------------------------------------
    .
    .
    lines from dsh or sysctl with host_names stripped off
    .
    .
    HOSTS -----------------------------------------------------------------
    host_name3             host_name4
    -----------------------------------------------------------------------
    .
    .
    lines from dsh or sysctl with host_names stripped off
    .
    .
    

    When output is displayed from more than one host in collapsed form, the host names are displayed alphabetically.

    When output is not collapsed, output is displayed sorted alphabetically by host name.

    The dshbak command writes "." for each 1000 lines of output filtered.

    Files

    /usr/sbin/dshbak
    The dshbak command.

    Related Information

    Commands: dsh, sysctl

    Examples

    1. To display the results of a command executed on several hosts in the format described previously, enter:
      dsh -w host1,host2,host3 cat /etc/passwd | dshbak
      

    2. To display the results of a command executed on several hosts with identical output displayed only once, enter:
      dsh -w host1,host2,host3 pwd | dshbak -c
      

    Eannotator

    Purpose

    Eannotator - Annotates the connection labels in the topology file.

    Syntax

    Eannotator -F input_file -f output_file -O [yes | no]

    Flags

    -F
    Specifies the topology input file.

    -f
    Specifies the topology output file.

    -O
    Specifies whether to save the output file to the System Data Repository (SDR) or to the current directory. yes saves the output file to the SDR via the Etopology command. no saves the output file to the current directory.

    Operands

    None.

    Description

    This command supports all of the following:

    This command must be executed whenever a new topology file is selected.

    The topology file, which describes the wiring configuration for the High Performance Switch, contains node-to-switch or switch-to-switch cable information. A node-to-switch connection looks like the following:

    s 25 2 tb0 17 0     E2-S00-BH-J16 to E2-N2
    

    The predefined node-to-switch connections start with an "s" which indicates a switch connection. The next two digits, in this case "25" indicate the switch (2) and switch chip (5) being connected. The next digit, in this case "2", indicates the switch chip port in the connection. The next field, in this case "tb0", specifies the type of adapter present in the SP node. The following field, in this case "17", is the switch node number for the SP node, and the last digit, in this case "0", indicates the adapter port within the connection.
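
    The field layout described above can be sketched as a small parser. This is an illustration only, not part of Eannotator; describe_node_link is a hypothetical helper, and it assumes that the last digit of the second field is the switch chip and the digits before it are the switch number.

```shell
# describe_node_link LINE (hypothetical helper)
# Split one node-to-switch topology entry into its fields.
describe_node_link() {
    printf '%s\n' "$1" | {
        # The first six words are fixed fields; the rest is the label.
        read -r kind sw_chip chip_port adapter node adapter_port label
        switch=${sw_chip%?}          # everything except the last digit
        chip=${sw_chip#"$switch"}    # the last digit
        echo "switch=$switch chip=$chip chip_port=$chip_port"
        echo "adapter=$adapter switch_node=$node adapter_port=$adapter_port"
        echo "label=$label"
    }
}

describe_node_link 's 25 2 tb0 17 0     E2-S00-BH-J16 to E2-N2'
```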

    For switch-to-switch connections, the first four fields (switch indicator, switch, switch chip, and switch chip port) are repeated to identify the other end of the connection.

    The connection label "E2-S00-BH-J16 to E2-N2" provides physical connection information that a customer can use to identify a bad connection.

    Depending on the type of switch installed (High Performance Switch or SP Switch), together with the customer's physical switch frame configuration defined in the SDR, the Eannotator command retrieves switch node and dependent node objects from the SDR and applies proper connection information to the topology file.

    If the input topology file contains existing connection information, the Eannotator command replaces the existing connection label with the new connection labels. If the input topology file does not contain connection labels, the Eannotator command appends the proper connection label to each line on the topology file.

    The precoded connection labels on the topology file start with an "L", which indicates logical frames. The Eannotator command replaces the "L" character with an "E", which indicates physical frames. The "S" character indicates which slot the switch occupies in the frame, the "BH" characters indicate a Bulk Head connection, the "J" character indicates which jack provides the connection from the switch board, the "N" character indicates the node being connected to the switch, and the "SC" characters indicate the Switch Chip connection.

    Files

    /etc/SP/expected.top.1nsb_8.0isb.0
    The standard topology file for systems with the 8-port High Performance Switch or a maximum of eight nodes.

    /etc/SP/expected.top.1nsb.0isb.0
    The standard topology file for one NSB system or a maximum of 16 nodes.

    /etc/SP/expected.top.2nsb.0isb.0
    The standard topology file for two NSB systems or a maximum of 32 nodes.

    /etc/SP/expected.top.3nsb.0isb.0
    The standard topology file for three NSB systems or a maximum of 48 nodes.

    /etc/SP/expected.top.4nsb.0isb.0
    The standard topology file for four NSB systems or a maximum of 64 nodes.

    /etc/SP/expected.top.5nsb.0isb.0
    The standard topology file for five NSB systems or a maximum of 80 nodes.

    /etc/SP/expected.top.5nsb.4isb.0
    The standard topology file for five NSB and four ISB systems or a maximum of 80 nodes. This is an advantage-type network with a higher bisectional bandwidth.

    /etc/SP/expected.top.6nsb.4isb.0
    The standard topology file for six NSB and four ISB systems or a maximum of 96 nodes.

    /etc/SP/expected.top.7nsb.4isb.0
    The standard topology file for seven NSB and four ISB systems or a maximum of 112 nodes.

    /etc/SP/expected.top.8nsb.4isb.0
    The standard topology file for eight NSB and four ISB systems or a maximum of 128 nodes.

    /etc/SP/expected.top.1nsb_8.0isb.1
    The standard topology file for systems with an SP Switch-8 and a maximum of eight nodes.
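
    Apart from the two 8-port entries, the file names listed above follow one pattern, and each NSB adds 16 nodes to the maximum. The following sketch composes a name from board counts; topology_file and max_nodes are hypothetical helpers, and they do not check that a file for a given combination of counts actually exists.

```shell
# topology_file NSB ISB (hypothetical helper)
# Compose a topology file name in the pattern used by the list above.
topology_file() {
    echo "/etc/SP/expected.top.${1}nsb.${2}isb.0"
}

# max_nodes NSB (hypothetical helper)
# Each NSB in the list above accounts for up to 16 nodes.
max_nodes() {
    echo $(( $1 * 16 ))
}

topology_file 8 4
max_nodes 8
```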

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition

    Refer to IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment for details about system partition topology files.

    Examples

    1. The following are the topology file entries before and after the Eannotator command executes:
      Before:
      s 15 3 tb0 0 0 L01-S00-BH-J18 to L01-N1
       
      After:
      s 15 3 tb0 0 0 E01-S17-BH-J18 to E01-N1
      
      Note: Logical frame L01 is defined as physical frame 1 in the SDR Switch object.
      Before:
      s 10016 0 s 51 3 L09-S1-BH-J20 to L05-S00-BH-J19
       
      After:
      s 10016 0 s 51 3 E10-S1-BH-J20 to E05-S17-BH-J19
      
      Note: Logical frame L09 is defined as physical frame 10 in the SDR Switch object.
      Before:
      s 15 3 tb0 0 0 L03-S00-BH-J18 to L03-N3
       
      After:
      s 15 3 tb3 0 0 E03-S17-BH-J18 to E03-N3 # Dependent Node
      
      Note: Logical frame L03 is defined as physical frame 3 in the SDR Switch object and the node was determined to be a dependent node.

    2. To annotate a topology file for a 128-way SP system with eight Node Switch Boards (NSBs) and four Intermediate Switch Boards (ISBs) and to save the output file in the current directory, enter:
      Eannotator -F expected.top.8nsb.4isb.0 -f expected.top -O no
      

    3. To annotate a topology file for a 16-way SP system with one NSB and no ISBs and to save the output file in the SDR via the Etopology command, enter:
      Eannotator -F expected.top.1nsb.0isb.0 -f expected.top -O yes
      

    Eclock

    Purpose

    Eclock - Controls the clock source for each switch board within an SP cluster.

    Syntax

    Eclock
    [-f Eclock_topology_file] | [-a Eclock_topology_file] | [-r] | [-d] |
     
    [-s switch_number -m mux_value] | [-c Eclock_topology_file]

    Flags

    -f Eclock_topology_file
    Specifies the file name of the clock topology file containing the initial switch clock input values for all switches in the system.

    -a Eclock_topology_file
    Uses the alternate Eclock topology specified in the given clock topology file.

    -r
    Extracts the clock topology file information from the System Data Repository (SDR) and initializes the switch clock inputs for all switches in the system.

    -d
    Detects the switch configuration, automatically selects the clock topology file, and initializes the switch clock inputs for all switches in the system.

    -s switch_number -m mux_value
    Sets an individual switch (switch_number) clock multiplexor (mux) value (mux_value).

    where:

    switch_number
    Specifies the switch number.

    mux_value
    Specifies a flag with one of the following values:

    High Performance Switch

    0
    Use the internal oscillator (make this frame the master frame).
    1
    Use input 1 (clock input from jack 3).
    2
    Use input 2 (clock input from jack 5).
    3
    Use input 3 (clock input from jack 7).

    SP Switch

    0
    Use the internal oscillator (make this frame the master frame).
    1
    Use input 1 (clock input from jack 3) (NSBs or ISBs).
    2
    Use input 2 (clock input from jack 4) (NSBs or ISBs).
    3
    Use clock input from jack 5 (NSBs or ISBs).
    4
    Use clock input from jack 4 (NSBs or ISBs).
    5
    Use clock input from jack 5 (NSBs or ISBs).
    6
    Use clock input from jack 6 (NSBs or ISBs).
    7
    Use clock input from jack 7 (ISBs only).
    8
    Use clock input from jack 8 (ISBs only).
    9
    Use clock input from jack 9 (ISBs only).
    10
    Use clock input from jack 10 (ISBs only).
    27
    Use clock input from jack 27 (NSBs or ISBs).
    28
    Use clock input from jack 28 (NSBs or ISBs).
    29
    Use clock input from jack 29 (NSBs or ISBs).
    30
    Use clock input from jack 30 (NSBs or ISBs).
    31
    Use clock input from jack 31 (ISBs only).
    32
    Use clock input from jack 32 (ISBs only).
    33
    Use clock input from jack 33 (ISBs only).
    34
    Use clock input from jack 34 (ISBs only).

    -c Eclock_topology_file
    Creates a new clock topology file from the data in the SDR.

    If a flag is not specified, the clock input values stored in the SDR are displayed.
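
    A script that wraps the -s and -m flags might check a mux_value against the SP Switch list above before calling Eclock. The following is a sketch under that assumption; sp_mux_ok is a hypothetical helper, and the board type ("nsb" or "isb") is something the caller is assumed to know.

```shell
# sp_mux_ok BOARD MUX (hypothetical helper)
# BOARD is "nsb" or "isb". Succeeds if MUX appears in the SP Switch list
# for that board type, and fails otherwise.
sp_mux_ok() {
    case $2 in
        0|1|2|3|4|5|6|27|28|29|30)
            return 0 ;;              # internal oscillator, or NSBs or ISBs
        7|8|9|10|31|32|33|34)
            [ "$1" = isb ] ;;        # listed for ISBs only
        *)
            return 1 ;;              # not in the list
    esac
}

sp_mux_ok nsb 27 && echo "27 is listed for an NSB"
sp_mux_ok nsb 31 || echo "31 is listed for ISBs only"
```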

    Operands

    None.

    Description

    Use this command to set the multiplexors that control the clocking at each switch board within the configuration. One switch board within the configuration is designated as the "Master" switch that provides the clocking signal for all other switch boards within the configuration. The Eclock command reads clock topology information from either the file specified on the command line or the clock topology data within the SDR. If a clock topology file was specified, the Eclock command places the clock topology information into the SDR, so that it can be accessed again during a subsequent Eclock invocation. After processing the clock topology file, Eclock causes the new clock topology to take effect for the switches specified. A clock topology file contains the following information for each switch board within the cluster:

    High Performance Switch Warning

    Do not change the switch clock multiplexor settings (with Eclock, spmon command/GUI, hmcmds) while the nodes are powered on. Otherwise, AIX must be rebooted and Estart must be run following the clock adjustment.

    SP Switch Warning

    Do not change the switch clock multiplexor settings (with Eclock, spmon command/GUI, hmcmds) while the nodes are powered on. Otherwise, Estart must be run following the clock adjustment.

    To execute the Eclock command, the user must be authorized to access the Hardware Monitor subsystem and, for those frames specified to the command, the user must be granted VFOP (Virtual Front Operator Panel) permission. Commands sent to frames for which the user does not have VFOP permission are ignored. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.

    Files

    /etc/SP/Eclock.top.1nsb.0isb.0
    The standard clock topology file for systems with one NSB or a maximum of 16 nodes.

    /etc/SP/Eclock.top.1nsb_8.0isb.0
    The standard clock topology file for systems with the 8-port High Performance Switch or an SP Switch-8 or a maximum of eight nodes.

    /etc/SP/Eclock.top.2nsb.0isb.0
    The standard clock topology file for systems with two NSBs or a maximum of 32 nodes.

    /etc/SP/Eclock.top.3nsb.0isb.0
    The standard clock topology file for systems with three NSBs or a maximum of 48 nodes.

    /etc/SP/Eclock.top.4nsb.0isb.0
    The standard clock topology file for systems with four NSBs or a maximum of 64 nodes.

    /etc/SP/Eclock.top.5nsb.0isb.0
    The standard clock topology file for systems with five NSBs or a maximum of 80 nodes.

    /etc/SP/Eclock.top.5nsb.4isb.0
    The standard clock topology file for systems with five NSBs and four ISBs or a maximum of 80 nodes. This is an advantage-type network with a higher bisectional bandwidth.

    /etc/SP/Eclock.top.6nsb.4isb.0
    The standard clock topology file for systems with six NSBs and four ISBs or a maximum of 96 nodes.

    /etc/SP/Eclock.top.7nsb.4isb.0
    The standard clock topology file for systems with seven NSBs and four ISBs or a maximum of 112 nodes.

    /etc/SP/Eclock.top.8nsb.4isb.0
    The standard clock topology file for systems with eight NSBs and four ISBs or a maximum of 128 nodes.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition

    Examples

    1. To set the clock multiplexors for a 128-way SP system with eight Node Switch Boards (NSBs) and four Intermediate Switch Boards (ISBs), enter:
      Eclock -f /etc/SP/Eclock.top.8nsb.4isb.0
      

    2. To display the clock multiplexor settings for all switches within the SP system, enter:
      Eclock
      

    3. To set the switch on frame 1 (switch 1) to be the master switch (use internal oscillator), enter:
      Eclock -s 1 -m 0
      

    4. To create an Eclock topology file from the current data in the SDR, enter:
      Eclock -c /tmp/Eclock.top
      

    5. To use an alternate clock topology (with a new switch clock source) for a 64-way SP system with two ISBs, enter:
      Eclock -a /etc/SP/Eclock.top.4nsb.2isb.0
      

    6. To have Eclock automatically select a topology file for you based on data in the SDR, enter:
      Eclock -d
      

    Eduration

    Purpose

    Usage Note

    Do not use this command if you have the SP Switch installed on your system.

    Eduration - Sets the interval at which nodes can be added to or removed from the High Performance Switch. This interval is called the Run Phase Duration.

    Syntax

    Eduration [[days day[s]] [hours hour[s]] [minutes minute[s]] ] | [-h]

    Flags

    days day[s]
    Specifies the number of days that the switch will stay in the Run Phase. Valid values are 1-40.

    hours hour[s]
    Specifies the number of hours that the switch will stay in the Run Phase. Valid values are 1-23.

    minutes minute[s]
    Specifies the number of minutes that the switch will stay in the Run Phase. Valid values are 1-59.

    -h
    Displays usage information.

    Any combination of the three preceding time designations can be used to specify the new Run Phase Duration. Since the duration determines how quickly the system can respond to Efence and Eunfence requests, it should be set to provide the desired response. If none of the time specifiers are present, Eduration will display the current value of the Run Phase Duration.
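
    To compare candidate settings, it can help to express a combination of days, hours, and minutes as a single number of minutes. The following sketch does that and rejects values outside the ranges given for each flag; run_phase_minutes is a hypothetical helper, and 0 is used here to mean that a time designation is omitted.

```shell
# run_phase_minutes DAYS HOURS MINUTES (hypothetical helper)
# Check each part against the documented upper limits (0 = omitted) and
# print the total Run Phase Duration in minutes.
run_phase_minutes() {
    d=${1:-0} h=${2:-0} m=${3:-0}
    [ "$d" -ge 0 ] && [ "$d" -le 40 ] || return 1
    [ "$h" -ge 0 ] && [ "$h" -le 23 ] || return 1
    [ "$m" -ge 0 ] && [ "$m" -le 59 ] || return 1
    echo $(( (d * 24 + h) * 60 + m ))
}

# The duration that "Eduration 1 hour 30 minutes" would request.
run_phase_minutes 0 1 30
```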

    Operands

    None.

    Description

    The Run Phase Duration controls how frequently Efence and Eunfence requests are handled. This command provides an interface to set that value.
    Note: The Run Phase Duration changes will not take effect until the end of the current Run Phase. If you are changing the Run Phase Duration from a large value to something that is significantly smaller and you do not want to wait for the current Run Phase to complete, you will have to Estart the switch.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition

    Examples

    1. To set the Run Phase Duration to 1 minute, enter:
      Eduration 1 minute
      

    2. To set the Run Phase Duration to an hour and 30 minutes, enter:
      Eduration 1 hour 30 minutes
      

    3. To query the Run Phase Duration, enter:
      Eduration
      

    Efence

    Purpose

    Efence - Removes an SP node from the current active switch network.

    Syntax

    Efence [-h] | [-G] [-autojoin] [node_specifier] ...

    Flags

    -h
    Displays usage information.

    -G
    Fences all valid nodes in the list of nodes regardless of system partition boundaries. If the -G flag is not used, the Efence command will only fence the nodes in the current system partition. All other specified nodes will not be fenced and a nonzero return code is returned.

    -autojoin
    Enables the nodes in the argument list to be fenced and to automatically rejoin the current switch network if the node is rebooted or the Fault Service daemon is restarted.

    If you have an SP Switch installed on your system, such nodes are also rejoined when an Estart command is issued.

    Operands

    node_specifier
    Specifies a node or a list of nodes that are to be taken out of the current switch network. It can be a list of host names, IP addresses, node numbers, frame,slot pairs, or a node group.
    Note: You cannot fence the primary node on the High Performance Switch and you cannot fence either the primary or primary backup nodes on the SP Switch.

    Description

    Use this command to fence a node from the current switch network.

    If you have an SP Switch installed on your system, you must do either of the following to bring the node back up onto the switch network:

    If you have a High Performance Switch installed on your system, you can issue the Estart command to rejoin all nodes on the switch network.

    Note: If a host name or IP address is used as the node_specifier for a dependent node, it must be a host name or IP address assigned to the adapter that connects the dependent node to the SP Switch. Neither the administrative host name nor the Simple Network Management Protocol (SNMP) agent's host name for a dependent node is guaranteed to be the same as the host name of its switch network interface.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition

    Examples

    1. To display all the nodes that were fenced from the switch network in the current system partition, enter:
      Efence
      

    2. To display only the nodes that were fenced from the switch network with the automatic join option enabled, enter:
      Efence -autojoin
      

    3. To display all the nodes that were fenced from the switch network in all system partitions, enter:
      Efence -G
      

    4. To fence two nodes by IP address, enter:
      Efence 129.33.34.1 129.33.34.6
      

    5. To fence a node by host name, enter:
      Efence r11n01
      

    6. To fence a list of nodes by node number and enable -autojoin, enter:
      Efence -autojoin 54 65 32 78
      

    7. To fence node 14 of frame 2 by frame,slot pair, enter:
      Efence 2,14
      

    8. If the current partition has nodes with node numbers 1, 2, 5, and 6 and another partition has nodes with node numbers 3, 4, 7, and 8, issuing the command:
      Efence 5 6 7 8
      
      fences nodes 5 and 6, but not nodes 7 and 8. As a result, the command returns a nonzero return code.

    9. To successfully fence the nodes in example 8 with the same partitions, use the -G flag as follows:
      Efence -G 5 6 7 8
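
    Examples 8 and 9 can be modeled with a small shell sketch. The efence_plan function below is hypothetical: it is not part of PSSP and it does not call Efence. Assuming every requested node number is valid, it only illustrates which nodes would be fenced with and without the -G flag, and why the return code is nonzero when some nodes lie outside the current system partition.

```shell
# Hypothetical model of Efence node selection; it does not touch the switch.
# $1 = space-separated node numbers in the current system partition
# $2 = "yes" to model the -G flag, anything else to model its absence
# remaining arguments = node numbers requested on the command line
efence_plan() {
    current=" $1 "
    global=$2
    shift 2
    rc=0
    for node in "$@"; do
        case "$current" in
            *" $node "*) echo "fence $node" ;;
            *)
                if [ "$global" = "yes" ]; then
                    echo "fence $node"
                else
                    echo "skip $node"
                    rc=1
                fi
                ;;
        esac
    done
    return $rc
}

# Example 8: nodes 7 and 8 are skipped and the status is nonzero.
efence_plan "1 2 5 6" no 5 6 7 8 || echo "nonzero return code"
```

    Calling efence_plan "1 2 5 6" yes 5 6 7 8 models example 9, in which all four nodes are fenced.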
      

    emconditionctrl Script

    Purpose

    emconditionctrl - Loads the System Data Repository (SDR) with predefined Event Management conditions.

    Syntax

    emconditionctrl [-a] [-s] [-k] [-d] [-c] [-t] [-o] [-r] [-h]

    Flags

    -a
    Loads the SDR with predefined Event Management conditions for the current system partition.

    -s
    Starts the subsystem. (Currently has no effect.)

    -k
    Stops the subsystem. (Currently has no effect.)

    -d
    Deletes the subsystem. (Currently has no effect.)

    -c
    Cleans the subsystem. (Currently has no effect.)

    -t
    Turns tracing on. (Currently has no effect.)

    -o
    Turns tracing off. (Currently has no effect.)

    -r
    Refreshes the subsystem. (Currently has no effect.)

    -h
    Displays usage information.

    Operands

    None.

    Description

    The emconditionctrl script loads the SDR with predefined conditions that can be used when registering for Event Management events. Currently, the SP Perspectives application can make use of these conditions.

    The emconditionctrl script is not normally executed on the command line. It is normally called by the syspar_ctrl command after the control workstation has been installed or when the system is partitioned. It implements all of the flags that syspar_ctrl can pass to its subsystems, although only the -a flag causes any change to the system. The -a flag causes predefined conditions to be loaded only if run on the control workstation. It has no effect if run elsewhere.

    Exit Values

    0
    Indicates the successful completion of the command.

    nonzero
    Indicates an exit code from the SDRCreateObjects command.

    Security

    You must be running with an effective user ID of root.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/emconditionctrl

    Related Information

    Commands: syspar_ctrl

    emonctrl Script

    Purpose

    emonctrl - A control script that manages the Emonitor subsystem.

    Syntax

    emonctrl { -a | -s | -k | -d | -c | -t | -o | -r | -h }

    Flags

    -a
    Adds the subsystem.

    -s
    Starts the subsystem. Not implemented. The subsystem should be started using Estart -m.

    -k
    Stops the subsystem.

    -d
    Deletes the subsystem.

    -c
    Cleans the subsystems; that is, deletes them from all system partitions.

    -t
    Turns tracing on for the subsystem. Not used.

    -o
    Turns tracing off for the subsystem. Not used.

    -r
    Refreshes the subsystem. Not implemented.

    -h
    Displays usage information.

    Operands

    None.

    Description

    The Emonitor subsystem monitors designated nodes in an attempt to maximize their availability on the switch network.

    The emonctrl control script controls the operation of the Emonitor subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called emon.

    An instance of the Emonitor subsystem can execute on the control workstation for each system partition. Because Emonitor provides its services within the scope of a system partition, it is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It should be issued from the control workstation and is not functional on the nodes.

    From an operational point of view, the Emonitor subsystem group is organized as follows:

    Subsystem
    Emonitor

    Subsystem Group
    emon

    SRC Group
    emon

    The emon group is associated with the Emonitor daemon.

    On the control workstation, there are multiple instances of Emonitor, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named Emonitor.sp_prod and Emonitor.sp_test.

    Daemons
    Emonitor

    The Emonitor daemon provides switch node monitoring.
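
    The naming convention described above for the control workstation can be sketched in shell. The emonitor_subsystem_name function is a hypothetical helper, not a PSSP command; it simply appends a system partition name to the subsystem name.

```shell
# Hypothetical helper: builds the SRC subsystem name used on the control
# workstation by appending the system partition name to "Emonitor".
emonitor_subsystem_name() {
    echo "Emonitor.$1"
}

emonitor_subsystem_name sp_prod    # prints Emonitor.sp_prod
```

    The resulting name is the kind of value that the lssrc -s command expects, for example lssrc -s "$(emonitor_subsystem_name sp_test)".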

    The emonctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

    The emonctrl script provides a variety of controls for operating the Emonitor daemon:

    Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation. Since the Emonitor daemon runs only on the control workstation, the script performs no function when run on a node.
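
    The node number test described above might look like the following sketch. The on_control_workstation function is hypothetical, and the node number is passed in as an argument rather than read from the node_number command, so that the example can run anywhere.

```shell
# Hypothetical helper: node number 0 identifies the control workstation.
# $1 = value printed by the node_number command
on_control_workstation() {
    [ "$1" -eq 0 ] 2>/dev/null
}

if on_control_workstation 0; then
    echo "control workstation: Emonitor functions apply"
fi
if ! on_control_workstation 5; then
    echo "node 5: the script performs no function"
fi
```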

    Except for the clean function, all functions are performed within the scope of the current system partition.

    Adding the Subsystem

    When the -a flag is specified, the control script uses the mkssys command to add the Emonitor daemon to the SRC. The control script operates as follows:

    1. It checks whether the Emonitor subsystem already exists in this system partition. If the Emonitor subsystem does exist, it exits.

    2. It adds the Emonitor subsystem to the SRC with the system partition name appended.

    Starting the Subsystem

    This option is unused since the Emonitor daemon must be started via Estart -m.

    Stopping the Subsystem

    When the -k flag is specified, the control script uses the stopsrc command to stop the Emonitor daemon in the current system partition.

    Deleting the Subsystem

    When the -d flag is specified, the control script uses the rmssys command to remove the Emonitor subsystem from the SRC. The control script operates as follows:

    1. It makes sure that the Emonitor subsystem is stopped.

    2. It removes the Emonitor subsystem from the SRC using the rmssys command.

    Cleaning Up the Subsystems

    When the -c flag is specified, the control script stops and removes the Emonitor subsystems for all system partitions from the SRC. The control script operates as follows:

    1. It stops all instances of subsystems in the subsystem group in all system partitions, using the stopsrc -g emon command.

    2. It removes all instances of subsystems in the subsystem group in all system partitions from the SRC using the rmssys command.
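
    The delete and clean steps above can be illustrated with a dry-run sketch that prints System Resource Controller commands instead of running them. The function names and partition names are hypothetical, and the exact options that emonctrl passes to stopsrc and rmssys for an individual subsystem are an assumption; only stopsrc -g emon is stated in the text.

```shell
# Dry-run sketch: prints SRC commands equivalent to the documented steps
# instead of executing them. The partition names are illustrative.
emon_delete_plan() {            # models emonctrl -d for one partition
    echo "stopsrc -s Emonitor.$1"
    echo "rmssys -s Emonitor.$1"
}

emon_clean_plan() {             # models emonctrl -c for all partitions
    echo "stopsrc -g emon"
    for partition in "$@"; do
        echo "rmssys -s Emonitor.$partition"
    done
}

emon_delete_plan sp_prod
emon_clean_plan sp_prod sp_test
```

    Printing the commands keeps the sketch safe to run on a system where the Emonitor subsystems are in use.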

    Turning Tracing On

    Not currently used.

    Turning Tracing Off

    Not currently used.

    Refreshing the Subsystem

    Not currently used.

    Logging

    While it is running, the Emonitor daemon provides information about its operation and errors by writing entries in a log file. The Emonitor daemon uses log files called /var/adm/SPlogs/css/Emonitor.log and /var/adm/SPlogs/css/Emonitor.Estart.log.

    Files

    /var/adm/SPlogs/css/Emonitor.log
    Contains the log of all Emonitor daemons on the system.

    /var/adm/SPlogs/css/Emonitor.Estart.log
    Contains the log of all Estart and Eunfence commands issued by all Emonitor daemons.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Security

    You must have root privilege to run this command.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/emonctrl

    Related Information

    Commands: Emonitor, Estart, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    1. To add the Emonitor subsystem to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      emonctrl -a
      

    2. To stop the Emonitor subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      emonctrl -k
      

    3. To delete the Emonitor subsystem from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      emonctrl -d
      

    4. To clean up the Emonitor subsystem on all system partitions, enter:
      emonctrl -c
      

    5. To display the status of all of the subsystems in the Emonitor SRC group, enter:
      lssrc -g emon
      

    6. To display the status of an individual Emonitor subsystem, enter:
      lssrc -s subsystem_name
      

    7. To display the status of all of the daemons under SRC control, enter:
      lssrc -a
      

    Emonitor Daemon

    Purpose

    Emonitor - Monitors nodes listed in the /etc/SP/Emonitor.cfg file in an attempt to maximize their availability on the switch.

    Syntax

    Emonitor

    Flags

    None.

    Operands

    None.

    Description

    Emonitor is a daemon controlled by the System Resource Controller (SRC). It can be used to monitor nodes in a system partition with regard to their status on the switch. A system-wide configuration file (/etc/SP/Emonitor.cfg) lists all nodes on the system to be monitored. The objective is to bring these nodes back up on the switch network when necessary.

    Emonitor is invoked with Estart -m. Once invoked, it is controlled by the SRC, so it will restart if it is halted abnormally. If you decide to end monitoring, you must run /usr/lpp/ssp/bin/emonctrl -k to stop the daemon in your system partition.

    There is an Emonitor daemon for each system partition. The daemon watches for any node coming up (for example, host_responds goes from 0 to 1). When the daemon detects a node coming up, it reviews the nodes in the configuration file to check whether any node is off the switch network. If any nodes in the specified system partition are off the switch network, it determines a way to bring them back onto the switch (for example, via Eunfence or Estart) and takes the appropriate action. To prevent the Estart command from being run several times (which can occur if multiple nodes are coming up in sequence), Emonitor waits 3 minutes after a node comes up to be sure no other nodes are in the process of coming up. Each time a new node comes up before the 3-minute timeout expires, Emonitor resets the timer, up to a maximum wait of 12 minutes.
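
    The wait logic can be illustrated with a short shell model. This is one reading of the text, not the daemon's actual implementation: it assumes that each node arriving before the timer expires restarts the 3-minute wait, and that the total wait, measured from the first node, never exceeds 12 minutes. The emonitor_action_time function is hypothetical.

```shell
# Hypothetical model of the Emonitor wait timer (all times in seconds).
# Arguments: ascending times at which nodes come up.
# Prints the time at which Emonitor would act, assuming each new arrival
# restarts the 3-minute wait and the total wait is capped at 12 minutes.
emonitor_action_time() {
    first=$1
    deadline=$((first + 180))
    cap=$((first + 720))
    shift
    for t in "$@"; do
        # A node that comes up after the timer expired starts a new cycle.
        [ "$t" -lt "$deadline" ] || break
        deadline=$((t + 180))
        [ "$deadline" -le "$cap" ] || deadline=$cap
    done
    echo "$deadline"
}

emonitor_action_time 0 100 250     # prints 430
```

    In this model, nodes coming up at 0, 100, and 250 seconds lead to a single action at 430 seconds instead of three separate Estart attempts.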

    Emonitor cannot always bring nodes back on the switch. For example, if any of the following occur:

    On a High Performance Switch, if a node is faulted off the switch and you are forced to do an Estart, you will lose history of any nodes that you had isolated off the switch. All nodes on a High Performance Switch come back on the switch on an Estart.

    Problems can occur if the node that is faulted off the switch is experiencing a recurring error that causes it to come up and then fail repeatedly. The monitor continually attempts to bring this node into the switch network and could jeopardize the stability of the remaining switch network.
    Note: Nodes that will be undergoing hardware or software maintenance should be removed from the Emonitor.cfg file during this maintenance to prevent Emonitor from attempting to bring them onto the switch network.

    Files

    /etc/SP/Emonitor.cfg
    Specifies a list of node numbers, one per line, that the user wants monitored by Emonitor. This list is system-wide.
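
    Because the file holds one node number per line, a node can be filtered out with standard tools before it undergoes maintenance. The following sketch uses a hypothetical emon_cfg_without function and a temporary sample file; it prints the filtered list and does not modify /etc/SP/Emonitor.cfg.

```shell
# Sketch: print a node list with one node number removed.
# $1 = node number to drop, $2 = path to a file in Emonitor.cfg format
emon_cfg_without() {
    rc=0
    # -x matches whole lines only, so dropping node 1 keeps node 12.
    grep -v -x "$1" "$2" || rc=$?
    # grep exits 1 when no lines remain; only 2 or more is a real error.
    [ "$rc" -le 1 ]
}

cfg=/tmp/Emonitor.cfg.sample.$$
printf '1\n5\n12\n' > "$cfg"
emon_cfg_without 1 "$cfg"      # prints 5 and 12
rm -f "$cfg"
```

    To keep the result, redirect the output to a new file and review it before replacing the configuration file.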

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, emonctrl, Eprimary, Equiesce, Estart, Etopology, Eunfence, Eunpartition

    enadmin

    Purpose

    enadmin - Changes the desired state of a specified extension node.

    Syntax

    enadmin [-a {reset | reconfigure}] [-h] node_number

    Flags

    -a
    Specifies the desired state to which the extension node is to be set.

    reconfigure
    Once the administrative state of the extension node is placed in this mode, the Simple Network Management Protocol (SNMP) agent managing the extension node will periodically send trap messages to the spmgrd daemon running on the control workstation requesting configuration data for the extension node. Once the configuration data is received by the agent, it stops sending these requests and uses the configuration data to reconfigure the extension node.

    reset
    Once the administrative state of the extension node is placed in this mode, the SNMP agent managing the extension node will set the extension node to an initial state in which it is no longer an active node on the switch network.

    -h
    Displays usage information.

    Operands

    node_number
    Specifies the node number assigned to the extension node whose state is to be changed.

    Description

    Use this command to change the administrative state of an extension node. Setting the administrative state of an extension node to reconfigure causes configuration data for the extension node to be resent to the extension node's administrative environment. Setting the administrative state of an extension node to reset places the extension node in an initial state in which it is no longer active on the switch network.

    This command is invoked internally when choosing the reconfigure option of the endefadapter and endefnode commands or the reset (-r) option of the enrmnode command.

    You can use the System Management Interface Tool (SMIT) to run this command by selecting the Extension Node Management panel. To use SMIT, enter:

    smit manage_extnode
    

    Standard Output

    All informational messages are written to standard output. These messages identify the extension node being changed and indicate when the specified state change has been accepted for processing by the extension node agent (at which point the command is complete). All error messages are also written to standard output.

    Exit Values

    0
    Indicates the administrative state of the extension node was successfully changed.

    1
    Indicates that an error occurred while processing the command and the administrative state of the extension node was not changed.

    Security

    You must have root privilege or be a member of the system group to run this command.

    Restrictions

    This command can only be issued on the control workstation.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.spmgr file set.

    The spmgrd SNMP manager daemon on the SP control workstation allows transfer of extension node configuration data from the SP system to an SNMP agent providing administrative support for the extension node. Version 1 of the SNMP protocol is used for communication between the SNMP manager and the SNMP agent. Limited control of an extension node is also possible. An SNMP set-request message containing an object instantiation representing the requested administrative state for the extension node is sent from the SNMP manager to the SNMP agent providing administrative support for the extension node. After the administrative state of an extension node is received by the SNMP agent, the enadmin command is completed. Requests for configuration information and information about the state of an extension node are sent to the SNMP manager asynchronously in SNMP trap messages.

    Prerequisite Information

    IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment

    Location

    /usr/lpp/ssp/bin/enadmin

    Related Information

    Commands: endefadapter, endefnode, enrmadapter, enrmnode, spmgrd

    Examples

    1. To request that configuration data for the extension node assigned to node number 9 be sent to its SNMP managing agent, enter:
      enadmin -a reconfigure 9
      

    2. To request that the extension node assigned to node number 9 be placed in an initial state and no longer be active on the switch, enter:
      enadmin -a reset 9
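
    When enadmin is called from a script, the state passed to the -a flag can be checked first. The enadmin_cmd function below is a hypothetical wrapper that only prints the command line it would run; it accepts the two states documented for the -a flag and rejects anything else.

```shell
# Hypothetical wrapper: prints an enadmin command line after checking that
# the requested state is one of the two values the -a flag accepts.
enadmin_cmd() {                 # $1 = state, $2 = node number
    case "$1" in
        reset|reconfigure) echo "enadmin -a $1 $2" ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
}

enadmin_cmd reconfigure 9      # prints enadmin -a reconfigure 9
```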
      

    endefadapter

    Purpose

    endefadapter - Adds new or changes existing configuration data for an extension node adapter in the System Data Repository (SDR) and optionally performs the reconfiguration request.

    Syntax

    endefadapter [-a address] [-h] [-m netmask] [-r] node_number

    Flags

    -a address
    Specifies the IP network address of the extension node adapter. The IP network address must be able to be resolved by the host command. This flag is required when adding a new extension node adapter.

    -h
    Displays usage information.

    -m netmask
    Specifies the netmask for the network on which the extension node adapter resides. This flag is required when adding a new extension node adapter.

    -r
    Specifies that the extension node adapter will be reconfigured.

    Operands

    node_number
    Specifies the node number for this extension node adapter. This operand is required.

    Description

    Use this command to define extension node adapter information in the SDR. When adding a new extension node adapter, the -a and -m flags and the node_number operand are required.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit enter_extadapter
    

    Environment Variables

    The SP_NAME environment variable is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.

    Standard Output

    This command writes informational messages to standard output.

    Standard Error

    This command writes all error messages to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred and the extension node adapter information was not updated.

    Security

    You must have root privilege or be a member of the system group to run this command.

    Restrictions

    This command can only be issued on the control workstation.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.

    Prerequisite Information

    IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment

    Location

    /usr/lpp/ssp/bin/endefadapter

    Related Information

    Commands: enadmin, endefnode, enrmadapter, enrmnode

    Examples

    1. To define an extension node adapter for node number 10 with a network address of 129.40.158.137 and a netmask of 255.255.255.0, enter:
      endefadapter -a 129.40.158.137 -m 255.255.255.0 10
      

    2. The following example shows the same definition, but the extension node adapter will be reconfigured after the SDR is updated:
      endefadapter -a 129.40.158.137 -m 255.255.255.0 -r 10
      

    endefnode

    Purpose

    endefnode - Adds new or changes existing configuration data for an extension node in the System Data Repository (SDR) and optionally performs the reconfiguration request.

    Syntax

    endefnode [-a hostname] [-c string] [-h] [-i string] [-r] [-s hostname] node_number

    Flags

    -a hostname
    Specifies the administrative host name, which can be resolved to an IP address, associated with the extension node's network interface on the administrative network. This flag is required when adding a new extension node.

    -c string
    Specifies the Simple Network Management Protocol (SNMP) community name that the SP SNMP manager and the node's SNMP agent will send in the corresponding field of the SNMP messages. This field consists of 1 to 255 ASCII characters. If the -c flag is not specified, the spmgrd daemon will use a default SNMP community name. For more information about the default community name, refer to the related extension node publication in the "Related Information" section that follows.

    -h
    Displays usage information.

    -i string
    Specifies the extension node identifier assigned to the node in its system's administrative environment. This is a text string that uniquely identifies the node to its system. This field consists of 1 to 255 ASCII characters. This flag is required when adding a new extension node.

    -r
    Specifies that the extension node will be reconfigured.

    -s hostname
    Specifies the host name that can be resolved to an IP address of the extension node's SNMP agent. This flag is required when adding a new extension node.

    Operands

    node_number
    Specifies the node number for this extension node. The node_number specified in this command must be for an unused standard node position that corresponds to the relative node position assigned to the extension node. Otherwise, there would be a conflict in the switch configuration information. This operand is required.

    Description

    Use this command to define extension node information in the SDR. When adding a new extension node, the -a, -i, and -s flags and the node_number operand are required. When changing an existing extension node definition, only the node number is required along with the flag corresponding to the field being changed.
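
    The values given to the -c and -i flags are each described as 1 to 255 ASCII characters. A script that builds endefnode command lines might check the length first, as in this sketch. The valid_en_string function is hypothetical; it checks length only and does not verify that every character is ASCII.

```shell
# Hypothetical pre-check for the -c and -i values: 1 to 255 characters.
# It checks length only, not that every character is ASCII.
valid_en_string() {
    len=${#1}
    [ "$len" -ge 1 ] && [ "$len" -le 255 ]
}

if valid_en_string "spenmgmt"; then
    # Print the command rather than run it, so the sketch has no side effects.
    echo "endefnode -i 13 -a router1 -s router1 -c spenmgmt 2"
fi
```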

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit enter_extnode
    

    Environment Variables

    The SP_NAME environment variable is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.

    Standard Output

    This command writes informational messages to standard output.

    Standard Error

    This command writes all error messages to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred and the extension node information was not updated.

    Security

    You must have root privilege or be a member of the system group to run this command.

    Restrictions

    This command can only be issued on the control workstation.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.

    Prerequisite Information

    IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment

    Location

    /usr/lpp/ssp/bin/endefnode

    Related Information

    Commands: enadmin, endefadapter, enrmnode, enrmadapter

    Refer to the SP Switch Router Adapter Guide for information about attaching an IP router extension node to the SP Switch.

    Examples

    1. The following example shows a definition of an extension node with a node number of 2 that references slot number 13 in a router:
      endefnode -i 13 -a router1 -s router1 -c spenmgmt 2
      

    2. The following example shows a definition of an extension node with a node number of 7 that references slot number 02 in a router. This extension node will also be reconfigured after the SDR is updated.
      endefnode -i 02 -a grf.pok.ibm.com -s grf.pok.ibm.com -c spenmgmt -r 7
      

    enrmadapter

    Purpose

    enrmadapter - Removes configuration data for an extension node adapter from the System Data Repository (SDR).

    Syntax

    enrmadapter [-h] node_number

    Flags

    -h
    Displays usage information.

    Operands

    node_number
    Specifies the node number for this extension node adapter.

    Description

    Use this command to remove extension node adapter information from the SDR. The node_number operand is required.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit delete_extadapter
    

    Environment Variables

    The environment variable SP_NAME is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.

    Standard Output

    This command writes informational messages to standard output.

    Standard Error

    This command writes all error messages to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred and the extension node adapter information was not updated.

    Security

    You must have root privilege or be a member of the system group to run this command.

    Restrictions

    This command can only be issued on the control workstation.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.

    Prerequisite Information

    IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment

    Location

    /usr/lpp/ssp/bin/enrmadapter

    Related Information

    Commands: enadmin, endefadapter, endefnode, enrmnode

    Examples

    To remove an extension node adapter with a node number of 12 from the SDR, enter:

    enrmadapter 12
    

    enrmnode

    Purpose

    enrmnode - Removes configuration data for an extension node from the System Data Repository (SDR).

    Syntax

    enrmnode [-h] [-r] node_number

    Flags

    -h
    Displays usage information.

    -r
    Causes the extension node to be reset.

    Operands

    node_number
    Specifies the node number for this extension node.

    Description

    Use this command to remove extension node information from the SDR. When removing information, the node_number operand is required.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit delete_extnode
    

    Environment Variables

    The environment variable SP_NAME is used (if set) to direct this command to a system partition. If the SP_NAME environment variable is not set, the default system partition will be used.

    Standard Output

    This command writes informational messages to standard output.

    Standard Error

    This command writes all error messages to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred and the extension node information was not updated.

    Security

    You must have root privilege or be a member of the system group to run this command.

    Restrictions

    This command can only be issued on the control workstation.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.

    Prerequisite Information

    IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment

    Location

    /usr/lpp/ssp/bin/enrmnode

    Related Information

    Commands: enadmin, endefadapter, endefnode, enrmadapter

    Examples

    To remove an extension node with a node number of 2 from the SDR and reset that extension node, enter:

    enrmnode -r 2
    

    Eprimary

    Purpose

    Eprimary - Assigns or queries the switch primary node and switch primary backup node for a system partition.

    ***High Performance Switch***

    Syntax

    Eprimary [-h] [-init] [node_identifier]

    Flags

    -h
    Displays usage information.

    -init
    Initializes or reinitializes the system partition object in the System Data Repository (SDR). If -init is specified without a node identifier, the lowest numbered node in the system partition is used by default.

    Operands

    node_identifier
    Specifies the node designated as the switch primary node. It can be a host name, an IP address, a frame,slot pair, or a node number.

    Note: If no flags or operands are specified, the current switch primary node is displayed.

    Description

    Use this command to assign, change, or query the switch primary node. When the -init option is specified, it can be used to create a switch partition object for a system partition. The primary node should not be changed unless the current primary node is becoming unavailable (for example, if the current primary node is to be serviced). The Estart command must be issued before a change of the primary node (using Eprimary) takes effect. The old primary node must be rebooted or powered off before issuing Estart to remove its inclination to behave as the primary node.
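
    The order of operations described above for changing the primary node on a High Performance Switch can be laid out as a dry-run sketch. The hps_change_primary_plan function and the host name r11n05 are hypothetical; the function only prints the sequence and leaves the reboot as a reminder, because the method used to reboot or power off the old primary node is not specified here.

```shell
# Dry-run sketch of changing the primary node on a High Performance Switch.
# It only prints the documented sequence; nothing is executed.
hps_change_primary_plan() {     # $1 = old primary, $2 = new primary
    echo "Eprimary $2"
    echo "# reboot or power off $1 before continuing"
    echo "Estart"
}

hps_change_primary_plan r11n01 r11n05
```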

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, Equiesce, Estart, Etopology, Eunfence

    Examples

    1. To query the switch primary node, enter:
      Eprimary
      

    2. To designate a switch primary node by IP address, enter:
      Eprimary 129.33.34.1
      

    3. To designate a switch primary node by node number, enter:
      Eprimary 1
      

    4. To designate the switch primary node by host name, enter:
      Eprimary r11n01
      

    5. To create a system partition object and assign a switch primary node by a frame,slot, enter:
      Eprimary -init 1,2
      

    ***SP Switch***

    Syntax

    Eprimary [-h] [-init] [node_identifier] [-backup bnode_identifier]

    Flags

    -h
    Displays usage information.

    -init
    Initializes or reinitializes the current system partition object. If -init is specified without a node_identifier or without a bnode_identifier, the respective default is used for the primary and primary backup nodes. The lowest numbered node in the system partition is the default primary node, and the furthest node from the primary is the default primary backup node.

    -backup bnode_identifier
    Specifies the node designated as the oncoming switch primary backup node. It can be a host name, an IP address, a frame,slot pair, or a node number. If a bnode_identifier is not specified, the oncoming primary backup node is automatically selected. A dependent node cannot be selected as a primary or primary backup node.

    Operands

    node_identifier
    Specifies the node designated as the oncoming switch primary node. It can be a host name, an IP address, a frame,slot pair, or a node number. If a node_identifier is not specified, the oncoming primary node is automatically selected. A dependent node cannot be selected as a primary or primary backup node.

    Note: If no flags or operands are specified, each of the following is displayed:
    • Current switch primary node
    • Current switch primary backup node
    • Oncoming switch primary node
    • Oncoming switch primary backup node

    Description

    Use this command to assign, change, or query the switch primary node or the switch primary backup node. The primary node should not be changed unless the current primary node is becoming unavailable (for example, if the current primary node is to be serviced). The Estart command must be issued before a change of the primary node or the primary backup node (using Eprimary) takes effect.

    In an SP Switch network, the primary node takeover facility automatically handles situations (such as a node loss) for each of the primary and primary backup nodes. The primary node replaces a failing primary backup node and the primary backup node automatically takes over for the primary node if the primary node becomes unavailable. Note that the node chosen cannot be a dependent node. The primary backup node should be selected using the following guidelines:

    The Eprimary command selects a default oncoming primary or oncoming backup primary node if one is not specified. Users receive a warning in the following situations on the oncoming primary or oncoming backup primary nodes:

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, Equiesce, Estart, Etopology, Eunfence, Eunpartition

    Examples

    1. To query the switch primary and primary backup nodes, enter:
      Eprimary
      

    2. To designate an oncoming switch primary node by IP address and let Eprimary select an oncoming switch primary backup node, enter:
      Eprimary 129.33.34.1
      

    3. To designate an oncoming switch primary node and an oncoming switch primary backup node by IP address, enter:
      Eprimary 129.33.34.1 -backup 129.33.34.56
      

    4. To designate an oncoming switch primary node and an oncoming switch primary backup node by host name, enter:
      Eprimary r11n01 -backup r17n02
      

    5. To create a system partition object and assign a switch primary node and a switch primary backup node by frame,slot pairs, enter:
      Eprimary -init 1,2 -backup 1,6
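
    The four identifier forms that Eprimary accepts (host name, IP address, frame,slot pair, node number) can usually be told apart by shape alone. The following is a minimal sketch of such a check for use in wrapper scripts. The function name classify_node_id is hypothetical and is not part of PSSP, the sketch does not validate octet ranges or confirm that the node exists, and it makes no claim about how Eprimary itself parses its operand.

```shell
# classify_node_id ID
# Hypothetical helper (not a PSSP command): prints which documented
# identifier form ID appears to use, based only on its shape.
classify_node_id() {
    id=$1
    case "$id" in
        "") echo "invalid"; return 1 ;;
    esac
    # Digits only: node number (for example, 1)
    case "$id" in
        *[!0-9]*) ;;
        *) echo "node_number"; return 0 ;;
    esac
    # digits,digits: frame,slot pair (for example, 1,2)
    case "$id" in
        *,*,*) ;;
        *[!0-9,]*) ;;
        ?*,?*) echo "frame_slot"; return 0 ;;
    esac
    # Four dot-separated groups of digits: IP address (for example, 129.33.34.1)
    case "$id" in
        *[!0-9.]*) ;;
        *.*.*.*.*) ;;
        ?*.?*.?*.?*) echo "ip_address"; return 0 ;;
    esac
    # Anything else is treated as a host name (for example, r11n01)
    echo "hostname"
}
```

    A wrapper script could call classify_node_id on each operand and reject the value "invalid" before passing the operand to Eprimary.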
      

    Equiesce

    Purpose

    Equiesce - Quiesces the switch by causing the primary and primary backup nodes to shut down switch recovery and primary node takeover.

    Usage Note

    Use this command only if you have an SP Switch installed on your system.

    Syntax

    Equiesce [-h]

    Flags

    -h
    Displays usage information.

    Operands

    None.

    Description

    Use this command to disable switch error recovery and primary node takeover. It is used to shut down normal switch error actions when global activities affecting nodes are performed. For example, when all nodes are shut down or rebooted, they are fenced from the switch by the primary node.

    If the primary node is not the first node to shut down during a global shutdown or reboot of the entire system, it may fence all the other nodes including the primary backup node. Primary node takeover can also occur if the primary node is shut down and the backup node remains up. Issuing the Equiesce command before the shutdown prevents these situations from occurring.

    The Equiesce command causes the primary and primary backup nodes to shut down their recovery actions. Data still flows over the switch, but no faults are serviced and primary node takeover is disabled. Only the Eannotator, Eclock, Eprimary, Estart, and Etopology commands are functional after the Equiesce command is issued.

    Estart must be issued when the global activity is complete to reestablish switch recovery and primary node takeover.
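
    The sequence described above (Equiesce, then the global activity, then Estart) can be sketched as a small wrapper. This is an illustration under stated assumptions, not an IBM-supplied procedure: the function names run and with_switch_quiesced and the DRY_RUN variable are hypothetical, and by default the sketch only prints the switch commands instead of executing them.

```shell
# run CMD [ARGS...]
# Hypothetical helper: prints the command unless DRY_RUN=0 is set,
# in which case the command is executed.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# with_switch_quiesced CMD [ARGS...]
# Quiesces the switch, performs the global activity given as CMD, and then
# issues Estart to reestablish switch recovery and primary node takeover.
with_switch_quiesced() {
    run Equiesce || return 1
    "$@"
    activity_rc=$?
    # Estart is issued even if the activity failed, so recovery is restored.
    run Estart || return 1
    return "$activity_rc"
}
```

    For example, with_switch_quiesced echo "global activity" prints the Equiesce line, the activity output, and the Estart line in that order. Setting DRY_RUN=0 makes the wrapper issue the real commands, which requires root privilege.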

    Security

    You must have root privilege to run this command.

    Location

    /usr/lpp/ssp/bin/Equiesce

    Related Information

    Commands: Eannotator, Eclock, Efence, Eprimary, Estart, Etopology, Eunfence, Eunpartition

    Examples

    To quiesce the switch before shutting down the system, enter:

    Equiesce
    

    Estart

    Purpose

    Estart - Starts the switch.

    Syntax

    Estart [-h] [-m]

    Flags

    -h
    Displays usage information.

    -m
    Specifies that the Emonitor daemon should be started. (See /etc/SP/Emonitor.cfg for details.)

    Operands

    None.

    Description

    Use this command to start or restart the current system partition based on its switch topology file. (Refer to the Etopology command for topology file details.) If the -m flag is specified, the command also starts the Emonitor daemon to monitor nodes on the switch. Refer to the Emonitor daemon for additional information.

    If the Estart command is issued when the switch is already running, it causes a switch fault, and messages in flight are lost. Applications using reliable protocols on the switch, such as TCP/IP and the MPI User Space library, recover from switch faults. Applications using unreliable protocols on the switch do not recover from switch faults. For this reason, IBM suggests that you know which applications and protocols are running on the switch before you issue the Estart command.

    Since the Estart command uses the SP rsh command, proper authentication and authorization are necessary to issue this command.

    SP Switch Notes:

    If you have an SP Switch installed on your system, an oncoming primary node as selected via Eprimary is established as primary during Estart. If necessary, the topology file is distributed to partition nodes during Estart. The topology file to be used is distributed to each of the standard nodes in the system partition via the SP Ethernet:

    Otherwise, the topology file is already resident on the nodes and does not need to be distributed.

    Files

    /etc/SP/expected.top.1nsb_8.0isb.0
    The standard topology file for systems with the 8-port High Performance Switch with a maximum of eight nodes.

    /etc/SP/expected.top.1nsb.0isb.0
    The standard topology file for one Node Switch Board (NSB) system or a maximum of 16 nodes.

    /etc/SP/expected.top.2nsb.0isb.0
    The standard topology file for two NSB systems or a maximum of 32 nodes.

    /etc/SP/expected.top.3nsb.0isb.0
    The standard topology file for three NSB systems or a maximum of 48 nodes.

    /etc/SP/expected.top.4nsb.0isb.0
    The standard topology file for four NSB systems or a maximum of 64 nodes.

    /etc/SP/expected.top.5nsb.0isb.0
    The standard topology file for five NSB systems or a maximum of 80 nodes.

    /etc/SP/expected.top.5nsb.4isb.0
    The standard topology file for five NSB and four Intermediate Switch Board (ISB) systems or a maximum of 80 nodes. This is an advantage-type network with a higher bisectional bandwidth.

    /etc/SP/expected.top.6nsb.4isb.0
    The standard topology file for six NSB and four ISB systems or a maximum of 96 nodes.

    /etc/SP/expected.top.7nsb.4isb.0
    The standard topology file for seven NSB and four ISB systems or a maximum of 112 nodes.

    /etc/SP/expected.top.8nsb.4isb.0
    The standard topology file for eight NSB and four ISB systems or a maximum of 128 nodes.

    /etc/SP/expected.top.1nsb_8.0isb.1
    The standard topology file for systems with an SP Switch-8 and a maximum of eight nodes.

    /etc/SP/Emonitor.cfg
    The list of nodes that the user wants monitored via the Emonitor daemon (not partition sensitive).

    /var/adm/SPlogs/css/dist_topology.log
    Contains system error messages if any occurred during the distribution of the topology file to the nodes.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Etopology, Eunfence, Eunpartition

    Refer to IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment for details about system partition topology files.

    Examples

    1. To start the High Performance Switch, enter:
      Estart
      

    2. To start the High Performance Switch and the Emonitor daemon, enter:
      Estart -m
      

    Etopology

    Purpose

    Etopology - Stores or reads a switch topology file into or out of the System Data Repository (SDR).

    Syntax

    Etopology [-h] [-read] switch_topology_file

    Flags

    -h
    Displays usage information.

    -read
    Retrieves the current switch topology file out of the SDR and stores it in the specified switch_topology_file. If -read is not specified, the specified switch_topology_file will be stored in the SDR.

    Operands

    switch_topology_file
    Specifies the full path name of the file into which the current SDR switch topology is to be copied, or the full path name of a switch topology file to store in the SDR. A sequence number is appended to this file name when it is stored in the SDR. This is used to ensure that the appropriate topology file is distributed to the nodes of the system partition.

    Description

    Use this command to store or retrieve the switch_topology_file into or out of the SDR. The switch topology file is used by switch initialization when starting the switch for the current system partition. It is stored in the SDR and can be overridden by having a switch topology file in the /etc/SP directory named expected.top on the switch primary node.

    If you have an SP Switch installed on your system, the current topology file is copied to each node of the subject system partition during an Estart and to each targeted node for an Eunfence.

    Files

    /etc/SP/expected.top.1nsb_8.0isb.0
    The standard topology file for systems with the 8-port High Performance Switch with a maximum of eight nodes.

    /etc/SP/expected.top.1nsb.0isb.0
    The standard topology file for one Node Switch Board system or a maximum of 16 nodes.

    /etc/SP/expected.top.2nsb.0isb.0
    The standard topology file for two NSB systems or a maximum of 32 nodes.

    /etc/SP/expected.top.3nsb.0isb.0
    The standard topology file for three NSB systems or a maximum of 48 nodes.

    /etc/SP/expected.top.4nsb.0isb.0
    The standard topology file for four NSB systems or a maximum of 64 nodes.

    /etc/SP/expected.top.5nsb.0isb.0
    The standard topology file for five NSB systems or a maximum of 80 nodes.

    /etc/SP/expected.top.5nsb.4isb.0
    The standard topology file for five NSB and four Intermediate Switch Board (ISB) systems or a maximum of 80 nodes. This is an advantage-type network with a higher bisectional bandwidth.

    /etc/SP/expected.top.6nsb.4isb.0
    The standard topology file for six NSB and four ISB systems or a maximum of 96 nodes.

    /etc/SP/expected.top.7nsb.4isb.0
    The standard topology file for seven NSB and four ISB systems or a maximum of 112 nodes.

    /etc/SP/expected.top.8nsb.4isb.0
    The standard topology file for eight NSB and four ISB systems or a maximum of 128 nodes.

    /etc/SP/expected.top.1nsb_8.0isb.1
    The standard topology file for systems with an SP Switch-8 and a maximum of eight nodes.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Eunfence, Eunpartition

    Refer to the IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment for information on system partition configurations and topology files.

    Examples

    1. To store a topology file for a system with up to 96 nodes in the SDR, enter:
      Etopology /etc/SP/expected.top.6nsb.4isb.0
      

    2. To store a topology file for a system with up to 16 nodes in the SDR, enter:
      Etopology /etc/SP/expected.top.1nsb.0isb.0
      

    3. To retrieve a topology file out of the SDR and store it to a file, enter:
      Etopology -read /tmp/temporary.top
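
    The standard topology files listed under Files follow a visible naming pattern (expected.top.<n>nsb.<m>isb.0), and each listed maximum is 16 nodes per Node Switch Board. A minimal sketch that builds the file name from board counts follows. The function names topology_file and max_nodes are hypothetical, the pattern is inferred from the listed files only, and the sketch covers neither the 8-port variants (1nsb_8) nor whether a given combination actually exists in /etc/SP.

```shell
# topology_file NSB ISB
# Hypothetical helper: path of the standard topology file for NSB node
# switch boards and ISB intermediate switch boards, following the naming
# pattern of the files listed in this section.
topology_file() {
    echo "/etc/SP/expected.top.${1}nsb.${2}isb.0"
}

# max_nodes NSB
# The listed maximums correspond to 16 nodes per Node Switch Board.
max_nodes() {
    echo $(( $1 * 16 ))
}
```

    For instance, topology_file 6 4 yields the path used in example 1, and max_nodes 6 yields 96. A script could test the path with [ -f ] before passing it to Etopology.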
      

    Eunfence

    Purpose

    Eunfence - Adds an SP node that was previously removed from the network to the current active switch network.

    Syntax

    Eunfence [-h] | [-G] node_specifier [node_specifier2] ...

    Flags

    -h
    Displays usage information.

    -G
    Unfences all valid nodes in the list of nodes regardless of system partition boundaries. If the -G flag is not used, the Eunfence command will only unfence the nodes in the current system partition. All other specified nodes will not be unfenced and a nonzero return code is returned.

    Operands

    node_specifier
    Specifies a list of nodes that is to rejoin the current switch network. It can be a list of host names, IP addresses, node numbers, frame,slot pairs, or a node group.

    Description

    Use this command to allow a node that was previously removed with the Efence command to rejoin the current switch network.

    You can also use this command to allow a node to rejoin the switch network if that node was previously removed from the SP Switch network due to a switch or adapter error.

    SP Switch Note:

    Eunfence first distributes the current topology file to the nodes before they can be unfenced.

    High Performance Switch Note:

    The Eunfence command cannot unfence a fenced node if a switch fault occurred or if Estart ran after the node was fenced. You must do another Estart to unfence the node.
    Note: If a host name or IP address is used as the node_specifier for a dependent node, it must be a host name or IP address assigned to the adapter that connects the dependent node to the SP Switch. Neither the administrative host name nor the Simple Network Management Protocol (SNMP) agent's host name for a dependent node is guaranteed to be the same as the host name of its switch network interface.

    Files

    /var/adm/SPlogs/css/dist_topology.log
    Contains system error messages if any occurred during the distribution of the topology file to the nodes.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunpartition

    Examples

    1. To unfence a node by IP address, enter:
      Eunfence 129.33.34.1
      

    2. To unfence two nodes by host name, enter:
      Eunfence r11n01 r11n04
      

    3. To unfence several nodes by node number, enter:
      Eunfence 34 43 20 76 40
      

    4. To unfence node 14 of frame 2 by frame,slot pair, enter:
      Eunfence 2,14
      

    5. If the current system partition has nodes with node numbers 1, 2, 5, and 6 and another system partition has nodes with node numbers 3, 4, 7, and 8, issuing the command:
      Eunfence 5 6 7 8
      
      unfences nodes 5 and 6, but not nodes 7 and 8. As a result, the command returns a nonzero return code.

    6. To successfully unfence the nodes in example 5 with the same system partitions, use the -G flag as follows:
      Eunfence -G 5 6 7 8
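
    Examples 5 and 6 turn on which requested nodes belong to the current system partition. The following sketch predicts that split before Eunfence is issued. It is illustrative only: the function name split_by_partition is hypothetical, and the caller must supply the partition's node numbers as a space-separated string, because how to obtain that list is outside the scope of this sketch.

```shell
# split_by_partition "PARTITION_NODES" NODE...
# Hypothetical helper: prints the requested nodes that are inside the given
# partition ("in:") and those that are outside it ("out:"). Without the -G
# flag, only the "in:" nodes would be unfenced.
split_by_partition() {
    members=" $1 "
    shift
    inside=""
    outside=""
    for n in "$@"; do
        case "$members" in
            *" $n "*) inside="$inside $n" ;;
            *)        outside="$outside $n" ;;
        esac
    done
    echo "in:${inside}"
    echo "out:${outside}"
}
```

    With the partitions of example 5, split_by_partition "1 2 5 6" 5 6 7 8 reports nodes 5 and 6 as inside and nodes 7 and 8 as outside. A nonempty "out:" line indicates that the -G flag is needed to unfence every requested node.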
      

    Eunpartition

    Purpose

    Eunpartition - Prepares a system partition for merging with a neighboring system partition.

    Usage Note

    Use this command only if you have an SP Switch installed on your system.

    Syntax

    Eunpartition [-h]

    Flags

    -h
    Displays usage information.

    If a flag is not specified, Eunpartition examines the SP_NAME shell variable and selects a system partition based on its current setting.

    Operands

    None.

    Description

    Use this command to prepare a partitioned configuration for a new system partition definition within an SP cluster.

    This command must be executed for each system partition prior to the spapply_config command to redefine system partitions. Since this command uses the SP rsh command, proper authentication and authorization to issue this command is required.

    If you issue Eunpartition in error, it quiesces the primary and primary backup nodes. If this occurs, you must use Estart to restart the switch.

    Security

    You must have root privilege to run this command.

    Related Information

    Commands: Eannotator, Eclock, Eduration, Efence, Eprimary, Equiesce, Estart, Etopology, Eunfence

    Examples

    To prepare the current system partition for repartitioning as specified by SP_NAME, enter:

    Eunpartition
    

    export_clients

    Purpose

    export_clients - Creates or updates the Network File System (NFS) export list for a boot/install server.

    Syntax

    export_clients [-h]

    Flags

    -h
    Displays usage information. If the command is issued with the -h flag, the syntax description is displayed to standard output and no other action is taken.

    Operands

    None.

    Description

    Use this command to create or update the NFS export list on a boot/install server node.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    -1
    Indicates that an error occurred.

    Security

    You must have root privilege to run this command.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/bin/export_clients

    Related Information

    Commands: setup_server

    Examples

    To create or update the NFS export list on a boot/install server node, enter:

    export_clients
    

    ext_srvtab

    Purpose

    ext_srvtab - Extracts service key files from the authentication database.

    Syntax

    ext_srvtab [-n] [-r realm] [instance ...]

    Flags

    -n
    If specified, the master key is obtained from the master key cache file. Otherwise, ext_srvtab prompts the user to enter the master key interactively.

    -r
    If specified, the realm fields in the extracted file match the given realm rather than the local realm.

    Operands

    instance
    Specifies an instance name. On the SP system, service instances consist of the short form of the network names for the hosts on which the service runs.

    Description

    The ext_srvtab command extracts service key files from the authentication database. The master key is used to extract service key values from the database. For each instance specified on the command line, the ext_srvtab command creates a new service key file in the current working directory. The file is named instance-new-srvtab and contains all the entries in the database with an instance field of instance, that is, all the keys registered for instances of services defined to run on that host.

    A user must have read access to the authentication database to execute this command. This command can only be issued on the system on which the authentication database resides.

    Files

    instance-new-srvtab
    Service key file generated for instance.

    /var/kerberos/database/principal.pag, /var/kerberos/database/principal.dir
    Files containing the authentication database.

    /.k
    Master key cache file.

    Related Information

    Commands: kadmin, ksrvutil

    Refer to Chapter 2, "RS/6000 SP Files and Other Technical Information" section of IBM Parallel System Support Programs for AIX: Command and Technical Reference for additional Kerberos information.

    Examples

    If a system has three network interfaces named as follows:

    ws3e.abc.org
    ws3t.abc.org
    ws3f.finet.abc.org
    
    to re-create the server key file on this workstation (which is an SP authentication server), user root could do the following:
    # Create a new key file in the /tmp directory for each instance
    cd /tmp
    /usr/kerberos/etc/ext_srvtab -n ws3e ws3t ws3f
    # Combine the instance files into a single file for the host name
    /bin/cat ws3e-new-srvtab ws3t-new-srvtab ws3f-new-srvtab \
       >/etc/krb-srvtab
    # Delete temporary files and protect key file
    /bin/rm ws3e-new-srvtab ws3t-new-srvtab ws3f-new-srvtab
    /bin/chmod 400 /etc/krb-srvtab
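
    The instance names and file names in this example can be derived from the interface names rather than typed by hand. The following is a minimal sketch, assuming that the "short form" of a network name is the part before the first dot, as in the example (ws3e from ws3e.abc.org). The function names short_instance and srvtab_files are hypothetical and are not part of PSSP or Kerberos.

```shell
# short_instance NETWORK_NAME
# Hypothetical helper: the short form of a network name, taken here to be
# everything before the first dot.
short_instance() {
    echo "${1%%.*}"
}

# srvtab_files NETWORK_NAME...
# Prints the names of the instance-new-srvtab files that ext_srvtab is
# documented to create for the corresponding instances, separated by spaces.
srvtab_files() {
    files=""
    for name in "$@"; do
        files="$files $(short_instance "$name")-new-srvtab"
    done
    echo "${files# }"
}
```

    For the three interfaces above, srvtab_files ws3e.abc.org ws3t.abc.org ws3f.finet.abc.org prints the same three file names that the /bin/cat and /bin/rm commands in the example operate on.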
    

    fencevsd

    Purpose

    fencevsd - Prevents an application running on a node or group of nodes from accessing an IBM Virtual Shared Disk or group of IBM Virtual Shared Disks.

    Syntax

    fencevsd -v vsd_name_list -n node_list

    Flags

    -v
    Specifies one or more IBM Virtual Shared Disk names, separated by commas.

    -n
    Specifies one or more node numbers, separated by commas.

    Operands

    None.

    Description

    Under some circumstances, the system may believe a node has failed and begin recovery procedures, when the node is actually operational, but cut off from communication with other nodes running the same application. In this case, the "failed" node must not be allowed to serve requests for the IBM Virtual Shared Disks it normally serves until recovery is complete and the other nodes running the application recognize the failed node as operational. The fencevsd command prevents the failed node from filling requests for its IBM Virtual Shared Disks.

    This command can be run from any node.
    Note: This command will fail if you do not specify a current server (primary or backup) to an IBM Virtual Shared Disk with the -v flag.

    Files

    /usr/lpp/csd/bin/fencevsd
    Specifies the command file.

    Security

    You must have root privilege to run this command.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: lsfencevsd, lsvsd, unfencevsd, updatevsdtab, vsdchgserver

    Refer to IBM Parallel System Support Programs for AIX: Managing Shared Disks for information on how to use this command in writing applications.

    Examples

    To fence the IBM Virtual Shared Disks vsd1 and vsd2 from node 5, enter:

    fencevsd -v vsd1,vsd2 -n 5
    

    get_vpd

    Purpose

    get_vpd - Consolidates the Vital Product Data (VPD) files for the nodes and writes the information to a file and optionally to a diskette.

    Syntax

    get_vpd [-h] [-d] -m model_number -s serial_number

    Flags

    -h
    Displays usage information.

    -d
    Specifies that the Vital Product Data file will be written to a diskette.

    -m model_number
    Specifies the machine type model number. The value of the model number is "MMx", where MM is the class of the machine:
    20
    No switch, 2-64 nodes
    2A
    No switch, 2-8 nodes, 49 inch height
    3A
    8-port switch, 2-8 nodes, 49 inch height
    38
    8-port switch, 2-8 nodes, 79 inch height
    30
    Single-staged switching, 2-80 nodes
    40
    Dual-staged switching, 62-128 nodes

    -s serial_number
    Specifies the serial number. The value of the serial_number is "pp00sssss", where:

    pp
    Is 02 for machines built in US (Poughkeepsie) and 51 for machines built in EMEA (Montpellier).

    00
    Is a mandatory value.

    sssss
    Is the serial number of the machine.

    Description

    Use this command to consolidate the Vital Product Data (VPD) for the nodes in the RS/6000 SP into a file and to optionally write the file to diskette. The diskette created by this command is sent to IBM manufacturing when an upgrade to the RS/6000 SP hardware is desired. This diskette is used by manufacturing and marketing to configure an upgrade of the RS/6000 SP.

    The get_vpd command is issued by IBM field personnel to capture VPD information after an upgrade of the system. All installation and configuration of the RS/6000 SP must be complete prior to issuing the get_vpd command.

    Files

    /var/adm/SPlogs/SPconfig/node_number.umlc
    Files used as input to this command.

    /var/adm/SPlogs/SPconfig/serial_number.vpd
    Output file generated by this command.

    Standard Output

    This command creates the /var/adm/SPlogs/SPconfig/serial_number.vpd file and optionally writes the file to a diskette.

    Standard Error

    This command writes all error messages to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred while processing the VPD information and the command did not complete successfully.

    Security

    You must have root privilege to run this command.

    Restrictions

    This command can only be issued on the control workstation.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP) ssp.basic file set.

    Prerequisite Information

    IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment

    Location

    /usr/lpp/ssp/install/bin/get_vpd

    Examples

    1. This example shows the creation of a file containing all of the node VPD information for a model type of 204 and a serial number of 020077650. The output is written to /var/adm/SPlogs/SPconfig/020077650.vpd.
      get_vpd -m 204 -s 020077650
      

    2. This example shows the creation of a file containing all of the node VPD information for a model type of 306 and a serial number of 510077730. The output is written to /var/adm/SPlogs/SPconfig/510077730.vpd and also to diskette.
      get_vpd -m 306 -s 510077730 -d
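
    The "MMx" and "pp00sssss" formats described under Flags can be checked before get_vpd is issued. The following sketch is illustrative only. The function names valid_model, valid_serial, and vpd_output_file are hypothetical, the sketch assumes that x is a single character and that sssss consists of five decimal digits (the text above does not state either), and it makes no claim about the checks that get_vpd itself performs.

```shell
# valid_model MODEL
# Hypothetical check: MODEL is one of the documented machine classes
# (20, 2A, 3A, 38, 30, 40) followed by exactly one more character.
valid_model() {
    case "$1" in
        20?|2A?|3A?|38?|30?|40?) return 0 ;;
        *) return 1 ;;
    esac
}

# valid_serial SERIAL
# Hypothetical check: SERIAL is pp00sssss, where pp is 02 or 51 and
# sssss is assumed to be five decimal digits.
valid_serial() {
    case "$1" in
        0200[0-9][0-9][0-9][0-9][0-9]|5100[0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# vpd_output_file SERIAL
# Path of the output file that get_vpd is documented to generate.
vpd_output_file() {
    echo "/var/adm/SPlogs/SPconfig/$1.vpd"
}
```

    Both example invocations above pass these checks: model 204 with serial 020077650, and model 306 with serial 510077730.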
      

    ha_vsd

    Purpose

    ha_vsd - Starts the rvsd subsystem of IBM Recoverable Virtual Shared Disk (RVSD). This includes configuring IBM Virtual Shared Disks and data striping devices (HSDs) as well as starting the rvsd and hc daemons.

    Syntax

    ha_vsd [reset]

    Flags

    None.

    Operands

    reset
    Stops and restarts the rvsd subsystem of IBM Recoverable Virtual Shared Disk by stopping the rvsd and hc subsystems and then starting them again.

    Description

    Use this command to start the IBM Recoverable Virtual Shared Disk licensed program after you install it, or, with the reset option, to stop and restart the program.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Security

    You must have root privilege to issue the ha_vsd command.

    Implementation Specifics

    This command is part of the IBM Recoverable Virtual Shared Disk Licensed Program Product (LPP).

    Prerequisite Information

    See "Using the IBM Recoverable Virtual Shared Disk Software" in IBM Parallel System Support Programs for AIX: Managing Shared Disks.

    Location

    /usr/lpp/csd/bin/ha_vsd

    Related Information

    Commands: ha.vsd, hc.vsd

    Examples

    To stop the rvsd subsystem and restart it, enter:

    ha_vsd reset
    

    The system returns the messages:

    Starting rvsd subsystem.
    rvsd subsystem started PID=xxx.
    

    ha.vsd

    Purpose

    ha.vsd - Queries and controls the rvsd subsystem of IBM Recoverable Virtual Shared Disk (RVSD).

    Syntax

    ha.vsd
    {adapter_recovery [on | off] | debug [off] | mksrc | query | quorum n | qsrc | reset | reset_quorum | rmsrc | start | stop | trace [off]}

    Flags

    None.

    Operands

    adapter_recovery [on | off]
    Enables or disables communication adapter recovery. The default is on.

    The rvsd subsystem must be restarted for this operand to take effect.

    debug [off]
    Specify debug to redirect the RVSD subsystem's stdout and stderr to the console and cause the RVSD subsystem to not respawn if it exits with an error. (You can use the lscons command to determine the current console.)

    The RVSD subsystem must be restarted for this operand to take effect.

    Once debugging is turned on and the RVSD subsystem has been restarted, ha.vsd trace should be issued to turn on tracing.

    Use this operand under the direction of your IBM service representative.

    Note: The default when the node is booted is to have stdout and stderr routed to the console. If debugging is turned off, stdout and stderr are routed to /dev/null and all further trace messages are lost. You can determine whether debug has been turned on by issuing ha.vsd qsrc. If debug has been turned on, the return value is:

    action = "2"
    

    mksrc
    Uses mkssys to create the rvsd subsystem.

    query
    Displays the current status of the rvsd subsystem in detail.

    quorum n
    Sets the value of the quorum, the number of nodes that must be active to direct recovery. Usually, quorum is defined as a majority of the nodes that are defined as IBM Virtual Shared Disk nodes in a system partition, but this command allows you to override that definition. The rvsd subsystem must be in the active state when you issue this command. This is not a persistent change.

    qsrc
    Displays the System Resource Controller (SRC) configuration of the RVSD daemon.

    reset
    Stops and restarts the rvsd subsystem.

    reset_quorum
    Resets the default quorum.

    rmsrc
    Uses rmssys to remove the rvsd subsystem.

    start
    Starts the rvsd subsystem.

    stop
    Stops the rvsd subsystem.

    trace [off]
    Requests or stops tracing of the rvsd subsystem. The rvsd subsystem must be in the active state when this command is issued.

    This operand is only meaningful after the debug operand has been used to send stdout and stderr to the console and the rvsd subsystem has been restarted.
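
    The debug operand description states that ha.vsd qsrc returns action = "2" when debug has been turned on. A script can test for that line as sketched below. The function name debug_enabled is hypothetical, and the sketch assumes only that the documented string appears somewhere in the qsrc output. The rest of the output format is not described here and is not relied on.

```shell
# debug_enabled
# Hypothetical helper: reads the output of "ha.vsd qsrc" on standard input
# and succeeds if it contains the documented debug indicator.
debug_enabled() {
    grep -F 'action = "2"' >/dev/null
}
```

    A typical use would be to pipe the query into the helper, for example ha.vsd qsrc | debug_enabled && echo "debug is on", before issuing ha.vsd trace.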

    Description

    Use this command to display information about the rvsd subsystem, to change the number of nodes needed for quorum, and to change the status of the subsystem.

    You can start the rvsd subsystem with the VSD Perspective. Type spvsd and select actions for IBM VSD nodes.

    Exit Values

    0
    Indicates the successful completion of the command.

    nonzero
    Indicates that an error occurred.

    Security

    You must have root privilege to issue the debug, quorum, refresh, reset, start, stop, trace, mksrc, and rmsrc subcommands.

    Implementation Specifics

    This command is part of the IBM Recoverable Virtual Shared Disk Licensed Program Product (LPP).

    Prerequisite Information

    See "Using the IBM Recoverable Virtual Shared Disk Software" in IBM Parallel System Support Programs for AIX: Managing Shared Disks.

    Location

    /usr/lpp/csd/bin/ha.vsd

    Related Information

    Commands: ha_vsd, hc.vsd

    Examples

    1. To stop the rvsd subsystem and restart it, enter:
      ha.vsd reset
      

      The system returns the messages:

      Waiting for the rvsd subsystem to exit.
      rvsd subsystem exited successfully.
      Starting rvsd subsystem.
      rvsd subsystem started PID=xxx.
      

    2. To change the quorum to five nodes of a 16-node SP system, enter:
      ha.vsd quorum 5
      

      The system returns the message:

      Quorum has been changed from 8 to 5.
      

    hacws_verify

    Purpose

    hacws_verify - Verifies the configuration of both the primary and backup High Availability Control Workstation (HACWS) control workstations.

    Syntax

    hacws_verify

    Flags

    None.

    Operands

    None.

    Description

    Use this command to verify that the primary and backup control workstations are properly configured to provide HACWS services to the SP system. The hacws_verify command inspects both the primary and backup control workstations to verify the following:

    Both the primary and backup control workstations must be running and capable of executing remote commands via the /usr/lpp/ssp/rcmd/bin/rsh command.

    The system administrator should run the hacws_verify command after HACWS is initially configured. After that, the hacws_verify command can be run at any time.

    Exit Values

    0
    Indicates that no problems were found with the HACWS configuration.

    nonzero
    Indicates that problems were found with the HACWS configuration.

    Prerequisite Information

    Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on the HACWS option.

    Location

    /usr/sbin/hacws/hacws_verify

    Related Information

    SP Commands: install_hacws, rsh, spcw_addevents

    haemcfg

    Purpose

    haemcfg - Compiles the Event Management objects in the System Data Repository (SDR) and places the compiled information into a binary Event Management Configuration Database (EMCDB) file.

    Syntax

    haemcfg [-c] [-n]

    Flags

    -c
    Indicates that you want to check the data in the System Data Repository (SDR) without building the Event Management Configuration Database (EMCDB).

    -n
    Indicates that you want to build a test copy of the EMCDB in the current directory.

    Operands

    None.

    Description

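    The haemcfg utility command builds the Event Management Configuration Database (EMCDB) file for a system partition. If no flags are specified, the haemcfg command compiles the Event Management data in the SDR into a new EMCDB file, places the file in the staging directory as /spdata/sys1/ha/cfg/em.syspar_name.cdb, and updates the EMCDB version string in the SDR.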

    To place the new EMCDB into production, you must shut down and restart all of this system partition's Event Manager daemons: the daemon on the control workstation and the daemon on each of the system partition's nodes. When the Event Management daemon restarts, it copies the EMCDB from the staging directory to the production directory. The name of the production EMCDB is /etc/ha/cfg/em.syspar_name.cdb.

    If you want to test a new EMCDB, IBM recommends that you create a separate system partition for that purpose.

    You must create a distinct EMCDB file for each system partition on the IBM RS/6000 SP. To build an EMCDB file, you must be executing on the control workstation and you must set the SP_NAME environment variable to the appropriate system partition name before you issue the command.

    Before you build or replace an EMCDB, it is advisable to issue the haemcfg command with the debugging flags (-c and -n).

    The -c flag lets you check the validity of the Event Management data that resides in the SDR. This data was previously loaded through the haemloadcfg command. If any of the data is invalid, the command writes an error message that describes the error.

    When the -c flag is processed, the command validates the data in the SDR, but does not create a new EMCDB file and does not update the EMCDB version string in the SDR.

    The -n flag lets you build a test EMCDB file in the current directory. If anything goes wrong with the creation of the new file, the command writes an error message that describes the error.

    When the -n flag is processed, the command uses the data in the SDR to create a test EMCDB file in the current directory, but it does not update the EMCDB version string in the SDR. If any of the data in the SDR is invalid, the command stops at the first error encountered.

    If you specify both flags on the command line, the haemcfg command performs the actions of the -c flag.

    After you have checked the data and the build process, issue the haemcfg command without any flags. This builds the new EMCDB file, places it in the /spdata/sys1/ha/cfg directory, and updates the EMCDB version string in the SDR.
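
    The check, test-build, and build sequence described above can be sketched as a small wrapper. This is a minimal sketch, not part of PSSP: the function name emcdb_rebuild and the HAEMCFG override variable are hypothetical and exist only so the sequence can be tried with a stand-in command. It assumes that haemcfg is on the PATH, that SP_NAME is already set, and that you are running with root authority on the control workstation.

```shell
#!/bin/sh
# Hypothetical wrapper: run the haemcfg steps in the order the text recommends.
# HAEMCFG defaults to the real command; override it to try the flow with a stub.
HAEMCFG=${HAEMCFG:-haemcfg}

emcdb_rebuild() {
    # SP_NAME must name the system partition before haemcfg is issued.
    if [ -z "${SP_NAME:-}" ]; then
        echo "SP_NAME is not set" >&2
        return 1
    fi
    # 1. Validate the Event Management data in the SDR; nothing is built.
    "$HAEMCFG" -c || return 1
    # 2. Build a test EMCDB in the current directory.
    "$HAEMCFG" -n || return 1
    # 3. Build the staged EMCDB and update the version string in the SDR.
    "$HAEMCFG" || return 1
}
```

    Stopping at the first failing step keeps a bad configuration out of the staging directory. The test copy from step 2 stays in the current directory, so run the wrapper from a scratch directory.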

    Files

    /spdata/sys1/ha/cfg/em.syspar_name.cdb
    Contains the most recently compiled EMCDB file for the system partition specified by syspar_name. This file will be placed into production when all of the Event Management daemons in the system partition are next restarted.

    /etc/ha/cfg/em.syspar_name.cdb
    Contains the production EMCDB file for the system partition specified by syspar_name. This EMCDB file is currently in use by the Event Management subsystem.
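
    The two file names above differ only in their directory, so both can be derived from the system partition name. The helper below is a minimal sketch; emcdb_paths is a hypothetical name, and only the two directories are taken from this section.

```shell
# emcdb_paths [SYSPAR_NAME]
# Hypothetical helper: print the staging and production EMCDB paths.
# Falls back to SP_NAME when no argument is given.
emcdb_paths() {
    syspar=${1:-${SP_NAME:-}}
    if [ -z "$syspar" ]; then
        echo "no system partition name given and SP_NAME is not set" >&2
        return 1
    fi
    echo "staging=/spdata/sys1/ha/cfg/em.$syspar.cdb"
    echo "production=/etc/ha/cfg/em.$syspar.cdb"
}
```

    One possible use is to compare the two files (for example, with cmp -s) to see whether a newly staged EMCDB is still waiting for the daemons to be restarted.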

    Standard Output

    When the command executes successfully, it writes the following informational messages:

    Reading Event Management data for partition syspar_name
    CDB=new_EMCDB_file_name Version=EMCDB_version_string
    

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Errors can result from causes that include:

    For a listing of the errors that the haemcfg command can produce, see IBM Parallel System Support Programs for AIX: Diagnosis and Messages Guide.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred. It is accompanied by one or more error messages that indicate the cause of the error.

    Security

    To place an EMCDB file for a system partition into the /spdata/sys1/ha/cfg directory, you must be running with an effective user ID of root on the control workstation. Before running this command, you must set the SP_NAME environment variable to the appropriate system partition name.

    Restrictions

    To place an EMCDB file for a system partition into the /spdata/sys1/ha/cfg directory, you must be running with an effective user ID of root on the control workstation. Before running this command, you must set the SP_NAME environment variable to the appropriate system partition name.

    If you run the haemcfg command without any flags, the command stops at the first error it encounters. With the -c flag on, the command continues, letting you obtain as much debugging information as possible in one pass. To reduce your debugging time, therefore, run the command with the debugging flags first.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    For a general overview of configuring Event Management, see "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide.

    For a description of the SDR classes and attributes that are related to the EMCDB, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.

    Location

    /usr/lpp/ssp/bin/haemcfg

    Related Information

    Commands: haemloadcfg

    Examples

    1. To validate the Event Management data in the System Data Repository (without creating a new EMCDB file), enter:
      haemcfg -c
      

      If there are any errors in the data, the command writes appropriate error messages.

      To fix the errors, replace the data in the SDR. For more information, see the man page for the haemloadcfg command.

    2. To create a test EMCDB file in the current directory, enter:
      haemcfg -n
      

      If there are any problems in creating the file, the command writes appropriate error messages.

    3. To compile a new EMCDB file for a system partition from the Event Management data that resides in the SDR and place it into the staging directory:

      1. Make sure you are executing with root authority on the control workstation.

      2. Make sure that the SP_NAME environment variable is set to the name of the appropriate system partition.

      3. Enter:
        haemcfg
        

        In response, the command creates a new EMCDB file, places it in the staging directory as /spdata/sys1/ha/cfg/em.syspar_name.cdb, where syspar_name is the name of the current system partition, and updates the EMCDB version string in the SDR.

    haemctrl Script

    Purpose

    haemctrl - A control script that starts the Event Management subsystem.

    Syntax

    haemctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}

    Flags

    -a
    Adds the subsystem.

    -s
    Starts the subsystem.

    -k
    Stops the subsystem.

    -d
    Deletes the subsystem.

    -c
    Cleans the subsystems, that is, deletes them from all system partitions.

    -u
    Unconfigures the subsystems from all system partitions.

    -t
    Turns tracing on for the subsystem.

    -o
    Turns tracing off for the subsystem.

    -r
    Refreshes the subsystem.

    -h
    Displays usage information.

    Operands

    None.

    Description

    Event Management is a distributed subsystem of PSSP that provides a set of high availability services for the IBM RS/6000 SP. By matching information about the state of system resources with information about resource conditions that are of interest to client programs, it creates events. Client programs can use events to detect and recover from system failures, thus enhancing the availability of the SP system.

    The haemctrl control script controls the operation of the Event Management subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called haem. Associated with each subsystem is a daemon.

    An instance of the Event Management subsystem executes on the control workstation and on every node of a system partition. Because Event Management provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.

    From an operational point of view, the Event Management subsystem group is organized as follows:

    Subsystem
    Event Management

    Subsystem Group
    haem

    SRC Subsystem
    haem

    The haem subsystem is associated with the haemd daemon.

    The subsystem name on the nodes is haem. Each node runs one instance of the subsystem, which is associated with the system partition to which the node belongs.

    On the control workstation, there are multiple instances of the subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named haem.sp_prod and haem.sp_test.
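
    The naming rule can be expressed as a short function. This is a minimal sketch; haem_subsys_name is a hypothetical helper, and it relies on the convention used by the control script that a node number of zero means the control workstation.

```shell
# haem_subsys_name NODE_NUMBER SYSPAR_NAME
# Hypothetical helper: print the SRC subsystem name for Event Management.
# Node number 0 means the control workstation, where the system partition
# name is appended; on any node the name is plain "haem".
haem_subsys_name() {
    if [ "$1" -eq 0 ]; then
        echo "haem.$2"
    else
        echo "haem"
    fi
}
```

    The result is the name to pass to SRC commands such as lssrc -s.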

    Daemons
    haemd

    The haemd daemon provides the Event Management services.

    The haemctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

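    The haemctrl script provides a variety of controls for operating the Event Management subsystem: adding, starting, stopping, and deleting the subsystem; cleaning up and unconfiguring the subsystems in all system partitions; and turning tracing on and off.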

    Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.

    Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.

    Adding the Subsystem

    When the -a flag is specified, the control script uses the mkssys command to add the Event Management subsystem to the SRC. The control script operates as follows:

    1. It makes sure that the haem subsystem is stopped.

    2. It gets the port number for the haem subsystem for this system partition from the Syspar_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. If there is no port number in the SDR and this script is running on the control workstation, the script obtains a port number. If the script is running on a node and there is no port number in the SDR, the script ends with an error. The range of valid port numbers is 10000 to 10100, inclusive.

      The service name that is entered in the /etc/services file is haem.syspar_name.

    3. It removes the haem subsystem from the SRC (just in case it is still there).

    4. It adds the haem subsystem to the SRC. On the control workstation, the mkssys command specifies that the IP address of the system partition is to be supplied as an argument to the daemon.

    5. It adds an entry for the haem group to the /etc/inittab file. The entry ensures that the group is started during boot. However, if haemctrl is running on a High Availability Control Workstation (HACWS), no entry is made in the /etc/inittab file. Instead, HACWS manages starting and stopping the group.

    6. On the control workstation, it creates the Event Management Configuration Database (EMCDB). First, it runs the haemloadcfg command to load the SDR with the Event Management configuration data that is contained in the haemloadlist file. Then, it runs the haemcfg command to compile the data in the SDR and create the binary Event Management Configuration Database. Any errors that occur are written to a log file named /var/ha/log/em.loadcfg.syspar_name.

      For more information about configuring Event Management data, see the IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.

      Then it gets the port number for the subsystem from the SP_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. This port number is used for remote connections to Event Management daemons that are running on the control workstation. If there is no port number in the SDR, the script obtains one and sets it in the /etc/services file. The range of valid port numbers is 10000 to 10100, inclusive.

      The service name is haemd.
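
    The port handling in the steps above can be checked by hand with two small functions. This is a minimal sketch, not the logic of the control script itself: haem_port_ok and services_port are hypothetical names, and the second function assumes the usual services file layout of a service name followed by port/protocol.

```shell
# haem_port_ok PORT
# Succeeds when PORT is a decimal number in the documented range 10000-10100.
haem_port_ok() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 10000 ] && [ "$1" -le 10100 ]
}

# services_port NAME [FILE]
# Print the port recorded for service NAME in a services-format file
# (default /etc/services). Prints nothing if NAME has no entry.
services_port() {
    awk -v name="$1" '$1 == name { split($2, a, "/"); print a[1]; exit }' \
        "${2:-/etc/services}"
}
```

    For example, haem_port_ok "$(services_port haem.sp_prod)" succeeds only if the partition's service entry exists and its port lies inside the valid range.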

    Starting the Subsystem

    When the -s flag is specified, the control script uses the startsrc command to start the Event Management subsystem, haem.

    Stopping the Subsystem

    When the -k flag is specified, the control script uses the stopsrc command to stop the Event Management subsystem, haem.

    Deleting the Subsystem

    When the -d flag is specified, the control script uses the rmssys command to remove the Event Management subsystem from the SRC. The control script operates as follows:

    1. It makes sure that the haem subsystem is stopped.

    2. It removes the haem subsystem from the SRC using the rmssys command.

    3. It removes the port number from the /etc/services file.

    4. If there are no other subsystems remaining in the haem group, it removes the entry for the haem group from the /etc/inittab file.

    Cleaning Up the Subsystems

    When the -c flag is specified, the control script stops and removes the Event Management subsystems for all system partitions from the SRC. The control script operates as follows:

    1. It stops all instances of subsystems in the subsystem group in all partitions, using the stopsrc -g haem command.

    2. It removes the entry for the haem group from the /etc/inittab file.

    3. It removes all instances of subsystems in the subsystem group in all partitions from the SRC using the rmssys command.

    4. It removes all Event Management entries from the /etc/services file. These include the port numbers for the subsystems as well as the port number used for remote connections.

    Unconfiguring the Subsystems

    When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes from the SDR all port numbers allocated by the Event Management subsystems.
    Note: The -u flag is effective only on the control workstation.

    Prior to executing the haemctrl command with the -u flag on the control workstation, the haemctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.

    Turning Tracing On

    When the -t flag is specified, the control script turns tracing on for the haemd daemon, using the haemtrcon command.

    Turning Tracing Off

    When the -o flag is specified, the control script turns tracing off for the haemd daemon, using the haemtrcoff command.

    Refreshing the Subsystem

    The -r flag has no effect for this subsystem.

    Logging

    While it is running, the Event Management daemon normally provides information about its operation and errors by writing entries to the AIX error log. If it cannot, errors are written to a log file called /var/ha/log/em.default.syspar_name.

    Files

    /var/ha/log/em.default.syspar_name
    Contains the default log of the haemd daemon on the system partition named syspar_name.

    /var/ha/log/em.loadcfg.syspar_name
    Contains a log of any errors that occurred while creating the Event Management Configuration Database for the system partition named syspar_name using the haemcfg command.

    /var/ha/log/em.trace.syspar_name
    Contains the trace log of the haemd daemon on the system partition named syspar_name.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Security

    You must be running with an effective user ID of root.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/haemctrl

    Related Information

    Commands: haemcfg, haemd, haemloadcfg, haemtrcoff, haemtrcon, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    1. To add the Event Management subsystem to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      haemctrl -a
      

    2. To start the Event Management subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      haemctrl -s
      

    3. To stop the Event Management subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      haemctrl -k
      

    4. To delete the Event Management subsystem from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      haemctrl -d
      

    5. To clean up the Event Management subsystem on all system partitions, enter:
      haemctrl -c
      

    6. To unconfigure the Event Management subsystem from all system partitions, on the control workstation, enter:
      haemctrl -u
      

    7. To turn tracing on for the Event Management daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      haemctrl -t
      

    8. To turn tracing off for the Event Management daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      haemctrl -o
      

    9. To display the status of all of the subsystems in the Event Management SRC group, enter:
      lssrc -g haem
      

    10. To display the status of an individual Event Management subsystem on a node, enter:
      lssrc -s haem
      

      To display the status of an individual Event Management subsystem on the control workstation, enter:

      lssrc -s haem.syspar_name
      

      where syspar_name is the system partition name.

    11. To display detailed status about an individual Event Management subsystem on a node, enter:
      lssrc -l -s haem
      

      To display detailed status about an individual Event Management subsystem on the control workstation, enter:

      lssrc -l -s haem.syspar_name
      

      where syspar_name is the system partition name.

      In response, the system returns information that includes the running status of the subsystem, the settings of trace flags, the version number of the Event Management Configuration Database, the time the subsystem was started, the connection status to Group Services and peer Event Management subsystems, and the connection status to Event Management clients, if any.

    12. To display the status of all of the daemons under SRC control, enter:
      lssrc -a
      

    haemd Daemon

    Purpose

    haemd - The Event Manager daemon, which observes resource variable instances that are updated by Resource Monitors and generates and reports events to client programs.

    Syntax

    haemd [syspar_IPaddr]

    Flags

    None.

    Operands

    syspar_IPaddr
    Specifies the IP address of the system partition in which the haemd daemon is to execute. If the daemon is executing on the control workstation, this argument must be specified. Otherwise, the argument is ignored, if present.

    Description

    The haemd daemon is the Event Manager daemon. The daemon observes resource variable instances that are updated by Resource Monitors and generates and reports events to client programs.

    One instance of the haemd daemon executes on the control workstation for each system partition. An instance of the haemd daemon also executes on every node of a system partition. The haemd daemon is under System Resource Controller (SRC) control.

    Because the daemon is under SRC control, it cannot be started directly from the command line. It is normally started by the haemctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the haemctrl command.

    For more information about the Event Manager daemon, see the haemctrl man page.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/haemd

    Related Information

    Commands: haemctrl

    Examples

    See the haemctrl command.

    haemloadcfg

    Purpose

    haemloadcfg - Loads Event Management configuration data into the System Data Repository (SDR).

    Syntax

    haemloadcfg [-d] [-r] loadlist_file

    Flags

    -d
    Deletes objects from the SDR that match objects in the load list file.

    -r
    Replaces objects in the SDR by matching objects in the load list file. Any unmatched objects in the load list file are added to the SDR.

    Operands

    loadlist_file
    The name of the file that contains the Event Management configuration data to be loaded into the SDR. To load the default PSSP configuration data, specify /usr/lpp/ssp/install/config/haemloadlist.

    Description

    The haemloadcfg utility command loads Event Management configuration data into the SDR. Note that before you invoke haemloadcfg, you must ensure that the SP_NAME environment variable is set to the appropriate system partition name.

    The configuration data is contained in a load list file, whose format is described by the man page for the haemloadlist file. For details on the SDR classes and attributes that you can use to specify Event Management configuration data, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.

    To load the default Event Management configuration data for PSSP, specify the load list file as /usr/lpp/ssp/install/config/haemloadlist.

    To add Event Management configuration data for other Resource Monitors, create a file in load list format and specify its name on the command.

    Without any flags, the haemloadcfg command does not replace existing objects in the SDR. The data in the load list file is matched with the existing objects in the SDR based on key attributes, as follows:

    SDR Class
    Key Attributes

    EM_Resource_Variable
    rvName

    EM_Instance_Vector
    ivResource_name, ivElement_name

    EM_Structured_Byte_String
    sbsVariable_name, sbsField_name

    EM_Resource_Class
    rcClass

    EM_Resource_Monitor
    rmName

    Note that the way in which the haemloadcfg command handles existing SDR objects is different from the way in which the SDRCreateObjects command handles them. The SDRCreateObjects command creates a new object as long as the attributes, taken as a group, are unique.
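
    The key attributes listed above can be kept in a small lookup, which is handy when preparing a load list for the -d flag that needs only the key attributes. This is a minimal sketch; em_key_attrs is a hypothetical helper, and the class and attribute names are copied from the table in this section.

```shell
# em_key_attrs CLASS
# Hypothetical lookup: print the key attributes that haemloadcfg uses to
# match load list objects against existing SDR objects of class CLASS.
em_key_attrs() {
    case $1 in
        EM_Resource_Variable)      echo "rvName" ;;
        EM_Instance_Vector)        echo "ivResource_name ivElement_name" ;;
        EM_Structured_Byte_String) echo "sbsVariable_name sbsField_name" ;;
        EM_Resource_Class)         echo "rcClass" ;;
        EM_Resource_Monitor)       echo "rmName" ;;
        *) echo "unknown class: $1" >&2; return 1 ;;
    esac
}
```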

    To change a nonkey attribute of an Event Management object that already exists in the SDR, change the attribute in the load list file. Then run the haemloadcfg command using the -r flag and the name of the load list file. All objects in the SDR are replaced by matching objects in the load list file using the key attributes to match. Any unmatched objects in the load list file are added to the SDR.

    To delete Event Management objects from the SDR, create a load list file with the objects to be deleted. Only the key attributes need to be specified. Then run the haemloadcfg command using the -d flag and the name of the load list file. All objects in the SDR that match objects in the load list file are deleted. Unmatched objects in the load list file, if any, are not added to the SDR.

    In all cases, duplicate objects in the load list file, based on matches in key attributes, are ignored. However, such duplicate objects are written to standard output.

    Files

    /usr/lpp/ssp/install/config/haemloadlist
    Contains the default configuration data for the Event Management subsystem.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred. It is accompanied by one or more error messages that indicate the cause of the error.

    Security

    You must have the appropriate authority to write to the SDR. You should be running on the control workstation. Before running this command, you must set the SP_NAME environment variable to the appropriate system partition name.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    For a general overview of configuring Event Management, see "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide.

    For details on the System Data Repository classes and attributes for the Event Management Configuration Database, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.

    Location

    /usr/lpp/ssp/bin/haemloadcfg

    Related Information

    Commands: haemcfg, SDRCreateObjects, SDRDeleteObjects

    Files: haemloadlist

    Also, for a description of the SDR classes for Event Management configuration data, see IBM Parallel System Support Programs for AIX: Event Management Programming Guide and Reference.

    Examples

    1. To load PSSP's default Event Management configuration data into the SDR, enter:
      haemloadcfg /usr/lpp/ssp/install/config/haemloadlist
      

    2. To load Event Management configuration data for a new Resource Monitor that is contained in a file called /usr/local/config/newrmloadlist, enter:
      haemloadcfg /usr/local/config/newrmloadlist
      

      If nonkey attributes in this load list file are later changed, update the SDR by entering:

      haemloadcfg -r /usr/local/config/newrmloadlist
      

      If this new Resource Monitor is no longer needed, remove its configuration data from the SDR by entering:

      haemloadcfg -d /usr/local/config/newrmloadlist
      

    haemtrcoff

    Purpose

    haemtrcoff - Turns tracing off for the Event Manager daemon.

    Syntax

    haemtrcoff -s subsys_name -a trace_list

    Flags

    -s subsys_name
    Specifies the name of the Event Management subsystem. On a node of a system partition, this is haem. On the control workstation, this is haem.syspar_name, where syspar_name is the name of the system partition for which you want to specify the subsystem. This argument must be specified.

    -a trace_list
    Specifies a list of trace arguments. Each argument specifies the type of activity for which tracing is to be turned off. At least one argument must be specified. If more than one argument is specified, the arguments must be separated by commas. The list may not include blanks.

    Operands

    The following trace arguments may be specified:

    init
    Stops tracing the initialization of the Event Manager daemon.

    config
    Stops dumping information from the configuration file.

    insts
    Stops tracing resource variable instances that are handled by the daemon.

    rmctrl
    Stops tracing Resource Monitor control.

    cci
    Stops tracing the client communication (internal) interface.

    emp
    Stops tracing the event manager protocol.

    obsv
    Stops tracing resource variable observations.

    evgn
    Stops tracing event generation and notification.

    reg
    Stops tracing event registration and unregistration.

    pci
    Stops tracing the peer communication (internal) interface.

    msgs
    Stops tracing all messages that come to and are issued from the daemon.

    query
    Stops tracing queries that are handled by the daemon.

    gsi
    Stops tracing the Group Services (internal) interface.

    eval
    Stops tracing predicate evaluation.

    rdi
    Stops tracing the reliable daemon (internal) interface.

    bli
    Stops tracing the back level (internal) interface, used for handling nodes that are running a level of PSSP that is earlier than PSSP 2.2.

    all
    Stops tracing all activities.

    all_but_msgs
    Stops tracing all activities except for messages. Message activity is defined by the msgs argument.
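
    The rules for the -a trace list (at least one argument, comma-separated, no blanks, names taken from the list above) can be checked before the command is issued. This is a minimal sketch; valid_trace_list is a hypothetical helper and is not part of PSSP.

```shell
# valid_trace_list LIST
# Hypothetical pre-check for the -a argument. Succeeds only when LIST is a
# non-empty, comma-separated list of known trace arguments with no blanks.
valid_trace_list() {
    case $1 in
        # Reject empty lists, stray characters (including blanks),
        # and leading, trailing, or doubled commas.
        ''|*[!a-z_,]*|,*|*,|*,,*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the list on commas
    IFS=$old_ifs
    for arg in "$@"; do
        case $arg in
            init|config|insts|rmctrl|cci|emp|obsv|evgn|reg|pci) ;;
            msgs|query|gsi|eval|rdi|bli|all|all_but_msgs) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

    For example, valid_trace_list init,config succeeds, while a list written with a blank after the comma is rejected.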

    Description

    The haemtrcoff command is used to turn tracing off for specified activities of the Event Manager daemon. Trace output is placed in an Event Management trace log for the system partition.

    Use this command only under the direction of the IBM Support Center. It provides information for debugging purposes and may degrade the performance of the Event Management subsystem or anything else that is running in the system partition. Do not use this command during normal operation.

    Files

    /var/ha/log/em.trace.syspar_name
    Contains the trace log of the haemd daemon on the system partition named syspar_name.

    /var/ha/log/em.msgtrace.syspar_name
    Contains message trace output from the Event Manager daemon on the system partition named syspar_name.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    Location

    /usr/lpp/ssp/bin/haemtrcoff

    Related Information

    Commands: haemctrl, haemd, haemtrcon

    Examples

    In the following examples, the SP system has two system partitions named sp_prod and sp_test. The instances of the Event Management subsystem on the control workstation of the SP are named haem.sp_prod and haem.sp_test, respectively. The instance of the Event Management subsystem that runs on any node of either system partition is named haem.

    1. To turn off all tracing for the Event Management subsystem on the control workstation for the sp_prod system partition, log in to the control workstation and enter:
      haemtrcoff -s haem.sp_prod -a all
      

    2. To turn off all tracing for the Event Management subsystem on one of the nodes of the sp_test system partition, log in to the node and enter:
      haemtrcoff -s haem -a all
      

    3. To turn off all tracing of initialization and configuration for the Event Management subsystem on the control workstation for the sp_test system partition, log in to the control workstation and enter:
      haemtrcoff -s haem.sp_test -a init,config
      

    haemtrcon

    Purpose

    haemtrcon - Turns tracing on for the Event Manager daemon.

    Syntax

    haemtrcon -s subsys_name -a trace_list

    Flags

    -s subsys_name
    Specifies the name of the Event Management subsystem. On a node of a system partition, this is haem. On the control workstation, this is haem.syspar_name, where syspar_name is the name of the system partition for which you want to specify the subsystem. This argument must be specified.

    -a trace_list
    Specifies a list of trace arguments. Each argument specifies the type of activity for which tracing is to be turned on. At least one argument must be specified. If more than one argument is specified, the arguments must be separated by commas. The list may not include blanks.
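
    The naming rule for the -s flag can be sketched as a small shell function. This is an illustration only: haem_subsys_name is a hypothetical helper, not a PSSP command, and it assumes that a node number of zero identifies the control workstation.

```shell
# Hypothetical helper (not part of PSSP): print the Event Management
# subsystem name to pass to the -s flag.
#   $1 = node number (assumption: 0 means the control workstation)
#   $2 = system partition name (used only on the control workstation)
haem_subsys_name() {
    if [ "$1" -eq 0 ]; then
        # Control workstation: one subsystem instance per system partition.
        printf 'haem.%s\n' "$2"
    else
        # Node: a single instance named haem.
        printf 'haem\n'
    fi
}
```

    For example, haem_subsys_name 0 sp_prod prints haem.sp_prod, while the same call with a nonzero node number prints haem.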

    Operands

    The following trace arguments may be specified:

    init
    Traces the initialization of the Event Manager daemon.

    config
    Dumps information from the configuration file.

    insts
    Traces resource variable instances that are handled by the daemon.

    rmctrl
    Traces Resource Monitor control.

    cci
    Traces the client communication (internal) interface.

    emp
    Traces the event manager protocol.

    obsv
    Traces resource variable observations.

    evgn
    Traces event generation and notification.

    reg
    Traces event registration and unregistration.

    pci
    Traces the peer communication (internal) interface.

    msgs
    Traces all messages that come to and are issued from the daemon.

    query
    Traces queries that are handled by the daemon.

    gsi
    Traces the Group Services (internal) interface.

    eval
    Traces predicate evaluation.

    rdi
    Traces the reliable daemon (internal) interface.

    bli
    Traces the back level (internal) interface, used for handling nodes that are running a level of PSSP that is earlier than PSSP 2.2.

    all
    Traces all activities.

    all_but_msgs
    Traces all activities except for messages. Message activity is defined by the msgs argument.

    regs
    Traces currently registered events.

    dinsts
    Traces all resource variable instances known to the daemon.

    Description

    The haemtrcon command is used to turn tracing on for specified activities of the Event Manager daemon. Trace output is placed in an Event Management trace log for the system partition. When used, the regs and dinsts arguments perform a one-time trace. The specified information is placed in the trace log, but no further tracing is done.
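
    The rules for the trace list (at least one argument, commas as separators, no blanks, and only the arguments listed under Operands) can be checked before the command is issued. The following is a minimal sketch; valid_trace_list is a hypothetical helper and is not supplied with PSSP.

```shell
# Hypothetical helper: succeed if $1 is a well-formed haemtrcon trace list.
valid_trace_list() {
    list=$1
    # Reject an empty list, embedded spaces, and empty list elements
    # (leading, trailing, or doubled commas).
    case $list in
        ''|*' '*|,*|*,|*,,*) return 1 ;;
    esac
    rest=$list
    while [ -n "$rest" ]; do
        arg=${rest%%,*}              # first element of the remaining list
        case $arg in
            init|config|insts|rmctrl|cci|emp|obsv|evgn|reg|pci) ;;
            msgs|query|gsi|eval|rdi|bli|all|all_but_msgs) ;;
            regs|dinsts) ;;          # one-time traces
            *) return 1 ;;           # not a documented trace argument
        esac
        if [ "$arg" = "$rest" ]; then
            rest=                    # last element consumed
        else
            rest=${rest#*,}          # drop the element and its comma
        fi
    done
    return 0
}
```

    A caller might guard the command with valid_trace_list "$list" && haemtrcon -s haem -a "$list".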

    Use this command only under the direction of the IBM Support Center. It provides information for debugging purposes and may degrade the performance of the Event Management subsystem or anything else that is running in the system partition. Do not use this command to turn tracing on during normal operation.

    Files

    /var/ha/log/em.trace.syspar_name
    Contains the trace log of the haemd daemon on the system partition named syspar_name.

    /var/ha/log/em.msgtrace.syspar_name
    Contains message trace output from the Event Manager daemon on the system partition named syspar_name.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    Location

    /usr/lpp/ssp/bin/haemtrcon

    Related Information

    Commands: haemctrl, haemd, haemtrcoff

    Examples

    In the following examples, the SP system has two system partitions named sp_prod and sp_test. The instances of the Event Management subsystem on the control workstation of the SP are named haem.sp_prod and haem.sp_test, respectively. The instance of the Event Management subsystem that runs on any node of either system partition is named haem.

    1. To turn on all tracing for the Event Management subsystem on the control workstation for the sp_prod system partition, log in to the control workstation and enter:
      haemtrcon -s haem.sp_prod -a all
      

    2. To turn on all tracing for the Event Management subsystem on one of the nodes of the sp_test system partition, log in to the node and enter:
      haemtrcon -s haem -a all
      

    3. To turn on all tracing of initialization and configuration for the Event Management subsystem on the control workstation for the sp_test system partition, log in to the control workstation and enter:
      haemtrcon -s haem.sp_test -a init,config
      

    haemunlkrm

    Purpose

    haemunlkrm - Unlocks and starts a Resource Monitor.

    Syntax

    haemunlkrm -s subsys_name -a resmon_name

    Flags

    -s subsys_name
    Specifies the name of the Event Management subsystem. On a node of a system partition, this is haem. On the control workstation, this is haem.syspar_name, where syspar_name is the name of the system partition for which you want to specify the subsystem. This argument must be specified.

    -a resmon_name
    Specifies the name of the Resource Monitor to unlock and start.

    Description

    If the Event Manager daemon cannot successfully start a Resource Monitor after three attempts within a two-hour interval, the Resource Monitor is "locked" and no further attempts are made to start it. Once the cause of the failure is determined and the problem corrected, the haemunlkrm command can be used to unlock the Resource Monitor and attempt to start it.

    The status of the Event Manager daemon, as displayed by the lssrc command, indicates if a Resource Monitor is locked.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Event Management Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    Location

    /usr/lpp/ssp/bin/haemunlkrm

    Examples

    If the output of the lssrc command indicates that the hardware Resource Monitor IBM.PSSP.hmrmd is locked, then after correcting the condition that prevented the Resource Monitor from being started, enter:

    haemunlkrm -s haem -a IBM.PSSP.hmrmd
    
    Note: This example applies to unlocking a Resource Monitor on a node.

    hagsctrl Script

    Purpose

    hagsctrl - A control script that starts the Group Services subsystems.

    Syntax

    hagsctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}

    Flags

    -a
    Adds the subsystems.

    -s
    Starts the subsystems.

    -k
    Stops the subsystems.

    -d
    Deletes the subsystems.

    -c
    Cleans the subsystems, that is, deletes them from all system partitions.

    -u
    Unconfigures the subsystems from all system partitions.

    -t
    Turns tracing on for the subsystems.

    -o
    Turns tracing off for the subsystems.

    -r
    Refreshes the subsystem.

    -h
    Displays usage information.

    Operands

    None.

    Description

    Group Services provides distributed coordination and synchronization services for other distributed subsystems running on a set of nodes on the IBM RS/6000 SP. The hagsctrl control script controls the operation of the subsystems that are required for Group Services. These subsystems are under the control of the System Resource Controller (SRC) and belong to a subsystem group called hags. Associated with each subsystem is a daemon.

    An instance of the Group Services subsystem executes on the control workstation and on every node of a system partition. Because Group Services provides its services within the scope of a system partition, its subsystems are said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.

    From an operational point of view, the Group Services subsystem group is organized as follows:

    Subsystem
    Group Services

    Subsystem Group
    hags

    SRC Subsystems
    hags and hagsglsm

    The hags subsystem is associated with the hagsd daemon. The hagsglsm subsystem is associated with the hagsglsmd daemon.

    The subsystem names on the nodes are hags and hagsglsm. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.

    On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hags.sp_prod, hags.sp_test, hagsglsm.sp_prod, and hagsglsm.sp_test.

    Daemons
    hagsd and hagsglsmd

    The hagsd daemon provides the majority of the Group Services functions.

    The hagsglsmd daemon provides global synchronization services for the switch adapter membership group.
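
    The subsystem naming described above can be sketched as follows. The function gs_subsys_names is a hypothetical helper, not part of PSSP; it assumes that a node number of zero identifies the control workstation.

```shell
# Hypothetical helper: print the SRC subsystem names used by Group Services.
#   $1        = node number (assumption: 0 means the control workstation)
#   $2 ...    = system partition names (used only on the control workstation)
gs_subsys_names() {
    node=$1
    shift
    if [ "$node" -ne 0 ]; then
        # On a node there is exactly one instance of each subsystem.
        printf '%s\n' hags hagsglsm
        return 0
    fi
    # On the control workstation there is one instance per system partition,
    # with the partition name appended.
    for syspar in "$@"; do
        printf '%s\n' "hags.$syspar" "hagsglsm.$syspar"
    done
}
```

    For example, gs_subsys_names 0 sp_prod sp_test prints hags.sp_prod, hagsglsm.sp_prod, hags.sp_test, and hagsglsm.sp_test, one name per line.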

    The hagsctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

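    The hagsctrl script provides a variety of controls for operating the Group Services subsystems: adding, starting, stopping, deleting, cleaning up, unconfiguring, turning tracing on, turning tracing off, and refreshing.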

    Before performing any of these functions, the script obtains the current system partition name (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.

    Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.

    Adding the Subsystem

    When the -a flag is specified, the control script uses the mkssys command to add the Group Services subsystems to the SRC. The control script operates as follows:

    1. It makes sure that both the hags and hagsglsm subsystems are stopped.

    2. It gets the port number for the hags subsystem for this system partition from the Syspar_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. If there is no port number in the SDR and this script is running on the control workstation, the script obtains a port number. If the script is running on a node and there is no port number in the SDR, the script ends with an error. The range of valid port numbers is 10000 to 10100, inclusive.

      The service name that is entered in the /etc/services file is hags.syspar_name.

    3. It removes the hags and hagsglsm subsystems from the SRC (just in case they are still there).

    4. It adds the hags and hagsglsm subsystems to the SRC. The system partition name is configured as a daemon parameter on the mkssys command.

    5. It adds an entry for the hags group to the /etc/inittab file. The entry ensures that the group is started during boot. However, if hagsctrl is running on a High Availability Control Workstation (HACWS), no entry is made in the /etc/inittab file. Instead, HACWS manages starting and stopping the group.
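
    The port handling in step 2 can be illustrated with two small functions. This is a sketch under stated assumptions, not the logic of the hagsctrl script itself: hags_port_in_range and services_port are hypothetical names, and the lookup assumes the usual "name port/protocol" layout of a services file.

```shell
# Hypothetical helper: succeed if $1 is a port number in the documented
# range for Group Services (10000 to 10100, inclusive).
hags_port_in_range() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;     # empty or not a plain number
    esac
    [ "$1" -ge 10000 ] && [ "$1" -le 10100 ]
}

# Hypothetical helper: print the port registered for a service name.
#   $1 = path of a services-format file (for example, /etc/services)
#   $2 = service name (for example, hags.sp_prod)
# Prints nothing if the service is not present.
services_port() {
    awk -v name="$2" '$1 == name { split($2, a, "/"); print a[1]; exit }' "$1"
}
```

    For example, port=$(services_port /etc/services hags.sp_prod) followed by hags_port_in_range "$port" reports whether a valid port is registered for the sp_prod system partition.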

    Starting the Subsystem

    When the -s flag is specified, the control script uses the startsrc command to start the Group Services subsystems, hags and hagsglsm.

    Stopping the Subsystem

    When the -k flag is specified, the control script uses the stopsrc command to stop the Group Services subsystems, hags and hagsglsm.

    Deleting the Subsystem

    When the -d flag is specified, the control script uses the rmssys command to remove the Group Services subsystems from the SRC. The control script operates as follows:

    1. It makes sure that both the hags and hagsglsm subsystems are stopped.

    2. It removes the hags and hagsglsm subsystems from the SRC using the rmssys command.

    3. It removes the port number from the /etc/services file.

    4. If there are no other subsystems remaining in the hags group, it removes the entry for the hags group from the /etc/inittab file.

    Cleaning Up the Subsystems

    When the -c flag is specified, the control script stops and removes the Group Services subsystems for all system partitions from the SRC. The control script operates as follows:

    1. It stops all instances of subsystems in the subsystem group in all partitions, using the stopsrc -g hags command.

    2. It removes the entry for the hags group from the /etc/inittab file.

    3. It removes all instances of subsystems in the subsystem group in all partitions from the SRC using the rmssys command.

    Unconfiguring the Subsystems

    When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Group Services subsystems.
    Note: The -u flag is effective only on the control workstation.

    Prior to executing the hagsctrl command with the -u flag on the control workstation, the hagsctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.

    Turning Tracing On

    When the -t flag is specified, the control script turns tracing on for the hagsd daemon, using the traceson command. Tracing is not available for the hagsglsmd daemon.

    Turning Tracing Off

    When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hagsd daemon, using the tracesoff command. Tracing is not available for the hagsglsmd daemon.

    Refreshing the Subsystem

    The -r flag has no effect for this subsystem.

    Logging

    While they are running, the Group Services daemons provide information about their operation and errors by writing entries in a log file in the /var/ha/log directory.

    Each daemon limits the log size to a pre-established number of lines (by default, 5,000 lines). When the limit is reached, the daemon appends the string .bak to the name of the current log file and begins a new log. If a .bak version already exists, it is removed before the current log is renamed.
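
    The rollover behavior described above can be sketched in shell. The daemons perform this internally; rotate_if_full is a hypothetical function shown only to make the sequence of steps concrete.

```shell
# Illustrative sketch of the log rollover; not the daemons' own code.
#   $1 = path of the current log file
#   $2 = line limit (defaults to 5000, the documented default)
rotate_if_full() {
    log=$1
    limit=${2:-5000}
    [ -f "$log" ] || return 0
    # Arithmetic expansion strips any padding that wc may emit.
    lines=$(( $(wc -l < "$log") ))
    if [ "$lines" -ge "$limit" ]; then
        rm -f "$log.bak"        # remove an existing .bak version first
        mv "$log" "$log.bak"    # append .bak to the current log name
        : > "$log"              # begin a new, empty log
    fi
}
```

    Because any earlier .bak file is removed, at most two generations of a log exist at one time: the current log and its immediate predecessor.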

    Files

    /var/ha/log/hags_nodenum_instnum.syspar_name
    Contains the log of the hagsd daemons on the nodes.

    /var/ha/log/hags.syspar_name_nodenum_instnum.syspar_name
    Contains the log of each hagsd daemon on the control workstation.

    /var/ha/log/hagsglsm_nodenum_instnum.syspar_name
    Contains the log of the hagsglsmd daemons on the nodes.

    /var/ha/log/hagsglsm.syspar_name_nodenum_instnum.syspar_name
    Contains the log of each hagsglsmd daemon on the control workstation.

    The file names include the following variables:

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Security

    You must be running with an effective user ID of root.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Group Services Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    IBM Parallel System Support Programs for AIX: Group Services Programming Guide and Reference

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hagsctrl

    Related Information

    Commands: hagsd, hagsglsmd, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    1. To add the Group Services subsystems to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hagsctrl -a
      

    2. To start the Group Services subsystems in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hagsctrl -s
      

    3. To stop the Group Services subsystems in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hagsctrl -k
      

    4. To delete the Group Services subsystems from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hagsctrl -d
      

    5. To clean up the Group Services subsystems on all system partitions, enter:
      hagsctrl -c
      

    6. To unconfigure the Group Services subsystem from all system partitions, on the control workstation, enter:
      hagsctrl -u
      

    7. To turn tracing on for the Group Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hagsctrl -t
      

    8. To turn tracing off for the Group Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hagsctrl -o
      

    9. To display the status of all of the subsystems in the Group Services SRC group, enter:
      lssrc -g hags
      

    10. To display the status of an individual Group Services subsystem, enter:
      lssrc -s subsystem_name
      

    11. To display detailed status about an individual Group Services subsystem, enter:
      lssrc -l -s subsystem_name
      

      In response, the system returns information that includes the running status of the subsystem, the number and identity of connected GS clients, information about the Group Services domain, and the number of providers and subscribers in established groups.

    12. To display the status of all of the daemons under SRC control, enter:
      lssrc -a
      

    hagsd Daemon

    Purpose

    hagsd - A Group Services daemon that provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes.

    Syntax

    hagsd daemon_name

    Flags

    None.

    Operands

    daemon_name
    Specifies the name used by the daemon to name log files and identify its messages in the error log.

    Description

    The hagsd daemon is part of the Group Services subsystem, which provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes. This daemon provides most of the services of the subsystem.

    One instance of the hagsd daemon executes on the control workstation for each system partition. An instance of the hagsd daemon also executes on every node of a system partition. The hagsd daemon is under System Resource Controller (SRC) control.

    Because the daemon is under SRC control, it is better not to start it directly from the command line. It is normally called by the hagsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the startsrc or stopsrc command.

    For more information about the Group Services daemons, see the hagsctrl man page.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Group Services Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    IBM Parallel System Support Programs for AIX: Group Services Programming Guide and Reference

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hagsd

    Related Information

    Commands: hagsctrl, hagsglsmd

    Examples

    See the hagsctrl command.

    hagsglsmd Daemon

    Purpose

    hagsglsmd - A Group Services daemon that provides global synchronization services for the switch adapter membership group.

    Syntax

    hagsglsmd daemon_name

    Flags

    None.

    Operands

    daemon_name
    Specifies the name used by the daemon to name log files and identify its messages in the error log.

    Description

    The hagsglsmd daemon is part of the Group Services subsystem, which provides a general purpose facility for coordinating and monitoring changes to the state of an application that is running on a set of nodes. This daemon provides global synchronization services for the High Performance Switch adapter membership group.

    One instance of the hagsglsmd daemon executes on the control workstation for each system partition. An instance of the hagsglsmd daemon also executes on every node of a system partition. The hagsglsmd daemon is under System Resource Controller (SRC) control.

    Because the daemon is under SRC control, it is better not to start it directly from the command line. It is normally called by the hagsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system. If you must start or stop the daemon directly, use the startsrc or stopsrc command.

    For more information about the Group Services daemons, see the hagsctrl man page.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    "The Group Services Subsystem" chapter of IBM Parallel System Support Programs for AIX: Administration Guide

    IBM Parallel System Support Programs for AIX: Group Services Programming Guide and Reference

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hagsglsmd

    Related Information

    Commands: hagsctrl, hagsd

    Examples

    See the hagsctrl command.

    hardmon Daemon

    Purpose

    hardmon - Monitors and controls the state of the SP hardware.

    Syntax

    hardmon [-B] [-r poll_rate] [-d debug_flag] ...

    Flags

    -B
    Executes the daemon in diagnostic mode.

    -r poll_rate
    Specifies the rate, in seconds, at which the daemon polls each frame for state information.

    -d debug_flag
    Specifies the daemon debug flag to be set in the daemon. Refer to the hmadm command for possible values of debug_flag. Multiple -d debug flags can be specified.

    Operands

    None.

    Description

    hardmon is the Hardware Monitor daemon. The daemon monitors and controls the state of the SP hardware contained in one or more SP frames. This command is not normally executed from the command line. Access to the Hardware Monitor is provided by the hmmon, hmcmds, spmon, s1term, and nodecond commands. Control of the Hardware Monitor daemon is provided by the hmadm command. These commands are the Hardware Monitor "client" commands.

    The Hardware Monitor daemon executes on the Monitor and Control Node (MACN). The MACN is the IBM RS/6000 workstation to which the RS-232 lines from the frames are connected. The MACN is one and the same as the control workstation. The daemon is managed by the System Resource Controller (SRC). When the MACN is booted, an entry in /etc/inittab invokes the startsrc command to start the daemon. The daemon is configured in the SRC to be restarted automatically if it terminates for any reason other than the stopsrc command. The SRC subsystem name for the Hardware Monitor daemon is hardmon.

    hardmon obtains configuration information from the System Data Repository (SDR). The SP_ports object class specifies the port number that the daemon is to use to accept TCP/IP connections from the client commands. The port number is obtained from the object whose daemon attribute value matches hardmon and whose host_name attribute value matches the host name of the workstation on which the daemon is executing. There must be one hardmon object in SP_ports for the MACN. The Frame object class contains an object for each frame in the SP system.

    The attributes of interest to the daemon are frame_number, tty, and MACN. When started, the daemon fetches all those objects in the Frame class whose MACN attribute value matches the host name of the workstation on which the daemon is executing. For each frame discovered in this manner, the daemon saves the frame number and opens the corresponding tty device. When all frames have been configured, the daemon begins to poll the frames for state information. Current state and changed state can then be obtained using the hmmon and spmon commands. The hmcmds and spmon commands can be used to control the hardware within the frames.

    The daemon also reads the file /spdata/sys1/spmon/hmthresholds for values used to check boundary conditions for certain state variables. This file should only be changed on request from IBM support. Finally, the /spdata/sys1/spmon/hmacls file is read for Access Control List (ACL) information. Refer to the hmadm command and the /spdata/sys1/spmon/hmacls file for more information on ACLs.

    All errors detected by the Hardware Monitor daemon are written to the AIX error log.

    The flags in the SRC subsystem object for the hardmon subsystem should not normally be changed. For example, if the poll rate is more than 5 seconds, the nodecond command can fail with unpredictable results. Upon request from IBM support for more information to aid in problem determination, debug flags can be set using the hmadm command.
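
    If the startup arguments must be changed, the poll rate limit mentioned above can be checked first. The following is a minimal sketch; hardmon_poll_rate_ok is a hypothetical helper, not part of PSSP, and it recognizes only the flags shown under Syntax.

```shell
# Hypothetical check: succeed unless the given hardmon arguments request
# a poll rate of more than 5 seconds or contain an unrecognized flag.
hardmon_poll_rate_ok() {
    OPTIND=1                         # restart option parsing on each call
    while getopts 'Br:d:' opt "$@"; do
        case $opt in
            r) [ "$OPTARG" -le 5 ] || return 1 ;;   # poll rate too high
            B|d) ;;                                 # no rate implications
            *) return 1 ;;                          # unknown flag
        esac
    done
    return 0
}
```

    For example, hardmon_poll_rate_ok -r 10 fails, which signals that the proposed arguments could cause the nodecond command to fail.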

    If the High Availability Control Workstation (HACWS) Frame Supervisor (type 20) or the SEPBU HACWS Frame Supervisor (type 22) is installed in the SP frames, the -B flag is used to run the Hardware Monitor daemon in diagnostic mode. This diagnostic mode is used to validate that the frame ID written into the Supervisor matches the frame ID configured in the SDR for that frame. Normally, the frame ID is automatically written into the Supervisor during system installation. The frame ID is written into the frame to detect cabling problems in an HACWS configuration. In a non-HACWS SP configuration, the -B flag is useful whenever the RS-232 cables between the frames and MACN are changed (but only if one or more frames contain a type 20 or type 22 supervisor). The hardmon command can be executed directly from the command line with the -B flag, but only after the currently running daemon is stopped using the stopsrc command. Diagnostic messages are written to the AIX error log. The daemon exits when all frames are validated.

    Frame ID validation is also performed every time the daemon is started by the System Resource Controller. Any frame that has a frame ID mismatch can be monitored, but any control commands to the frame are ignored until the condition is corrected. A frame with a mismatch is noted in the System Monitor Graphical User Interface as well as in the AIX error log. The hmcmds command can be used to set the currently configured frame ID into a type 20 or type 22 supervisor after it is verified that the frame is correctly connected to the MACN.

    Additional Configuration Information: The Hardware Monitor subsystem also obtains information from the system partition and the Syspar_map object classes in the SDR. While this information is not used by the hardmon daemon itself, it is used by the hardmon client commands listed under Related Information. Each of these commands executes in the environment of one system partition. If the SP system is not partitioned, these commands execute in the environment of the entire system. In any case, the Syspar_map object class is used to determine which nodes are contained in the current environment. The attributes of interest are syspar_name and node_number.

    Starting and Stopping the hardmon Daemon

    The hardmon daemon is under System Resource Controller (SRC) control. It uses the signal method of communication in SRC. The hardmon daemon is a single subsystem and is not associated with any SRC group. The subsystem name is hardmon. To start the hardmon daemon, use the startsrc -s hardmon command. This starts the daemon with the default arguments and SRC options. The hardmon daemon is set up to be respawnable and to be the only instance of the hardmon daemon running on a particular node or control workstation. Do not start the hardmon daemon from the command line without using the startsrc command to start it.

    To stop the hardmon daemon, use the stopsrc -s hardmon command. This stops the daemon and does not allow it to respawn.

    To display the status of the hardmon daemon, use the lssrc -s hardmon command.

    If the default startup arguments need to be changed, use the chssys command to change the startup arguments or the SRC options. Refer to AIX Version 4 Commands Reference and AIX Version 4 General Programming Concepts: Writing and Debugging Programs for more information about daemons under SRC control and how to modify daemon arguments when under SRC.

    To view the current SRC options and daemon arguments, use the odmget -q 'subsysname=hardmon' SRCsubsys command.

    Files

    /usr/lpp/ssp/bin/hardmon
    Contains the hardmon command.

    /spdata/sys1/spmon/hmthresholds
    Contains boundary values.

    /spdata/sys1/spmon/hmacls
    Contains Access Control Lists.

    Related Information

    Commands: hmadm, hmcmds, hmmon, nodecond, spmon, s1term

    File: /spdata/sys1/spmon/hmacls

    Examples

    1. To start the hardmon daemon, enter:
      startsrc -s hardmon
      

    2. To stop the hardmon daemon, enter:
      stopsrc -s hardmon
      

    3. To display the status of the hardmon daemon, enter:
      lssrc -s hardmon
      

    4. To display the status of all the daemons under SRC control, enter:
      lssrc -a
      

    5. To display the current SRC options and daemon arguments for the hardmon daemon, enter:
      odmget -q 'subsysname=hardmon' SRCsubsys
      

    hats Script

    Purpose

    hats - Starts or restarts Topology Services on a node or on the control workstation.

    Syntax

    hats

    Flags

    None.

    Operands

    None.

    Description

    Use this command to start the operation of Topology Services for a system partition (the hatsd daemon) on the control workstation or on a node within a system partition.

    The hats script is not normally executed from the command line. It is normally called by the hatsctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

    The Topology Services subsystem provides internal services to PSSP components.

    Note that the hats script issues the no -o nonlocsrcroute=1 command, which enables IP source routing. Do not change this setting, because the Topology Services subsystem requires this setting to work properly. If you change the setting, the Topology Services subsystem and a number of other subsystems that depend on it will no longer operate properly.
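
    To confirm that the setting is still in effect, the output of the no -o nonlocsrcroute command can be examined. The following sketch assumes that the command reports the value in the form "nonlocsrcroute = 1"; nonlocsrcroute_enabled is a hypothetical helper, not part of PSSP.

```shell
# Hypothetical helper: read the output of "no -o nonlocsrcroute" on
# standard input and succeed only if the reported value is 1.
# Assumption: the output has the form "nonlocsrcroute = <value>".
nonlocsrcroute_enabled() {
    awk -F'[ \t]*=[ \t]*' '
        $1 == "nonlocsrcroute" { found = ($2 == 1) }
        END { exit !found }
    '
}
```

    A typical use would be no -o nonlocsrcroute | nonlocsrcroute_enabled || echo "IP source routing is disabled" to report a changed setting.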

    The hatsd daemon is initially started on the control workstation with the System Resource Controller (SRC), regardless of the level of the system partition. It is respawned automatically if the hatsd daemon fails. The SP_NAME environment variable causes selection of the correct topology configuration.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    The "Starting Up and Shutting Down the SP System" chapter and "The System Data Repository" appendix in IBM Parallel System Support Programs for AIX: Administration Guide

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hats

    Related Information

    Commands: hatsctrl, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    See the hatsctrl command.

    hatsctrl Script

    Purpose

    hatsctrl - A control script that starts the Topology Services subsystem.

    Syntax

    hatsctrl {-a | -s | -k | -d | -c | -u | -t | -o | -r | -h}

    Flags

    -a
    Adds the subsystem.

    -s
    Starts the subsystem.

    -k
    Stops the subsystem.

    -d
    Deletes the subsystem.

    -c
    Cleans the subsystems, that is, deletes them from all system partitions.

    -u
    Unconfigures the subsystems from all system partitions.

    -t
    Turns tracing on for the subsystem.

    -o
    Turns tracing off for the subsystem.

    -r
    Refreshes the subsystem.

    -h
    Displays usage information.

    Operands

    None.

    Description

    Topology Services is a distributed subsystem of PSSP that provides information to other PSSP subsystems about the state of the nodes and adapters on the IBM RS/6000 SP.

    The hatsctrl control script controls the operation of the Topology Services subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hats. Associated with each subsystem are a daemon and a script that configures and starts the daemon.

    An instance of the Topology Services subsystem executes on the control workstation and on every node of a system partition. Because Topology Services provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.

    From an operational point of view, the Topology Services subsystem group is organized as follows:

    Subsystem
    Topology Services

    Subsystem Group
    hats

    SRC Subsystem
    hats

    The hats subsystem is associated with the hatsd daemon and the hats script. The hats script configures and starts the hatsd daemon.

    The subsystem name on the nodes is hats. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.

    On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hats.sp_prod and hats.sp_test.

    Daemons
    hatsd

    The hatsd daemon provides the Topology Services. The hats script configures and starts the hatsd daemon.

    The hatsctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

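    The hatsctrl script provides a variety of controls for operating the Topology Services subsystem: adding, starting, stopping, and deleting the subsystem; cleaning up and unconfiguring the subsystems from all system partitions; turning tracing on and off; and refreshing the subsystem.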

    Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.

    Except for the clean and unconfigure functions, all functions are performed within the scope of the current system partition.
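
    The naming rule described above (hats on a node, hats.syspar_name on the control workstation, with node number zero identifying the control workstation) can be sketched as a small shell function. This is an illustration only: the function name hats_subsys_name is hypothetical, and the node number and system partition name are passed as arguments rather than obtained from node_number and spget_syspar, so that the sketch runs anywhere.

```shell
# Hypothetical helper: prints the SRC subsystem name for Topology Services.
#   $1 = node number (0 means the control workstation)
#   $2 = system partition name
hats_subsys_name() {
    if [ "$1" -eq 0 ]; then
        printf 'hats.%s\n' "$2"
    else
        printf 'hats\n'
    fi
}

hats_subsys_name 0 sp_prod   # prints hats.sp_prod
hats_subsys_name 5 sp_prod   # prints hats
```

    The printed name is the value that would be supplied to lssrc -s when checking an individual subsystem.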

    Adding the Subsystem

    When the -a flag is specified, the control script uses the mkssys command to add the Topology Services subsystem to the SRC. The control script operates as follows:

    1. It makes sure that the hats subsystem is stopped.

    2. It gets the port number for the hats subsystem for this system partition from the Syspar_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. If there is no port number in the SDR and this script is running on the control workstation, the script obtains a port number. If the script is running on a node and there is no port number in the SDR, the script ends with an error. The range of valid port numbers is 10000 to 10100, inclusive.

      The service name that is entered in the /etc/services file is hats.syspar_name.

    3. It checks to see if the subsystem is already configured in the SDR. If not, it creates an instance of the TS_Config class for this subsystem with default values. The default values are:

    4. It removes the hats subsystem from the SRC (just in case it is still there).

    5. It adds the hats subsystem to the SRC. On the control workstation, the mkssys command specifies the IP address of the system partition as an argument to be supplied to the daemon.

    6. It adds an entry for the hats group to the /etc/inittab file. The entry ensures that the group is started during boot. However, if hatsctrl is running on a High Availability Control Workstation (HACWS), no entry is made in the /etc/inittab file. Instead, HACWS manages starting and stopping the group.
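
    The port handling in step 2 can be checked by hand with a sketch like the following. It is not the logic of hatsctrl itself. The function names are hypothetical, the lookup works on any file in /etc/services format, and the udp protocol shown in the usage comment is an assumption, because the text above does not name the protocol.

```shell
# Hypothetical helper: succeeds if $1 is a whole number in the range
# 10000 to 10100, inclusive (the valid range stated for hats ports).
hats_port_in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 10000 ] && [ "$1" -le 10100 ]
}

# Hypothetical helper: prints the port recorded for service $2 in the
# services-format file $1 (for example /etc/services).
services_port() {
    awk -v svc="$2" '$1 == svc { split($2, a, "/"); print a[1]; exit }' "$1"
}

# Possible use (not run here; assumes an entry such as "hats.sp_prod 10000/udp"):
#   port=$(services_port /etc/services hats.sp_prod)
#   hats_port_in_range "$port" || echo "unexpected hats port: $port" >&2
```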

    Starting the Subsystem

    When the -s flag is specified, the control script uses the startsrc command to start the Topology Services subsystem, hats.

    Stopping the Subsystem

    When the -k flag is specified, the control script uses the stopsrc command to stop the Topology Services subsystem, hats.

    Deleting the Subsystem

    When the -d flag is specified, the control script uses the rmssys command to remove the Topology Services subsystem from the SRC. The control script operates as follows:

    1. It makes sure that the hats subsystem is stopped.

    2. It removes the hats subsystem from the SRC using the rmssys command.

    3. It removes the port number from the /etc/services file.

    4. If there are no other subsystems remaining in the hats group, it removes the entry for the hats group from the /etc/inittab file.

    Cleaning Up the Subsystems

    When the -c flag is specified, the control script stops and removes the Topology Services subsystems for all system partitions from the SRC. The control script operates as follows:

    1. It stops all instances of subsystems in the subsystem group in all partitions, using the stopsrc -g hats command.

    2. It removes the entry for the hats group from the /etc/inittab file.

    3. It removes all instances of subsystems in the subsystem group in all partitions from the SRC using the rmssys command.

    4. It removes all entries for the hats subsystems from the /etc/services file.

    Unconfiguring the Subsystems

    When the -u flag is specified, the control script performs the function of the -c flag in all system partitions and then removes all port numbers from the SDR allocated by the Topology Services subsystems.
    Note: The -u flag is effective only on the control workstation.

    Prior to executing the hatsctrl command with the -u flag on the control workstation, the hatsctrl command with the -c flag must be executed from all of the nodes. If this subsystem is not successfully cleaned from all of the nodes, different port numbers may be used by this subsystem, leading to undefined behavior.

    Turning Tracing On

    When the -t flag is specified, the control script turns tracing on for the hatsd daemon, using the traceson command.

    Turning Tracing Off

    When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hatsd daemon, using the tracesoff command.

    Refreshing the Subsystem

    When the -r flag is specified, the control script refreshes the subsystem, using the hats refresh command and the refresh command.

    It rebuilds the information about the node and adapter configuration in the SDR and signals the daemon to read the rebuilt information.

    Logging

    While it is running, the Topology Services daemon provides information about its operation and errors by writing entries in a log file. The hatsd daemon in the system partition named syspar_name uses a log file called /var/ha/log/hats.syspar_name.

    Files

    /var/ha/log/hats.syspar_name
    Contains the log of the hatsd daemon on the system partition named syspar_name.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Security

    You must be running with an effective user ID of root.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hatsctrl

    Related Information

    Commands: hats, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    1. To add the Topology Services subsystem to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hatsctrl -a
      

    2. To start the Topology Services subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hatsctrl -s
      

    3. To stop the Topology Services subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hatsctrl -k
      

    4. To delete the Topology Services subsystem from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hatsctrl -d
      

    5. To clean up the Topology Services subsystem on all system partitions, enter:
      hatsctrl -c
      

    6. To unconfigure the Topology Services subsystem from all system partitions, on the control workstation, enter:
      hatsctrl -u
      

    7. To turn tracing on for the Topology Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hatsctrl -t
      

    8. To turn tracing off for the Topology Services daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hatsctrl -o
      

    9. To display the status of all of the subsystems in the Topology Services SRC group, enter:
      lssrc -g hats
      

    10. To display the status of an individual Topology Services subsystem, enter:
      lssrc -s subsystem_name
      

    11. To display detailed status about an individual Topology Services subsystem, enter:
      lssrc -l -s subsystem_name
      

      In response, the system returns information that includes the running status of the subsystem, the number of defined and active nodes, the required number of active nodes for a quorum, the status of the group of nodes, and the IP addresses of the source node, the group leader, and the control workstation.

    12. To display the status of all of the daemons under SRC control, enter:
      lssrc -a
      

    hb Script

    Purpose

    hb - Starts or restarts heartbeat services on a node or on the control workstation.

    Syntax

    hb [-spname syspar_name] [-splevel pssp_level]

    { [start | resume] | [stop | quiesce] | reset |

    [query | qall | qsrc] | refresh | mksrc optional_flags | rmsrc | restore |

    [debug | debug off] | [trace on | trace off] }

    Flags

    -spname syspar_name
    Executes the command for the system partition specified by the syspar_name operand. If this flag is not specified, the name of the system partition given by the value of the SP_NAME variable is used.

    -splevel pssp_level
    Sets the system partition level to the value specified by the pssp_level operand. Valid levels are: PSSP-2.1, PSSP-2.2, PSSP-2.3, or PSSP-2.4. The default level is PSSP-2.4.
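
    A wrapper that calls hb could reject an unsupported level before passing it to -splevel. The following sketch uses a hypothetical function name, valid_pssp_level, and encodes only the four levels listed above.

```shell
# Hypothetical helper: succeeds if $1 is one of the levels listed for -splevel.
valid_pssp_level() {
    case "$1" in
        PSSP-2.1|PSSP-2.2|PSSP-2.3|PSSP-2.4) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use (not run here):
#   valid_pssp_level "$level" && hb -spname "$SP_NAME" -splevel "$level" query
```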

    Operands

    start | resume
    Resumes normal heartbeat services after they have been temporarily suspended with quiesce or stop.

    stop | quiesce
    Stops heartbeat services (hbd).

    reset
    Stops and restarts the heartbeat server on a node. Use this parameter:

    1. After changing relevant node information in the System Data Repository (SDR). (Reset all nodes on the affected system partition and that partition's hbd on the control workstation.)

    2. When host_responds consistently does not agree with the state of the node and automatic recovery has not taken place.

    query
    Queries the daemon for a partition. The response to a query includes heartbeat-specific information.

    qall
    Performs the query function for each defined partition.

    qsrc
    Displays a subsystem definition for a partition.

    refresh
    Uses the refresh command to request a daemon refresh.

    mksrc optional_flags
    Uses the mkssys command to create an SRC subsystem object. Additional flags for the command may be specified.

    rmsrc
    Uses the rmssys command to remove an SRC subsystem object.

    restore
    Synchronizes the running daemons with the information in the System Data Repository (SDR). This operand removes all entries for the subsystem, creates new entries based on information in the SDR, and starts the subsystems.

    [debug | debug off]
    Turns debugging on or off.

    [trace on | trace off]
    Turns additional tracing on or off.

    Description

    Use this command to control the operation of heartbeat services for a system partition (the hbd daemon) on the control workstation or on a node within a system partition.

    The hb script is not normally executed from the command line. It is normally called by the hbctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

    The heartbeat server provides input to the host_responds function within a system partition for the System Monitor through the System Monitor hr daemons. It also provides input to the IBM Recoverable Virtual Shared Disk daemons, if that product is installed on the nodes. This involves the following daemons:

    hbd
    The internal heartbeat server on the nodes and the control workstation.

    hrd
    The host responds daemon on the control workstation.

    Note: The hrd daemon is controlled by the hr script. The hbd daemon is controlled with this script.

    The hbd daemon is initially started on the control workstation with the System Resource Controller (SRC), regardless of the level of the system partition. It is respawned automatically if the hbd daemon fails. The SP_NAME environment variable causes selection of the correct heartbeat daemon.

    The hbd daemons communicate with their counterparts on other nodes over the SP Ethernet. The udp heartbeat entry in /etc/services on all nodes must specify the same port number.
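
    The requirement that every node use the same port can be verified by comparing the udp heartbeat entries. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the files passed to same_heartbeat_port stand in for copies of /etc/services that have already been collected from the control workstation and the nodes by some site-specific means.

```shell
# Hypothetical helper: prints the port of the udp "heartbeat" entry in the
# services-format file $1.
heartbeat_udp_port() {
    awk '$1 == "heartbeat" && $2 ~ /\/udp$/ { sub(/\/udp$/, "", $2); print $2; exit }' "$1"
}

# Hypothetical helper: succeeds if every file given has the same, non-empty
# udp heartbeat port.
same_heartbeat_port() {
    first=$(heartbeat_udp_port "$1") || return 1
    [ -n "$first" ] || return 1
    for f in "$@"; do
        [ "$(heartbeat_udp_port "$f")" = "$first" ] || return 1
    done
}

# Possible use (not run here; the file names are placeholders):
#   same_heartbeat_port services.cws services.node1 services.node2 \
#       || echo "heartbeat port differs between hosts" >&2
```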

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    The "Starting Up and Shutting Down the SP System" chapter and "The System Data Repository" appendix in IBM Parallel System Support Programs for AIX: Administration Guide

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hb

    Related Information

    Commands: hbctrl, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    See the hbctrl command.

    hbctrl Script

    Purpose

    hbctrl - A control script that starts the Heartbeat subsystem.

    Syntax

    hbctrl { -a | -s | -k | -d | -c | -t | -o | -r | -h }

    Flags

    -a
    Adds the subsystem.

    -s
    Starts the subsystem.

    -k
    Stops the subsystem.

    -d
    Deletes the subsystem.

    -c
    Cleans the subsystems, that is, deletes them from all system partitions.

    -t
    Turns tracing on for the subsystem.

    -o
    Turns tracing off for the subsystem.

    -r
    Refreshes the subsystem.

    -h
    Displays usage information.

    Operands

    None.

    Description

    The Heartbeat subsystem communicates with several PSSP subsystems as part of providing information about the state of the nodes and adapters on the IBM RS/6000 SP.

    The hbctrl control script controls the operation of the Heartbeat subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hb. Associated with each subsystem are a daemon and a script that configures and starts the daemon.

    An instance of the Heartbeat subsystem executes on the control workstation and on every node of a system partition. Because Heartbeat provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. It can be issued from either the control workstation or any of the system partition's nodes.

    From an operational point of view, the Heartbeat subsystem group is organized as follows:

    Subsystem
    Heartbeat

    Subsystem Group
    hb

    SRC Subsystem
    hb

    The hb subsystem is associated with the hbd daemon and the hb script. The hb script configures and starts the hbd daemon.

    The subsystem name on the nodes is hb. There is one of each subsystem per node and it is associated with the system partition to which the node belongs.

    On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hb.sp_prod and hb.sp_test.

    Daemons
    hbd

    The hbd daemon provides the Heartbeat services. The hb script configures and starts the hbd daemon.

    The hbctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

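    The hbctrl script provides a variety of controls for operating the Heartbeat subsystem: adding, starting, stopping, and deleting the subsystem; cleaning up the subsystems from all system partitions; turning tracing on and off; and refreshing the subsystem.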

    Before performing any of these functions, the script obtains the current system partition name and IP address (using the spget_syspar command) and the node number (using the node_number command). If the node number is zero, the control script is running on the control workstation.

    Except for the clean function, all functions are performed within the scope of the current system partition.

    Adding the Subsystem

    When the -a flag is specified, the control script uses the mkssys command to add the Heartbeat subsystem to the SRC. The control script operates as follows:

    1. It makes sure that the hb subsystem is stopped.

    2. It gets the port number for the hb subsystem for this system partition from the SP_ports class of the System Data Repository (SDR) and ensures that the port number is set in the /etc/services file. If there is no port number in the SDR and this script is running on the control workstation, the script obtains a port number. If the script is running on a node and there is no port number in the SDR, the script ends with an error. The Heartbeat subsystem uses port number 4893.

      The service name that is entered in the /etc/services file is heartbeat.

    3. If this script is running on the control workstation, it checks to see if the subsystem is already configured in the SDR. If not, it creates an instance of the HB_Config class for this subsystem with default values. The default values are:

    4. It invokes the hb script with the mksrc parameter to add the subsystem to the SRC. On the control workstation, the name of the system partition is also specified on the hb script.

    5. It adds an entry for the hb group to the /etc/inittab file. The entry ensures that the group is started during boot. However, if hbctrl is running on a High Availability Control Workstation (HACWS), no entry is made in the /etc/inittab file. Instead, HACWS manages starting and stopping the group.

    Starting the Subsystem

    When the -s flag is specified, the control script uses the hb command to start the Heartbeat subsystem, hb.

    Stopping the Subsystem

    When the -k flag is specified, the control script uses the hb command to stop the Heartbeat subsystem, hb.

    Deleting the Subsystem

    When the -d flag is specified, the control script uses the rmssys command to remove the Heartbeat subsystem from the SRC. The control script operates as follows:

    1. It removes the hb subsystem from the SRC using the hb script with the rmsrc parameter.

    2. It removes the port number from the /etc/services file.

    3. If there are no other subsystems remaining in the hb group, it removes the entry for the hb group from the /etc/inittab file.

    Cleaning Up the Subsystems

    When the -c flag is specified, the control script stops and removes the Heartbeat subsystems for all system partitions from the SRC. The control script operates as follows:

    1. It stops all instances of subsystems in the subsystem group in all partitions, using the stopsrc -g hb command.

    2. It removes the entry for the hb group from the /etc/inittab file.

    3. It removes all instances of subsystems in the subsystem group in all partitions from the SRC using the rmssys command.

    Turning Tracing On

    When the -t flag is specified, the control script turns tracing on for the hbd daemon, using the traceson command.

    Turning Tracing Off

    When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hbd daemon, using the tracesoff command.

    Refreshing the Subsystem

    When the -r flag is specified, the control script refreshes the subsystem, using the hb refresh command.

    Logging

    While it is running, the Heartbeat daemon provides information about its operation and errors by writing entries in a log file. The hbd daemon in the system partition named syspar_name uses a log file called /var/ha/log/hb.syspar_name.

    Files

    /var/ha/log/hb.syspar_name
    Contains the log of the hbd daemon on the system partition named syspar_name.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Security

    You must be running with an effective user ID of root.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hbctrl

    Related Information

    Commands: hb, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    1. To add the Heartbeat subsystem to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hbctrl -a
      

    2. To start the Heartbeat subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hbctrl -s
      

    3. To stop the Heartbeat subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hbctrl -k
      

    4. To delete the Heartbeat subsystem from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hbctrl -d
      

    5. To clean up the Heartbeat subsystem on all system partitions, enter:
      hbctrl -c
      

    6. To turn tracing on for the Heartbeat daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hbctrl -t
      

    7. To turn tracing off for the Heartbeat daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hbctrl -o
      

    8. To display the status of all of the subsystems in the Heartbeat SRC group, enter:
      lssrc -g hb
      

    9. To display the status of an individual Heartbeat subsystem, enter:
      lssrc -s subsystem_name
      

    10. To display detailed status about an individual Heartbeat subsystem, enter:
      lssrc -l -s subsystem_name
      

      In response, the system returns information that includes the running status of the subsystem, the number of defined and active nodes, the required number of active nodes for a quorum, the status of the group of nodes, the frequency and sensitivity values in use for the subsystem, and the IP addresses of the source node, the group leader, and the control workstation.

    11. To display the status of all of the daemons under SRC control, enter:
      lssrc -a
      

    hc.vsd

    Purpose

    hc.vsd - Queries and controls the hc subsystem of IBM Recoverable Virtual Shared Disk.

    Syntax

    hc.vsd
    {CLIENT_PATH socket_path | debug [off] | mksrc | PING_DELAY delay_in_sec | query | qsrc | reset | rmsrc | SCRIPT_PATH de/activate_path | start | stop | trace [off]}

    Flags

    None.

    Operands

    CLIENT_PATH socket_path
    Specifies the path for the socket connection to the hc client. The default is /tmp/serv.

    debug [off]
    Specify debug to redirect the hc subsystem's stdout and stderr to the console and cause the hc subsystem to not respawn if it exits with an error. (You can use the lscons command to determine the current console.)

    The hc subsystem must be restarted for this operand to take effect.

    Once debugging is turned on and the hc subsystem has been restarted, hc.vsd trace should be issued to turn on tracing.

    Use this operand under the direction of your IBM service representative.

    Note: The default when the node is booted is to have stdout and stderr routed to the console. If debugging is turned off, stdout and stderr are routed to /dev/null and all further trace messages are lost. To determine whether debug has been turned on, issue hc.vsd qsrc. If debug has been turned on, the return value is:

    action = "2"
    

    mksrc
    Uses mkssys to create the hc subsystem.

    PING_DELAY delay_in_sec
    Specifies the time in seconds between pings to the hc client. The default is 600 seconds.

    query
    Displays the current status of the hc subsystem in detail.

    qsrc
    Displays the System Resource Controller (SRC) configuration of the hc daemon.

    reset
    Stops and restarts the hc subsystem.

    rmsrc
    Uses rmssys to remove the hc subsystem.

    SCRIPT_PATH de/activate_path
    Specifies the location of the user-supplied scripts to be run when hc activates or deactivates.

    start
    Starts the hc subsystem.

    stop
    Stops the hc subsystem.

    trace [off]
    Requests or stops tracing of the hc subsystem. The hc subsystem must be in the active state when this command is issued.

    This operand is only meaningful after the debug operand has been used to send stdout and stderr to the console and the hc subsystem has been restarted.
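
    The check described under the debug operand (looking for action = "2" in the output of hc.vsd qsrc) can be scripted. The following is a minimal sketch with a hypothetical function name, hc_debug_is_on. It assumes only that the qsrc output contains the action line in the form shown above, and it reads that output from standard input.

```shell
# Hypothetical helper: reads the output of "hc.vsd qsrc" on stdin and
# succeeds if it contains the line: action = "2"
hc_debug_is_on() {
    grep -q '^[[:space:]]*action[[:space:]]*=[[:space:]]*"2"[[:space:]]*$'
}

# Possible use (not run here):
#   hc.vsd qsrc | hc_debug_is_on && echo "hc debug is on"
```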

    Description

    Use this command to display information about the hc subsystem and to change the status of the subsystem.

    You can restart the hc subsystem with the VSD Perspective. Type spvsd and select actions for IBM VSD nodes.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.

    Note: The query and qsrc subcommands have no exit values.

    Security

    You must have root privilege to issue the debug, mksrc, reset, start, and stop commands.

    Implementation Specifics

    This command is part of the IBM Recoverable Virtual Shared Disk Licensed Program Product (LPP).

    Prerequisite Information

    See "Using the IBM Recoverable Virtual Shared Disk Software" in IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Location

    /usr/lpp/csd/bin/hc.vsd

    Related Information

    Commands: ha_vsd, ha.vsd

    Examples

    To stop the hc subsystem and restart it, enter:

    hc.vsd reset
    

    The system returns the messages:

    Waiting for the hc subsystem to exit.
    hc subsystem exited successfully.
    Starting hc subsystem.
    hc subsystem started PID=xxx.
    

    hmadm

    Purpose

    hmadm - Administers the Hardware Monitor daemon.

    Syntax

    hmadm [ {-d debug_flag} ... ] operation

    Flags

    -d debug_flag
    Specifies the daemon debug flag to be set or unset in the daemon.

    Operands

    operation
    Specifies the administrative action to perform.

    The operation must be one of the following:

    cleard
    Unsets the daemon debug flag specified by the -d flag in the daemon. Multiple -d flags can be specified. If no -d flags are specified, the all debug flag is assumed.

    clog
    Changes the daemon log file. If the log file is growing large, this operation is used to cause the daemon to write to a new log file.

    quit
    Causes the daemon to exit.

    setacls
    Reads the Hardware Monitor access control list configuration file to update the daemon's internal ACL tables. Any Hardware Monitor application or command executing under the ID of a user who has changed or deleted ACLs has its client connection terminated by the daemon. Such applications and commands must be restarted, if possible. ACLs for new users can be added without any effect on executing applications and commands.

    This operation must be invoked by the administrator after the administrator modifies the ACL configuration file.

    setd
    Sets the daemon debug flag specified by the -d flag in the daemon. Multiple -d flags can be specified. If no -d flags are specified, the all debug flag is assumed.

    Description

    The hmadm command is used to administer the Hardware Monitor daemon. The Hardware Monitor daemon executes on the control workstation and is used to monitor and control the SP hardware. Five administrative actions are supported, as specified by the operation operand.

    Normally when the daemon exits, it is automatically restarted by the system. If frame configuration information is changed, the quit operation can be used to update the system.

    The daemon writes debug information and certain error information to its log file. The log file is located in /var/adm/SPlogs/spmon and its name is of the form hmlogfile.nnn, where nnn is the Julian date of the day the log file was opened by the daemon. The clog operation causes the daemon to close its current log file and create a new one using the name hmlogfile.nnn, where nnn is the current Julian date. If this name already exists, a name of the form hmlogfile.nnn_m is used, where m is a number picked to create a unique file name.
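
    The naming scheme can be illustrated with a short sketch that predicts the name of the next log file. This is not the daemon's own code. The function name next_hmlog_name is hypothetical, and the way m is chosen here (starting at 1 and counting up until the name is unused) is an assumption, because the text says only that m is picked to make the name unique.

```shell
# Hypothetical helper: prints the name a new log file would get in
# directory $1 for Julian date $2 (three digits, as printed by "date +%j").
# Assumption: the suffix m starts at 1 and counts up until the name is unused.
next_hmlog_name() {
    name="hmlogfile.$2"
    if [ ! -e "$1/$name" ]; then
        printf '%s\n' "$name"
        return 0
    fi
    m=1
    while [ -e "$1/${name}_$m" ]; do
        m=$((m + 1))
    done
    printf '%s_%s\n' "$name" "$m"
}

# Possible use (not run here):
#   next_hmlog_name /var/adm/SPlogs/spmon "$(date +%j)"
```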

    There are 15 debug flags supported by the daemon:

    all
    Sets/unsets all of the following flags.

    acls
    Logs the Access Control Lists.

    cmdq
    Logs the contents of the internal queue of commands sent to the frames.

    cntrs
    Logs the daemon internal counters.

    dcmds
    Logs commands sent to the daemon.

    fcmds
    Logs commands sent to the frames.

    ipl
    Logs interested party lists.

    pckts
    Logs packets received from the frames in /var/adm/SPlogs/spmon/hm_frame_packet_dump.

    polla
    Logs poll list array.

    rsps
    Logs responses sent to clients in /var/adm/SPlogs/spmon/hm_response_dump.

    socb
    Logs client socket session information.

    s1data
    Logs data sent to S1 serial ports in /var/adm/SPlogs/spmon/hm_s1data_dump.

    s1refs
    Logs S1 serial port reference counts and connections.

    ttycb
    Logs ttycb control blocks.

    tvars
    Logs boundary values used in checking temperatures, amperages, and volts.

    This command uses the SP Hardware Monitor. Therefore, the user must be authorized to access the Hardware Monitor subsystem and must have administrative permission. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.

    Files

    /usr/lpp/ssp/bin/hmadm
    Contains the hmadm command.

    Related Information

    File: /spdata/sys1/spmon/hmacls

    hmcmds

    Purpose

    hmcmds - Controls the state of the SP hardware.

    Syntax

    hmcmds
    [-a | -v] [-f file_name] [-u microcode_file_name]
     
    [-G] command [slot_spec ... | all]

    Flags

    -a
    Exits immediately after sending the VFOP command to the specified hardware; that is, it does not wait for the hardware state to match the command.

    -v
    Specifies verbose mode. The percentage of hardware components whose state matches the VFOP command is displayed at five-second intervals. The following are also displayed:

    -f file_name
    Uses file_name as the source of slot ID specifications.

    -u microcode_file_name
    Uses microcode_file_name as the source of supervisor microcode that is loaded to the specified slot_spec. If the microcode_file_name is not fully qualified, the file must be in the current directory. This option is allowed only with the microcode command.

    -G
    Specifies Global mode. With this flag, commands can be sent to any hardware.

    Operands

    command
    Specifies the command to send to the hardware components.

    slot_spec
    Specifies the addresses of the hardware components.

    Description

    Use this command to control the state of the SP hardware. Control is provided via the Virtual Front Operator Panel (VFOP). VFOP is a set of commands that can be sent to the hardware components contained in one or more SP frames. Each frame consists of 18 slots, numbered 0 through 17, where slot 0 represents the frame itself, slot 17 can contain a switch, and slots 1 through 16 can contain thin or wide processing nodes. Wide nodes occupy two slots and are addressed by the odd slot number. In a switch-only frame, slots 1 through 16 can contain switches; the switches occupy two slots and are addressed by the even slot number.

    Normally, commands are only sent to the hardware components in the current system partition. A system partition only contains processing nodes. The switches and the frames themselves are not contained in any system partition. To send VFOP commands to hardware components not in the current system partition or to any frame or switch, use the -G flag.

    The following list describes the VFOP command set. Commands that require the -G flag are marked by an asterisk (*). Commands marked by a double asterisk (**) are primarily used by the Eclock command and are not intended for general use since an in-depth knowledge of switch clock topology is required to execute these commands in the proper sequence.

    Before issuing these commands, refer to the "Using a Switch" chapter in the IBM Parallel System Support Programs for AIX: Administration Guide for detailed descriptions.

    High Performance Switch

    extclk1
    Sets the High Performance Switch clock multiplexor to the External Clock 1.**

    extclk2
    Sets the High Performance Switch clock multiplexor to the External Clock 2.**

    extclk3
    Sets the High Performance Switch clock multiplexor to the External Clock 3.**

    intclk
    Sets the High Performance Switch clock multiplexor to the Internal Clock.**

    SP Switch

    clkdrv2
    Sets the SP Switch clock drive to the Phase Lock Loop 2.**

    clkdrv3
    Sets the SP Switch clock drive to the Phase Lock Loop 3.**

    clkdrv4
    Sets the SP Switch clock drive to the Phase Lock Loop 4.**

    clkdrv5
    Sets the SP Switch clock drive to the Phase Lock Loop 5.**

    hold_power_reset
    Performs power-on reset of SP Switch and holds the SP Switch in reset state. Requires rel_power_reset to release.**

    hold_synch_reset
    Performs synchronous reset of SP Switch and holds the SP Switch in reset state. Requires rel_synch_reset to release.**

    intclk2
    Sets the SP Switch clock input to the Local Oscillator 2.**

    intclk4
    Sets the SP Switch clock input to the Local Oscillator 4.**

    jack3
    Sets the SP Switch clock input to the External Jack 3.**

    jack4
    Sets the SP Switch clock input to the External Jack 4.**

    jack5
    Sets the SP Switch clock input to the External Jack 5.**

    jack6
    Sets the SP Switch clock input to the External Jack 6.**

    jack7
    Sets the SP Switch clock input to the External Jack 7.**

    jack8
    Sets the SP Switch clock input to the External Jack 8.**

    jack9
    Sets the SP Switch clock input to the External Jack 9.**

    jack10
    Sets the SP Switch clock input to the External Jack 10.**

    jack11
    Sets the SP Switch clock input to the External Jack 11.**

    jack12
    Sets the SP Switch clock input to the External Jack 12.**

    jack13
    Sets the SP Switch clock input to the External Jack 13.**

    jack14
    Sets the SP Switch clock input to the External Jack 14.**

    jack15
    Sets the SP Switch clock input to the External Jack 15.**

    jack16
    Sets the SP Switch clock input to the External Jack 16.**

    jack17
    Sets the SP Switch clock input to the External Jack 17.**

    jack18
    Sets the SP Switch clock input to the External Jack 18.**

    jack19
    Sets the SP Switch clock input to the External Jack 19.**

    jack20
    Sets the SP Switch clock input to the External Jack 20.**

    jack21
    Sets the SP Switch clock input to the External Jack 21.**

    jack22
    Sets the SP Switch clock input to the External Jack 22.**

    jack23
    Sets the SP Switch clock input to the External Jack 23.**

    jack24
    Sets the SP Switch clock input to the External Jack 24.**

    jack25
    Sets the SP Switch clock input to the External Jack 25.**

    jack26
    Sets the SP Switch clock input to the External Jack 26.**

    jack27
    Sets the SP Switch clock input to the External Jack 27.**

    jack28
    Sets the SP Switch clock input to the External Jack 28.**

    jack29
    Sets the SP Switch clock input to the External Jack 29.**

    jack30
    Sets the SP Switch clock input to the External Jack 30.**

    jack31
    Sets the SP Switch clock input to the External Jack 31.**

    jack32
    Sets the SP Switch clock input to the External Jack 32.**

    jack33
    Sets the SP Switch clock input to the External Jack 33.**

    jack34
    Sets the SP Switch clock input to the External Jack 34.**

    power_on_reset
    Performs power-on reset of SP Switch. Includes chip self-test and synchronous reset.**

    rel_power_reset
    Releases SP Switch from hold_power_reset state.**

    rel_synch_reset
    Releases SP Switch from hold_synch_reset state.**

    synch_reset
    Performs synchronous reset of SP Switch. Turns off error enables and clears errors.**

    Any Frame, Node, or Switch that Supports Microcode Download

    basecode
    Performs a power off of the node and switches the active frame, node, or switch supervisor to basecode mode causing the active supervisor to become nonactive and the basecode supervisor to become active.*
    Note: You must issue this command before issuing the microcode command.

    boot_supervisor
    Performs a boot of the frame, node, or switch basecode application and supervisor.*

    exec_supervisor
    Causes the basecode to execute the nonactive frame, node, or switch supervisor thus making it active.*

    microcode
    Performs a download of supervisor microcode to the frame, node, or switch.*
    Note: You must issue the basecode command before issuing this command.

    rosdump
    Dumps the contents of the frame, node, or switch basecode or supervisor application, whichever is active. The contents are dumped to an aixterm that is opened for serial data read to the target slot.*

    Refer to the s1term command for information on making serial connections.

    Any Node

    normal
    Sets the keylock on a processing node to the Normal position.

    reset
    Presses and releases the reset button on a processing node.

    secure
    Sets the keylock on a processing node to the Secure position.

    service
    Sets the keylock on a processing node to the Service position.

    Any Frame

    runpost
    Initiates Power-On Self Tests (POST) in the frame supervisor.*

    setid
    Sets the frame ID into the frame supervisor.*

    Any Frame, Node, or Switch

    off
    Disables power to the frame power supplies, a processing node, or a switch.

    on
    Enables power to the frame power supplies, a processing node, or a switch.

    Any Node or Switch

    flash
    Flashes the I²C address of a processing node or a switch node in the node's yellow LED.

    One of these commands must be specified using the command operand. The command is sent to the hardware specified by the slot_spec operands. However, the command is not sent to any hardware that is not in the current system partition unless the -G flag is specified. If the -G flag is not specified and the slot_spec operands specify no hardware in the current system partition, an error message is displayed.

    The slot_spec operands are interpreted as slot ID specifications. A slot ID specification names one or more slots in one or more SP frames and it has either of two forms:

    fidlist:sidlist   or   nodlist
    

    where:

    fidlist
    = fval[,fval,...]

    sidlist
    = sval[,sval,...]

    nodlist
    = nval[,nval,...]

    The first form specifies frame numbers and slot numbers. The second form specifies node numbers. An fval is a frame number or a range of frame numbers of the form a-b. An sval is a slot number from the set 0 through 17 or a range of slot numbers of the form a-b. An nval is a node number or a range of node numbers of the form a-b.

    The relationship of node numbers to frame and slot numbers is shown in the following formula:

    node_number = ((frame_number - 1) × 16) + slot_number
    
    Note: Node numbers can only be used to specify slots 1 through 16 of any frame.
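
    As a quick check of the formula, the following minimal Python sketch computes a node number from a frame and slot number. The function name node_number and the choice to raise ValueError for slots outside 1 through 16 are illustrative assumptions, not part of hmcmds.

```python
SLOTS_PER_FRAME = 16  # node numbers cover slots 1 through 16 only


def node_number(frame_number, slot_number):
    """Apply node_number = ((frame_number - 1) x 16) + slot_number."""
    if frame_number < 1:
        raise ValueError("frame numbers start at 1")
    if not 1 <= slot_number <= SLOTS_PER_FRAME:
        # Slot 0 (the frame) and slot 17 (the switch) have no node number.
        raise ValueError("node numbers exist only for slots 1 through 16")
    return (frame_number - 1) * SLOTS_PER_FRAME + slot_number
```

    With this sketch, node_number(2, 1) gives 17 and node_number(3, 1) gives 33, which agrees with the node-number examples in this section.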

    The following are some examples of slot ID specifications.

    To specify slot 1 in frames 1 through 10, enter:

    1-10:1
    

    To specify frames 2, 4, 5, 6, and 7, enter:

    2,4-7:0
    

    To specify slots 9 through 16 in frame 5, enter:

    5:9-16
    

    If frame 5 contains wide nodes, the even slot numbers are ignored.

    To specify slots 1, 12, 13, 14, 15, and 16 in each of frames 3 and 4, enter:

    3,4:1,12-16
    

    To specify slot 17 in frame 4, enter:

    4:17
    

    To specify the nodes in slots 1 through 16 of frame 2, enter:

    17-32
    

    To specify the nodes in slot 1 of frame 1, slot 1 of frame 2 and slot 1 of frame 3, enter:

    1,17,33
    

    To specify the node in slot 6 of frame 1, enter:

    6
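
    The slot ID specification examples above can be expanded mechanically into frame and slot pairs. The following Python sketch shows one way to do that for both forms. It is an illustration only, not the parser used by hmcmds: the function names are hypothetical, it does not validate slot numbers, and it does not model the wide-node rule under which even slot numbers are ignored.

```python
def expand_list(text):
    """Expand a comma-separated list of numbers and a-b ranges."""
    values = []
    for item in text.split(","):
        if "-" in item:
            first, last = item.split("-")
            values.extend(range(int(first), int(last) + 1))
        else:
            values.append(int(item))
    return values


def expand_slot_spec(spec):
    """Return the (frame, slot) pairs named by one slot ID specification."""
    if ":" in spec:
        # First form: fidlist:sidlist
        fidlist, sidlist = spec.split(":")
        return [(f, s) for f in expand_list(fidlist) for s in expand_list(sidlist)]
    # Second form: nodlist. Invert the node number formula.
    pairs = []
    for node in expand_list(spec):
        frame_index, slot_index = divmod(node - 1, 16)
        pairs.append((frame_index + 1, slot_index + 1))
    return pairs
```

    For instance, expand_slot_spec("1,17,33") yields slot 1 of frames 1, 2, and 3, matching the example above.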
    

    Optionally, slot ID specifications can be provided in a file rather than as command operands. The file must contain one specification per line. The command requires that slot ID specifications be provided. If the command is to be sent to all SP hardware, the keyword all must be provided in lieu of the slot_spec operands. However, the all keyword can only be specified if the -G flag is specified and if the VFOP command is on or off, since on or off are the only commands common to all hardware components.

    Commands sent to hardware for which they are not appropriate, or sent to hardware which does not exist, are silently ignored by the Hardware Monitor subsystem.

    By default, and except for the reset, flash, and runpost commands, the hmcmds command does not terminate until the state of the hardware to which the command was sent matches the command or until 15 seconds have elapsed. If 15 seconds have elapsed, the hmcmds command terminates with a message stating the number of nodes whose state was expected to match the VFOP command sent and the number of nodes which actually are in that state. The state of hardware for which the VFOP command is inappropriate, or where the hardware does not exist, is ignored.

    To execute the hmcmds command, the user must be authorized to access the Hardware Monitor subsystem and, for those frames specified to the command, the user must be granted VFOP permission. Commands sent to frames for which the user does not have VFOP permission are ignored. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.

    Files

    /usr/lpp/ssp/bin/hmcmds
    Contains the hmcmds command.

    Related Information

    Commands: hmmon, spsvrmgr

    Examples

    1. To turn power off in all hardware, enter:
      hmcmds -G off all
      

    2. In a five-frame SP system, to set the keyswitch on all processing nodes to Secure, enter:
      hmcmds secure 1-5:1-16
      

    3. To set the clock multiplexor in the switches in frames 1 through 8 to external clock 3, enter:
      hmcmds -G extclk3 1-8:17
      

    4. In a three-frame SP system, to set the keyswitch to Normal on node 6 and on the nodes in slot 2 of both frames 2 and 3, enter:
      hmcmds normal 6 2,3:2
      

    hmmon

    Purpose

    hmmon - Monitors the state of the SP hardware.

    Syntax

    hmmon
    [-G] [-q] [-Q] [-r | -s] [-v var_nlist]
     
    [-f file_name | slot_spec ... ]

    hmmon
    -V

    Flags

    -G
    Specifies Global mode. With this flag, all hardware can be specified.

    -q
    Displays the current state information prior to displaying changed state.

    -Q
    Displays only the current state information and exits.

    -r
    Displays the output in raw format.

    -s
    Displays the output in symbolic format.

    -v var_nlist
    Limits output to that of the state variables specified by var_nlist, a comma separated list of symbolic variable names. This list cannot contain blanks. Use the -V flag for a list of possible values.

    -V
    Displays a descriptive list of symbolic variable names and variable indexes, and exits.

    -f file_name
    Uses the file file_name as the source of slot ID specifications.

    Operands

    slot_spec
    Specifies the addresses of the hardware components.

    Description

    Use this command to monitor the state of the SP hardware contained in one or more SP frames. Each frame consists of 18 slots, numbered 0 through 17, where slot 0 represents the frame itself, slot 17 can contain a switch, and slots 1 through 16 can contain thin or wide processing nodes. Wide nodes occupy two slots and are addressed by the odd slot number. In a switch-only frame, slots 1 through 16 can contain switches; the switches occupy two slots and are addressed by the even slot number.

    With no flags and operands, the command prints to standard output descriptive text of all hardware state changes in the current system partition as they occur, from the time the command is invoked. The command does not terminate, unless the -Q flag or the -V flag is specified, and must be interrupted by the user. To monitor all of the hardware in the SP system, the -G flag must be specified. Note that the switches and the frames themselves are not contained in any system partition.

    When one or more slot_spec operands are present, each operand is interpreted as a slot ID specification. A slot ID specification names one or more slots in one or more SP frames and it has either of two forms:

    fidlist:[sidlist]   or   nodlist
    

    where:

    fidlist
    = fval[,fval,...]

    sidlist
    = sval[,sval,...]

    nodlist
    = nval[,nval,...]

    The first form specifies frame numbers and slot numbers. The second form specifies node numbers. An fval is a frame number or a range of frame numbers of the form a-b. An sval is a slot number from the set 0 through 17 or a range of slot numbers of the form a-b. An nval is a node number or a range of node numbers of the form a-b. If a sidlist is not specified, all hardware in the frames specified by the fidlist is monitored.

    The relationship of node numbers to frame and slot numbers is given by the following formula:

    node_number = ((frame_number - 1) × 16) + slot_number
    
    Note: The node numbers can only be used to specify slots 1 through 16 of any frame.

    The following are some examples of slot ID specifications.

    To specify all hardware in frames 1 through 10, enter:

    1-10:
    

    To specify frames 2, 4, 5, 6, and 7, enter:

    2,4-7:0
    

    To specify slots 9 through 16 in frame 5, enter:

    5:9-16
    

    If frame 5 contains wide nodes, the even slot numbers are ignored.

    To specify slots 1, 12, 13, 14, 15, and 16 in each of frames 3 and 4, enter:

    3,4:1,12-16
    

    To specify slot 17 in frame 4, enter:

    4:17
    

    To specify the nodes in slots 1 through 16 of frame 2, enter:

    17-32
    

    To specify the nodes in slot 1 of frame 1, slot 1 of frame 2 and slot 1 of frame 3, enter:

    1,17,33
    

    To specify the node in slot 6 of frame 1, enter:

    6
    

    Optionally, slot ID specifications may be provided in a file rather than as command operands. The file must contain one specification per line. When slot ID specifications are provided to the command, only the hardware named by the specifications is monitored. Furthermore, of the hardware named by these specifications, only that which is located in the current system partition is monitored. To monitor hardware not contained in the current system partition, the -G flag must be specified. If the -G flag is not specified and the slot ID specifications name no hardware in the current system partition, an error message is displayed.

    The default output displays hardware state information on a slot-by-slot basis. The state information for each slot is captioned by its frame ID and slot ID and consists of two columns. Each column contains state variable information, one variable per line. Each variable is displayed as descriptive text and a value. Boolean values are displayed as TRUE or FALSE. Integer values are displayed in hexadecimal.

    The command provides two other output formats, raw and symbolic. Both write the information for one state variable per line. The raw format consists of four fields separated by white space as follows:

    Field 1
    Contains the frame ID.

    Field 2
    Contains the slot ID.

    Field 3
    Contains the variable ID in hexadecimal.

    Field 4
    Contains the variable value, as received from the hardware, in decimal.

    The symbolic format consists of six fields separated by white space as follows:

    Field 1
    Contains the frame ID.

    Field 2
    Contains the slot ID.

    Field 3
    Contains the symbolic name of the state variable.

    Field 4
    Contains the variable value. Booleans are displayed as TRUE or FALSE. Integers are displayed as decimal values or floating point values, as appropriate to the definition of the variable.

    Field 5
    Contains the variable ID in hexadecimal.

    Field 6
    Contains the descriptive text for the variable. This is the same text that is displayed in the default output. Thus, "field" 6 contains embedded white space.

    The alternative output formats are suitable for input to post-processing programs, such as awk or scripts.
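
    As an illustration of such post-processing, the following Python sketch splits one line of raw or symbolic output into named fields, following the field descriptions above. The function names and dictionary keys are hypothetical. The symbolic value is kept as a string because it can be TRUE, FALSE, an integer, or a floating point number.

```python
def parse_raw(line):
    """Parse one line of raw (-r) output: four whitespace-separated fields."""
    frame, slot, var_id, value = line.split()
    return {
        "frame": int(frame),
        "slot": int(slot),
        "var_id": int(var_id, 16),  # variable ID is shown in hexadecimal
        "value": int(value),        # value is shown in decimal
    }


def parse_symbolic(line):
    """Parse one line of symbolic (-s) output: six fields.

    Field 6 contains embedded white space, so split at most five times
    and keep the remainder as the descriptive text.
    """
    frame, slot, name, value, var_id, text = line.split(None, 5)
    return {
        "frame": int(frame),
        "slot": int(slot),
        "name": name,
        "value": value,
        "var_id": int(var_id, 16),
        "text": text.strip(),
    }
```

    Limiting the split to five separators is what keeps the descriptive text of field 6 intact.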

    Output in any format can be limited to display only information from the specified hardware that corresponds to a list of state variables supplied to the command with the -v flag.

    To execute the hmmon command, the user must be authorized to access the Hardware Monitor subsystem and, for those frames specified to the command, the user must be granted "Monitor" permission. State information is not returned for frames for which the user does not have "Monitor" permission. Since the Hardware Monitor subsystem uses SP authentication services, the user must execute the kinit command prior to executing this command. Alternatively, site-specific procedures can be used to obtain the tokens that are otherwise obtained by kinit.

    The user can monitor nonexistent nodes in an existing frame in order to detect when a node is added while the system is up and running. No information is returned for nonexistent nodes when the -q or -Q flag is specified.

    Files

    /usr/lpp/ssp/bin/hmmon
    Contains the hmmon command.

    Related Information

    Command: hmcmds

    Examples

    The following is an example of default output from hmmon -G -Q 1:0,1. The command returns similar output, depending on your system configuration.

    frame 001, slot 00:
    node 01 I2C not responding  FALSE  node 02 I2C not responding   TRUE
    node 03 I2C not responding  FALSE  node 04 I2C not responding   TRUE
    switch I2C not responding   FALSE  node 01 serial link open     TRUE
    node 02 serial link open    FALSE  node 03 serial link open     TRUE
    frame LED 1 (green)        0x0001  frame LED 2 (green)        0x0001
    frame LED 3 (yellow)       0x0000  frame LED 4 (yellow)       0x0000
    AC-DC section A power off   FALSE  AC-DC section B power off   FALSE
    AC-DC section C power off   FALSE  AC-DC section D power off   FALSE
    supervisor timer ticks     0x88f2  +48 voltage                0x0078
    temperature                0x0036  supervisor serial number   0x1234
    supervisor type            0x0011  supervisor code version    0x5ff5
     
    frame 001, slot 01:
    serial 1 DTR asserted        TRUE   -12 volt low warning      TRUE
    -12 volt low shutdown       FALSE   -12 volt high warning     TRUE
    +4 volt low shutdown        FALSE   +4 volt high warning      TRUE
    fan 1 shutdown              FALSE   fan 2 warning             TRUE
    DC-DC power on > 10 secs     TRUE   +5 DC-DC output good      TRUE
    7 segment display flashing  FALSE   node/switch LED 1 (green) 0x0001
    reset button depressed      FALSE   serial link open          TRUE
    diagnosis return code      0x00dd   7 segment LED A           0x00ff
    +5 I/O voltage             0x007f   +12 voltage               0x0096
    

    The following is an example of raw output from hmmon -G -Q -r 1:0,1. The command returns similar output, depending on your system configuration.

    1 0 0x880f 32
    1 0 0x881c 0
    1 0 0x881d 4
    1 0 0x8834 54
    1 0 0x8839 4660
    1 0 0x883a 17
    1 0 0x88a8 1
    1 1 0x9097 16
    1 1 0x9098 0
    1 1 0x9047 1
    1 1 0x909d 128
    1 1 0x9023 221
    1 1 0x90a1 255
    1 1 0x90a2 127
    1 1 0x903b 24565
    

    The following is an example of symbolic output from hmmon -G -Q -s 1:0,1. The command returns similar output, depending on your system configuration.

    1  0  nodefail1          FALSE    0x8802  node 01 I2C not responding
    1  0  nodeLinkOpen1      TRUE     0x8813  node 01 serial link open
    1  0  frACLED                  1  0x8824  frame LED 1 (green)
    1  0  frNodeComm               0  0x8827  frame LED 4 (yellow)
    1  0  frPowerOff_B       FALSE    0x882d  AC-DC section B power off
    1  0  timeTicks            34881  0x8830  supervisor timer ticks
    1  0  voltP48             46.800  0x8831  +48 voltage
    1  0  type                    17  0x883a  supervisor type
    1  0  codeVersion          24565  0x883b  supervisor code version
    1  0  controllerResponds TRUE     0x88a8  Frame responding to polls
    1  0  rs232DCD           TRUE     0x88a9  RS232 link DCD active
    1  0  rs232CTS           TRUE     0x88aa  RS232 link CTS active
    1  1  fanfail2           FALSE    0x9050  fan 2 shutdown
    1  1  nodePowerOn10Sec   TRUE     0x904b  DC-DC power on > 10 secs
    1  1  P5DCok             TRUE     0x9097  +5 DC-DC output good
    1  1  powerLED                 1  0x9047  node/switch LED 1 (green)
    1  1  envLED                   0  0x9048  node/switch LED 2 (yellow)
    1  1  keyModeSwitch            0  0x909b  key switch
    1  1  serialLinkOpen     TRUE     0x909d  serial link open
    1  1  LED7SegA               255  0x909f  7 segment LED A
    1  1  voltP5i              4.978  0x90a2  +5 I/O voltage
    

    The raw and symbolic formats output by the hmmon command contain the variable ID of each state variable. Refer to Appendix D in IBM Parallel System Support Programs for AIX: Administration Guide.

    hmreinit

    Purpose

    hmreinit - Stops and starts the Hardware Monitor daemon and modifies the System Data Repository (SDR) as necessary.

    Syntax

    hmreinit

    Flags

    None.

    Operands

    None.

    Description

    Use this command to reinitialize the Hardware Monitor daemon when changes to the SP system occur. Specifically, hmreinit determines if there are any changes in the switch configuration (such as adding, deleting, or upgrading switches). The hmreinit command then calls SDR_config -u to update the switch information in the SDR and generates switch node numbers based on this change. If a switch configuration change is detected, hmreinit tests whether a single system partition exists. If only one system partition exists, hmreinit deletes the Syspar_map entries from the SDR and then calls create_par_map to generate the correct objects. If more than one system partition exists, hmreinit issues a message to that effect and exits.

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that more than one system partition was found.

    Security

    You must have root privilege and a valid ticket to run this command.

    Implementation Specifics

    This command is part of the Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Location

    /usr/lpp/ssp/install/bin/hmreinit

    Related Information

    Commands: spframe, SDR_config

    For additional information, refer to the "Reconfiguring the IBM RS/6000 SP System" chapter in IBM Parallel System Support Programs for AIX: Installation and Migration Guide.

    hostlist

    Purpose

    hostlist - Lists SP host names to standard output based on criteria.

    Syntax

    hostlist
    [-s framerange:slotrange] [-f file_name] [-a] [-G]
     
    [-n noderange] [-w host_name,host_name, ...]
     
    [-e host_name,host_name, ...] [-v] [-d | -l] [ -r]
     
    [-N node_group,node_group, ...]

    Flags

    -s
    Specifies a range of frames and a range of slots on each of the frames. Ranges are specified as in 1-3, meaning 1 through 3 inclusive, and as 1,3,15, meaning 1, 3, and 15. Ranges can incorporate both styles, as in 1-10,15. So, 1-3,5:1-2,4 refers to slots 1, 2, and 4 on each of frames 1, 2, 3, and 5. If a node occupies more than one slot, referring to either or both of the slots refers to the node.

    -f
    Specifies the file name of a working collective file as in the dsh working collective, containing a host name on each line. This can be in the format of a Parallel Operating Environment (POE) host.list file.

    -a
    Specifies that the System Data Repository (SDR) initial_hostname field for all nodes in the current system partition be written to standard output. For each node, this corresponds to what the hostname command returns on the node.

    -G
    Changes the scope of the arguments associated with the -a, -n, -s, and -N options from the current system partition to the SP system.

    -n
    Specifies that all nodes in a noderange are written. The range specification has syntax similar to that of frame or slot ranges. Nodes are numbered starting with 1, for frame 1 slot 1, up to the number of slots on the system (note that a node number can refer to an empty slot). A noderange can span frames; for example, 1-4,17-50 refers to all nodes occupying slots 1-4 on frame 1, slots 1-16 on frames 2 and 3, and slots 1 and 2 on frame 4.

    -w
    Specifies a list of host names, separated by commas, to include in the working collective. Both this flag and the -a flag can be included on the same command line. Duplicate host names are only included once in the working collective.

    -e
    Specifies an exclusion list. Comma-separated host names specified are not written to standard output.

    -v
    Specifies that only nodes that are responding according to the SDR have their host names written.

    -d
    Specifies that IP addresses are returned as output.

    -l
    Specifies that long host names be written. (This is lowercase l, as in list.)

    -r
    Specifies a restriction to write host names for only those nodes that have exactly the same node number or starting slot specified by the search argument. For example, if a "-n" value corresponds to the second slot of a wide node, and the "-r" flag is used, then a warning message is written instead of the host name for the first slot of the wide node.

    -N
    Specifies a list of node groups. Each node group is resolved into nodes. The host names of these nodes are added to the host list. If -G is supplied, a global node group is used. Otherwise, a partition-bound node group is used.

    Operands

    None.

    Description

    The hostlist command writes SP host names to standard output. The arguments to the command indicate the host names to be written. More than one flag can be specified, in which case, the hosts indicated by all the flags are written.
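
    The way several flags combine can be pictured with the following Python sketch. It merges the names selected by flags such as -a, -n, -s, or -N with the names given by -w, writes each name once, and drops the names given by -e. It is an illustration only: the function name is hypothetical, preserving the order of first appearance is an assumption, and no matching between short and long host names is attempted.

```python
def build_host_list(selected, extra=(), excluded=()):
    """Combine host names as the flag descriptions suggest.

    selected: names chosen by flags such as -a, -n, -s, or -N
    extra:    names given with -w
    excluded: names given with -e
    """
    skip = set(excluded)
    seen = set()
    result = []
    for name in list(selected) + list(extra):
        if name in skip or name in seen:
            continue  # excluded, or already written once
        seen.add(name)
        result.append(name)
    return result
```

    For example, build_host_list(["n1", "n2", "badhost"], extra=["n2", "otherone"], excluded=["badhost"]) returns n1, n2, and otherone.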

    If no arguments are specified, hostlist writes the contents of a file specified by the WCOLL environment variable. If the WCOLL environment variable does not exist, the MP_HOSTFILE environment variable is used as the name of a POE host file to use for input. Finally, ./host.list is tried. If none of these steps is successful, an error occurs. The input file is in dsh-working-collective-file or POE-host-list-file format. Node pool specifications in POE host files are not supported.
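
    The search order described above can be sketched in Python as follows. The function name default_host_file is hypothetical. The sketch assumes that a candidate counts as successful only if the file it names exists; the text does not say how hostlist treats a variable that is set but names a missing file.

```python
import os


def default_host_file(environ, exists):
    """Return the input file used when hostlist is given no arguments.

    Order taken from the text: WCOLL, then MP_HOSTFILE, then ./host.list.
    Returns None when no candidate is usable (the error case).
    Assumption: a candidate is usable only if the named file exists.
    """
    candidates = [environ.get("WCOLL"), environ.get("MP_HOSTFILE"), "./host.list"]
    for path in candidates:
        if path and exists(path):
            return path
    return None
```

    In normal use the function would be called as default_host_file(os.environ, os.path.exists); passing the environment and the existence check as arguments keeps the sketch easy to exercise without touching the real file system.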

    Files

    working collective file
    See the dsh command.

    POE host.list file
    See Parallel Environment for AIX: Operation and Use documentation.

    Related Information

    Commands: dsh, sysctl

    Examples

    1. To create a working collective file of all nodes in the system partition that are responding, except for badhost, enter:
      hostlist -av -e badhost > ./working
      

    2. To run a program on the nodes on slot 1 of each of 4 frames, enter:
      hostlist -s 1-4:1 | dsh -w - program
      

    3. To run a program on the nodes on all slots for frame 1 and slots 1-3 for frame 3, as well as on host otherone, enter:
      hostlist -n 1-16,33-35 -w otherone | dsh -w - program
      

    4. To run a Sysctl application on all the nodes in the WCOLL file ./wcoll, enter:
      export WCOLL=./wcoll
      hostlist | sysctl -c - sysctl_app args
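
    The node numbers in example 3 follow from the frame and slot positions: frame 1 covers nodes 1-16, and slots 1-3 of frame 3 are nodes 33-35. The sketch below reproduces that arithmetic, assuming 16 slots per frame. The function names are hypothetical and are not part of PSSP.

```shell
# Hypothetical helper: node number for a frame and slot,
# assuming 16 slots per frame.
frame_slot_to_node() {
    frame=$1
    slot=$2
    echo $(( (frame - 1) * 16 + slot ))
}

# Hypothetical helper: prints a node range (first-last) for a run of slots
# in one frame, in the form accepted by the -n flag in example 3.
frame_slots_to_range() {
    printf '%s-%s\n' \
        "$(frame_slot_to_node "$1" "$2")" \
        "$(frame_slot_to_node "$1" "$3")"
}
```

    With these helpers, frame_slots_to_range 1 1 16 and frame_slots_to_range 3 1 3 print the two ranges that example 3 joins with a comma.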
      

    hr Script

    Purpose

    hr - Controls the host_responds monitor daemon, hrd, on the control workstation.

    Syntax

    hr [-spname syspar_name]
       { [start | resume] | [stop | quiesce] | reset |
         [query | qall | qsrc] | refresh | mksrc optional_flags | rmsrc | clean | restore |
         [debug | debug off] | [trace on | trace off] }

    Flags

    -spname syspar_name
    Executes the command for the system partition specified by the syspar_name operand. If this flag is not specified, the name of the system partition given by the value of the SP_NAME variable is used.

    Operands

    start | resume
    Starts the hrd daemon.

    stop | quiesce
    Stops the hrd daemon.

    reset
    Stops and restarts the hrd daemon.

    query
    Queries the daemon for status. The response to the query includes hrd-specific information.

    qall
    Performs the query function for each defined partition.

    qsrc
    Displays a subsystem definition for a partition.

    refresh
    Uses the refresh command to request a daemon refresh.

    mksrc optional_flags
    Uses the mkssys command to create an SRC subsystem object. Additional flags for the command may be specified.

    rmsrc
    Uses the rmssys command to remove an SRC subsystem object.

    clean
    Removes all entries for the subsystem for all system partitions.

    restore
    Synchronizes the running daemons with the information in the System Data Repository (SDR). This operand removes all entries for the subsystem, creates new entries based on information in the SDR, and starts the subsystems.

    [debug | debug off ]
    Turns debugging on or off.

    [trace on | trace off ]
    Turns additional tracing on or off.

    Description

    Use this command to control the operation of hrd, the host_responds daemon on the control workstation within a system partition. The heartbeat server provides input to the host_responds function within a system partition for the System Monitor through the hrd daemons.

    The hr script is not normally executed from the command line. It is normally called by the hrctrl command, which is in turn called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

    The hrd daemon is initially started on the control workstation with the System Resource Controller (SRC). It is respawned automatically if the hrd daemon fails. The SP_NAME environment variable causes selection of the correct daemon.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    The "Starting Up and Shutting Down the SP System" chapter and "The System Data Repository" appendix in IBM Parallel System Support Programs for AIX: Administration Guide

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hr

    Related Information

    Commands: hb, hrctrl, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    See the hrctrl command.

    hrctrl Script

    Purpose

    hrctrl - A script that controls the Host_Responds subsystem.

    Syntax

    hrctrl { -a | -s | -k | -d | -c | -t | -o | -r | -h }

    Flags

    -a
    Adds the subsystem.

    -s
    Starts the subsystem.

    -k
    Stops the subsystem.

    -d
    Deletes the subsystem.

    -c
    Cleans the subsystems, that is, deletes them from all system partitions.

    -t
    Turns tracing on for the subsystem.

    -o
    Turns tracing off for the subsystem.

    -r
    Refreshes the subsystem.

    -h
    Displays usage information.

    Operands

    None.

    Description

    The Host_Responds subsystem provides to other PSSP subsystems information about the state of the nodes on the IBM RS/6000 SP.

    The hrctrl control script controls the operation of the Host_Responds subsystem. The subsystem is under the control of the System Resource Controller (SRC) and belongs to a subsystem group called hr. Associated with each subsystem is a daemon and a script that configures and starts the daemon.

    An instance of the Host_Responds subsystem executes on the control workstation for every system partition. Because Host_Responds provides its services within the scope of a system partition, its subsystem is said to be system partition-sensitive. This control script operates in a manner similar to the control scripts of other system partition-sensitive subsystems. The script should be issued on the control workstation. If it is issued on a node, it has no effect.

    From an operational point of view, the Host_Responds subsystem group is organized as follows:

    Subsystem
    Host_Responds

    Subsystem Group
    hr

    SRC Subsystem
    hr

    The hr subsystem is associated with the hrd daemon and the hr script. The hr script configures and starts the hrd daemon.

    On the control workstation, there are multiple instances of each subsystem, one for each system partition. Accordingly, the subsystem names on the control workstation have the system partition name appended to them. For example, for system partitions named sp_prod and sp_test, the subsystems on the control workstation are named hr.sp_prod and hr.sp_test.

    The subsystem does not run on the nodes.
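
    The naming rule above (hr followed by a period and the system partition name) can be sketched as follows. The function names are hypothetical. The partition names sp_prod and sp_test are the ones used in the text.

```shell
# Hypothetical helper: builds the control workstation subsystem name
# for one system partition, for example hr.sp_prod.
hr_subsystem_name() {
    printf 'hr.%s\n' "$1"
}

# Hypothetical helper: prints one subsystem name per partition argument.
hr_subsystem_names() {
    for syspar in "$@"; do
        hr_subsystem_name "$syspar"
    done
}
```

    A name built this way can be passed to lssrc, for example lssrc -s "$(hr_subsystem_name sp_prod)".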

    Daemons
    hrd

    The hrd daemon provides the Host_Responds services. The hr script configures and starts the hrd daemon.

    The hrctrl script is not normally executed from the command line. It is normally called by the syspar_ctrl command during installation of the system, and partitioning or repartitioning of the system.

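    The hrctrl script provides a variety of controls for operating the Host_Responds subsystem: adding, starting, stopping, and deleting the subsystem; cleaning up the subsystems for all system partitions; turning tracing on and off; and refreshing the subsystem.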

    Before performing any of these functions, the script obtains the node number (using the node_number command). If the node number is not zero, the control script is running on a node and it exits immediately. Otherwise, it is executing on the control workstation and it calls the hr script with an operand that specifies the action to be performed.

    Adding the Subsystem

    When the -a flag is specified, the control script uses the hr command with the mksrc operand to add the Host_Responds subsystem to the SRC.

    Starting the Subsystem

    When the -s flag is specified, the control script uses the hr command with the start operand to start the Host_Responds subsystem, hr.

    Stopping the Subsystem

    When the -k flag is specified, the control script uses the hr command with the stop operand to stop the Host_Responds subsystem, hr.

    Deleting the Subsystem

    When the -d flag is specified, the control script uses the hr command with the rmsrc operand to remove the Host_Responds subsystem from the SRC.

    Cleaning up the Subsystems

    When the -c flag is specified, the control script uses the hr command with the clean operand to stop and remove the Host_Responds subsystems for all system partitions from the SRC.

    Turning Tracing On

    When the -t flag is specified, the control script turns tracing on for the hrd daemon, using the hr command with the trace on operand.

    Turning Tracing Off

    When the -o flag is specified, the control script turns tracing off (returns it to its default level) for the hrd daemon, using the hr command with the trace off operand.

    Refreshing the Subsystem

    When the -r flag is specified, the control script refreshes the subsystem, using the hr refresh command.
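
    The correspondence between hrctrl flags and hr operands described in the preceding subsections can be summarized in a short sketch. It is not the hrctrl source. The function names are hypothetical, and the node number is passed in as an argument rather than obtained from the node_number command so that the logic can be read in isolation.

```shell
# Hypothetical helper: prints the hr operand that corresponds to an hrctrl
# flag, following the mapping described in the subsections above.
hrctrl_flag_to_hr_operand() {
    case "$1" in
        -a) echo "mksrc" ;;
        -s) echo "start" ;;
        -k) echo "stop" ;;
        -d) echo "rmsrc" ;;
        -c) echo "clean" ;;
        -t) echo "trace on" ;;
        -o) echo "trace off" ;;
        -r) echo "refresh" ;;
        *)  echo "no hr operand for: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: mirrors the node-number guard. A nonzero node number
# means the caller is on a node, so nothing is printed.
hrctrl_plan() {
    node=$1
    flag=$2
    [ "$node" -eq 0 ] || return 0
    operand=$(hrctrl_flag_to_hr_operand "$flag") || return 1
    echo "hr $operand"
}
```

    For example, hrctrl_plan 0 -t prints the hr invocation that corresponds to turning tracing on, and hrctrl_plan 5 -t prints nothing because node number 5 is not the control workstation.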

    Standard Error

    This command writes error messages (as necessary) to standard error.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred.
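
    A calling script can act on these exit values and keep the messages written to standard error. The wrapper below is a minimal sketch. The name run_and_report and the log file location are hypothetical.

```shell
# Hypothetical wrapper: runs a command, saves its standard error in a file,
# and reports success or failure from the exit value (0 means success).
run_and_report() {
    errlog=$1
    shift
    if "$@" 2>"$errlog"; then
        echo "ok"
    else
        echo "failed, see $errlog"
        return 1
    fi
}
```

    On the control workstation, a call such as run_and_report /tmp/hrctrl.err hrctrl -s would start the subsystem and leave any error messages in /tmp/hrctrl.err.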

    Security

    You must be running with an effective user ID of root.

    Implementation Specifics

    This command is part of the IBM Parallel System Support Programs (PSSP) Licensed Program Product (LPP).

    Prerequisite Information

    AIX Version 4 Commands Reference

    Information about the System Resource Controller (SRC) in AIX Version 4 General Programming Concepts: Writing and Debugging Programs

    Location

    /usr/lpp/ssp/bin/hrctrl

    Related Information

    Commands: hr, lssrc, startsrc, stopsrc, syspar_ctrl

    Examples

    1. To add the Host_Responds subsystem to the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hrctrl -a
      

    2. To start the Host_Responds subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hrctrl -s
      

    3. To stop the Host_Responds subsystem in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hrctrl -k
      

    4. To delete the Host_Responds subsystem from the SRC in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hrctrl -d
      

    5. To clean up the Host_Responds subsystem on all system partitions, enter:
      hrctrl -c
      

    6. To turn tracing on for the Host_Responds daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hrctrl -t
      

    7. To turn tracing off for the Host_Responds daemon in the current system partition, set the SP_NAME environment variable to the appropriate system partition name and enter:
      hrctrl -o
      

    8. To display the status of all of the subsystems in the Host_Responds SRC group, enter:
      lssrc -g hr
      

    9. To display the status of an individual Host_Responds subsystem, enter:
      lssrc -s subsystem_name
      

    10. To display detailed status about an individual Host_Responds subsystem, enter:
      lssrc -l -s subsystem_name
      

      In response, the system returns information that includes the running status of the subsystem and the status of the nodes within the system partition.

    11. To display the status of all of the daemons under SRC control, enter:
      lssrc -a
      

    hsdatalst

    Purpose

    hsdatalst - Displays data striping device (HSD) data for the IBM Virtual Shared Disks from the System Data Repository (SDR).

    Syntax

    hsdatalst [-G]

    Flags

    -G
    Displays information for all system partitions on the SP, not only the current system partition.

    Operands

    None.

    Description

    This command is used to display defined HSD information in the system.

    You can use the System Management Interface Tool (SMIT) to run this command. To use SMIT, enter:

    smit list_vsd
    
    and select the List Defined Hashed Shared Disk option.

    Files

    /usr/lpp/csd/bin/hsdatalst
    Specifies the command file.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: defhsd, undefhsd

    Examples

    To display SDR HSD data, enter:

    hsdatalst
    

    hsdvts

    Purpose

    hsdvts - Verifies that a data striping device (HSD) for the IBM Virtual Shared Disks has been correctly configured and works.

    Syntax

    hsdvts hsd_name

    Flags

    None.

    Operands

    hsd_name
    The name of the HSD you want verified. Warning: Data on hsd_name will be overwritten and, therefore, destroyed.

    Description

    Attention

    Data on hsd_name will be overwritten and, therefore, destroyed. Use this command after you have defined your HSD, IBM Virtual Shared Disks, and logical volumes, but before you have loaded your application data onto any of them.

    This command writes /unix to hsd_name, reads it from hsd_name to a temporary file, and compares the temporary file to the original to make sure the I/O was successful. If the files compare exactly, the test was successful.

    hsdvts writes to the raw hsd_name device, /dev/rhsd_name. Because raw devices can be written only in multiples of 512-byte blocks, hsdvts determines the number of full 512-byte blocks in the /unix file and writes that number of blocks to hsd_name with the dd command. It makes a copy of /unix that contains this number of 512-byte blocks for comparison to the copy read from hsd_name. The dd command is used for all copy operations.
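
    The block arithmetic and the write, read back, and compare sequence can be illustrated with ordinary files. The sketch below is not the hsdvts implementation: the function name is hypothetical, and the target is assumed to be a scratch file, never a raw device that holds data.

```shell
# Hypothetical sketch: round-trips the whole 512-byte blocks of a source file
# through a target and compares the result with a reference copy.
# Arguments: source file, target (a scratch file here), work directory.
block_roundtrip_check() {
    src=$1
    target=$2
    workdir=$3
    size=$(wc -c < "$src" | tr -d ' ')
    blocks=$(( size / 512 ))        # a partial trailing block is dropped
    [ "$blocks" -gt 0 ] || { echo "source smaller than one block" >&2; return 1; }
    # Reference copy that contains only the whole blocks.
    dd if="$src" of="$workdir/reference" bs=512 count="$blocks" 2>/dev/null || return 1
    # Write the same blocks to the target, then read them back.
    dd if="$src" of="$target" bs=512 count="$blocks" 2>/dev/null || return 1
    dd if="$target" of="$workdir/readback" bs=512 count="$blocks" 2>/dev/null || return 1
    # Success only if the data read back matches the reference exactly.
    cmp -s "$workdir/reference" "$workdir/readback"
}
```

    For a 1300-byte source file, the function copies two blocks (1024 bytes) and ignores the remaining 276 bytes, which matches the rounding described above.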

    Files

    /usr/lpp/csd/bin/hsdvts
    Specifies the command file.

    Prerequisite Information

    IBM Parallel System Support Programs for AIX: Managing Shared Disks

    Related Information

    Commands: cfghsd, cfgvsd, dd, defhsd, startvsd

    ifconfig

    Purpose

    /usr/lpp/ssp/css/ifconfig - Configures or displays network interface parameters for a network using TCP/IP.

    Syntax

    ifconfig interface [address_family [address [destination_address]] [parameter...]]

    Flags

    None.

    Operands

    address
    Specifies the network address for the network interface. For the inet family, the address operand is either a host name, or an Internet address in the standard dotted decimal notation.

    address_family
    Specifies which network address family to change. The inet and ns address families are currently supported. This operand defaults to the inet address family.

    destination_address
    Specifies the address of the correspondent on the remote end of a point-to-point link.

    interface
    Specifies the network interface configuration values to show or change. You must specify an interface with the interface operand when you use the ifconfig command. Abbreviations for the interfaces include:
    en
    Standard Ethernet (inet, xns)
    et
    IEEE 802.3 Ethernet (inet, xns)
    tr
    Token ring (inet, xns)
    xt
    X.25 (inet)
    sl
    Serial line IP (inet)
    lo
    Loopback (inet)
    op
    Serial (inet)
    css
    Scalable POWERparallel Switch (SP Switch) or High Performance Switch

    Include a numeral after the abbreviation to identify the specific interface (for example, tr0).

    parameter
    Allows the following parameter values:

    alias
    Establishes an additional network address for the interface. When changing network numbers, this is useful for accepting packets addressed to the old interface.

    allcast
    Sets the token-ring interface to broadcast to all rings on the network.

    -allcast
    Confines the token-ring interface to broadcast only to the local ring.

    arp
    Enables the ifconfig command to use the Address Resolution Protocol (ARP) in mapping between network-level addresses and link-level addresses. This flag is in effect by default.

    -arp
    Disables the use of the Address Resolution Protocol.

    authority
    Reserved.

    bridge
    Reserved.

    -bridge
    Reserved.

    broadcast address
    (inet only). Specifies the address to use to broadcast to the network. The default broadcast address has a host part of all 1's (ones).

    debug
    Enables driver-dependent debug code.

    -debug
    Disables driver-dependent debug code.

    delete
    Removes the specified network address. This is used when an alias is incorrectly specified or when it is no longer needed. An incorrectly set ns address has the side effect of specifying the host portion of the network address. Removing all ns addresses allows you to respecify the host portion.

    detach
    Removes an interface from the network interface list. If the last interface is detached, the network interface driver code is unloaded.

    down
    Marks an interface as inactive (down), which keeps the system from trying to transmit messages through that interface. If possible, the ifconfig command also resets the interface to disable reception of messages. Routes that use the interface, however, are not automatically disabled.

    hwloop
    Enables hardware loopback. The hardware loopback specifies that locally-addressed packets handled by an interface should be sent out using the associated adapter.

    -hwloop
    Disables hardware loopback. The hardware loopback specifies that locally-addressed packets handled by an interface should be sent out using the associated adapter.

    ipdst
    Specifies an Internet host willing to receive IP packets encapsulating ns packets bound for a remote network. An apparent point-to-point link is constructed, and the specified address is taken as the ns address and network of the destination.

    metric number
    Sets the routing metric of the interface to the value specified by the number variable. The default is 0. The routing metric is used by the routing protocol (the routed daemon). Higher metrics have the effect of making a route less favorable. Metrics are counted as additional hops to the destination network or host.

    mtu value
    Sets the maximum IP packet size for this system. The value variable can be any number from 60 through 65520, depending on the network interface. See "Understanding Automatic Configuration of Network Interfaces" in AIX Version 4 System Management Guide: Communications and Networks for maximum transmission unit (MTU) values by interface.

    netmask mask
    Specifies how much of the address to reserve for subdividing networks into subnetworks. This parameter can only be used with an address family of inet.

    The mask variable includes both the network part of the local address and the subnet part, which is taken from the host field of the address. The mask can be specified as a single hexadecimal number beginning with 0x, in standard Internet dotted decimal notation, or beginning with a name or alias that is listed in the /etc/networks file.

    The mask contains 1's (ones) for the bit positions in the 32-bit address that are reserved for the network and subnet parts, and 0's (zeros) for the bit positions that specify the host. The mask should contain at least the standard network portion, and the subnet segment should be contiguous with the network segment.

    offset
    Used by the CSS/IP for static IP address translation only.
    Note: If ARP is enabled, offset is not used.

    TB0/TB2
    Indicates to the CSS/IP whether it is running over a TB0 or a TB2 adapter interface. The default is the TB2 adapter.

    security
    Reserved.

    snap
    Reserved.

    -snap
    Reserved.

    up
    Marks an interface as active (up). This parameter is used automatically when setting the first address for an interface. It can also be used to enable an interface after an ifconfig down command.
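
    The netmask parameter described above accepts a mask either as a hexadecimal number beginning with 0x or in dotted decimal notation, and the 1 bits should form one contiguous run. The sketch below converts between the two forms and checks contiguity. The function names are hypothetical, the input is assumed to carry the 0x prefix, and the shell is assumed to support integers wider than 32 bits.

```shell
# Hypothetical helper: converts a hexadecimal mask such as 0xffffff00
# to dotted decimal notation.
mask_hex_to_dotted() {
    m=$(( $1 ))
    printf '%d.%d.%d.%d\n' \
        $(( (m >> 24) & 255 )) $(( (m >> 16) & 255 )) \
        $(( (m >> 8) & 255 )) $(( m & 255 ))
}

# Hypothetical helper: succeeds when the 1 bits of a 32-bit mask form one
# unbroken run starting at the most significant bit.
mask_is_contiguous() {
    m=$(( $1 ))
    host=$(( 4294967295 - m ))      # bits left for the host part
    [ $(( host & (host + 1) )) -eq 0 ]
}
```

    For example, mask_hex_to_dotted 0xffffff00 prints 255.255.255.0, and mask_is_contiguous 0xff00ff00 fails because the network and subnet bits are not adjacent.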

    Description

    The ifconfig command has been modified to add support for the switch. This command is valid only on an SP system.

    The ifconfig command can be used from the command line either to assign an address to a network interface, or to configure or display the current network interface configuration information. The ifconfig command must be used at system start up to define the network address of each interface present on a machine. It can also be used at a later time to redefine an interface's address or other operating parameters. The network interface configuration is held on the running system and must be reset at each system restart.

    An interface can receive transmissions in differing protocols, each of which may require separate naming schemes. It is necessary to specify the address_family parameter, which can change the interpretation of the remaining parameters. The address families currently supported are inet and ns.

    For the DARPA Internet family, inet, the address is either a host name present in the host name database, that is, the /etc/hosts file, or a DARPA Internet address expressed in the Internet standard dotted decimal notation.

    For the Xerox Network Systems (XNS) family, ns, addresses are net:a.b.c.d.e.f., where net is the assigned network number (in decimal), and each of the six bytes of the host number, a through f, are specified in hexadecimal. The host number can be omitted on 10-Mbps Ethernet interfaces, which use the hardware physical address, and on interfaces other than the first interface.

    While any user can query the status of a network interface, only a user who has administrative authority can modify the configuration of those interfaces.

    Related Information

    AIX Command: netstat

    AIX Files: /etc/hosts, /etc/networks

    Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on the SP Switch and the High Performance Switch.

    Refer to AIX Version 4 System Management Guide: Communications and Networks for additional information on TCP/IP protocols.

    Refer to AIX Version 4 General Programming Concepts: Writing and Debugging Programs for an overview on Xerox Network Systems (XNS).

    Examples

    The following are examples using the ifconfig command on a TCP/IP network and an XNS network, respectively:

    Inet Examples

    1. To query the status of a serial line IP interface, enter:
      ifconfig sl1
      

      In this example, the interface to be queried is sl1. The result of the command looks similar to the following:

      sl1: flags=51<UP,POINTOPOINT,RUNNING>
           inet 192.9.201.3 --> 192.9.354.7 netmask ffffff00
      

    2. To configure the local loopback interface, enter:
      ifconfig lo0 inet 127.0.0.1 up
      

    3. To mark the local token-ring interface as down, enter:
      ifconfig tr0 inet down
      

      In this example, the interface to be marked is tr0.
      Note: Only a user with root user authority can modify the configuration of a network interface.

    4. To specify an alias, enter:
      ifconfig css0 inet 127.0.0.1 netmask 255.255.255.0 alias
      

    XNS Examples

    1. To configure a standard Ethernet-type interface for XNS, enter:
      ifconfig en0 ns 110:02.60.8c.2c.a4.98 up
      

      In this example, ns is the XNS address family, 110 is the network number and 02.60.8c.2c.a4.98 is the host number, which is the Ethernet address unique to each individual interface. Specify the host number when there are multiple Ethernet hardware interfaces, as the default may not correspond to the proper interface. The Ethernet address can be obtained by the commands:

      ifconfig en0
      netstat -v
      

      The XNS address can be represented by several means, as can be seen in the following examples:

      123#9.89.3c.90.45.56
       
      5-124#123-456-900-455-749
       
      0x45:0x9893c9045569:90
       
      0456:9893c9045569H
      

      The first example is in decimal format, and the second example, using minus signs, is separated into groups of three digits each. The 0x and H examples are in hexadecimal format. Finally, the 0 in front of the last example indicates that the number is in octal format.

    2. To configure an IEEE Ethernet 802.3-type interface for XNS, enter:
      ifconfig et0 ns 120:02.60.8c.2c.a4.98 up
      

      The en0 and et0 interfaces are considered as separate interfaces even though the same Ethernet adapter is used. Two separate networks can be defined and used at the same time as long as they have separate network numbers. Multiple Ethernet adapters are supported.
      Note: The host number should correspond to the Ethernet address on the hardware adapter. A system can have multiple host numbers.

    3. To configure an Internet encapsulation XNS interface, enter:
      ifconfig en0 inet 11.0.0.1 up
      ifconfig en0 ns 110:02.60.8c.2c.a4.98 up
      ifconfig en0 ns 130:02.60.8c.34.56.78 ipdst 11.0.0.10
      

      The first command brings up the en0 interface with the inet address 11.0.0.1. The second command configures the en0 interface to be network 110 and host number 02.60.8c.2c.a4.98 in the ns address family. This defines the host number for use when the XNS packet is encapsulated within the Internet packet. The last command defines network 130, host number 02.60.8c.34.56.78, and destination Internet address 11.0.0.10. This last entry creates a new network interface, nsip. Use the netstat -i command for information about this interface.

    install_cw

    Purpose

    install_cw - Completes the installation of system support programs in the control workstation.

    Syntax

    install_cw

    Flags

    None.

    Operands

    None.

    Description

    Use this command at installation to perform the following tasks:

    install_hacws

    Purpose

    install_hacws - Creates and configures a High Availability Control Workstation (HACWS) configuration from a regular control workstation configuration.

    Syntax

    install_hacws -p host_name -b host_name [-s]

    Flags

    -p
    Specifies the host name of the primary control workstation. The host name is the name that is set in the kernel and identifies the physical machine. It is also required that this name have a route defined to a network adapter on the primary control workstation. This option is required.

    -b
    Specifies the host name of the backup control workstation. The host name is the name that is set in the kernel and identifies the physical machine. It is also required that this name have a route defined to a network adapter on the backup control workstation. This option is required.

    -s
    Invokes the command on both the primary and the backup control workstations.

    Operands

    None.

    Description

    Use this command to perform configuration and installation tasks on HACWS. This command is used instead of install_cw once the configuration has been made an HACWS configuration. This command is valid only when issued on the control workstation. When the command is executed and the calling process is not on a control workstation, an error occurs.
    Note: The install_hacws command permanently alters a control workstation to an HACWS. The only way to go back to a single control workstation is to have a mksysb image of the primary control workstation before the install_hacws command is executed.

    Both the primary and backup control workstations must be running and capable of executing remote commands via the /usr/lpp/ssp/rcmd/bin/rsh command.

    Exit Values

    0
    Indicates the successful completion of the command.

    1
    Indicates that an error occurred. Diagnostic information is written to standard output and standard error.

    Standard output consists of messages indicating the progress of the command as it configures the control workstations.

    Prerequisite Information

    Refer to IBM Parallel System Support Programs for AIX: Administration Guide for information on the HACWS option.

    Location

    /usr/sbin/hacws/install_hacws

    Related Information

    SP Commands: install_cw, rsh, setup_logd

    Examples

    1. To configure both control workstations on an SP system, enter the following:
      install_hacws -p primary_cw -b backup_cw -s
      

    2. To configure the control workstations separately, enter the following.

      On the primary control workstation, enter:

      install_hacws -p primary_cw -b backup_cw
      

      After the preceding command completes on the primary control workstation, enter the following on the backup control workstation:

      install_hacws -p primary_cw -b backup_cw
      

    jm_config

    Purpose

    jm_config - Reconfigures the Resource Manager.

    Syntax

    jm_config

    Flags

    None.

    Operands

    None.

    Description

    Use this command to reconfigure the Resource Manager (RM) servers.

    This command must be executed by root on the control workstation. It reads the Resource Manager configuration data from the /etc/jmd_config.syspar_name file, where syspar_name represents the current system partition environment. This current working environment can be determined by issuing spget_syspar -n. The jm_config command then contacts the correct primary Resource Manager and sends a message telling the server to update its configuration data from the System Data Repository (SDR). The new configuration takes effect with the next client request.
    Note: 604 High Nodes cannot be configured as part of a parallel pool. Therefore, the Resource Manager will not allocate these nodes for parallel jobs.
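
    Because the configuration file name depends on the system partition, a script can build the path from the partition name before calling jm_config. The following is a minimal sketch. The function names are hypothetical, and the readability check is an added precaution rather than documented jm_config behavior.

```shell
# Hypothetical helper: path of the Resource Manager configuration file
# for the named system partition.
jmd_config_path() {
    printf '/etc/jmd_config.%s\n' "$1"
}

# Hypothetical helper: prints the path if the file is readable,
# otherwise writes a message to standard error and fails.
jmd_config_ready() {
    cfg=$(jmd_config_path "$1")
    [ -r "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
    echo "$cfg"
}
```

    On the control workstation, a call such as jmd_config_ready "$(spget_syspar -n)" && jm_config would run the reconfiguration only when the file for the current system partition can be read.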

    The Resource Manager can also be reconfigured via the System Management Interface Tool (SMIT). To use SMIT, enter:

    smit RM_options
    
    and select the Reconfigure the Resource Manager option. Refer to IBM Parallel System Support Programs for AIX: Administration Guide for additional information on configuring the Resource Manager and system partitioning.

    Files

    /etc/jmd_config.syspar_name
    Resource Manager configuration data file.

    /usr/lpp/ssp/bin/jm_config
    Path name of this command.

    /var/adm/SPlogs/jm/jmd_out
    Resource Manager information log.

    /var/adm/SPlogs/jm/jmd_err
    Resource Manager error log.

    Related Information

    Commands: jm_start, jm_status, jm_stop, locate_jm, spget_syspar

    Examples

    To reconfigure the R