View on GitHub

Open Source Cluster Application Resources

Download this project as a .zip file Download this project as a tar.gz file

Documentations > User Documentations

Openfabrics (Infiniband) Setup

These are some notes on seting up the OpenFabrics 1.2.5 infiniband driver/utility/mpi package set from source on an OSCAR 5.0 cluster. I successfully installed OFED on RHEL 4, but your milage may vary.

  • Install OSCAR 5.0 up through step 3 “Install Server Packages” as described in the installation guide ([http://svn.oscar.openclustergroup.org/trac/oscar/wiki/InstallGuideClusterInstall])
  • Get OFED source ([http://www.openfabrics.org/builds/ofed-1.2.5/release/])
  • yume install tcl-devel

  • Make a new rpm list in /opt/oscar/oscarsamples/your.rpmlist (copy the one for the distro you are using)and add these rpms to the end of the list (in addition to the LDAP ones if you want to use those)

    #added for OFED
    tcl-devel
    kernel-devel
    kernel-smp-devel
    rpm-build
    sysfsutils
    sysfsutils-devel
    pciutils-devel
    
  • Reopen the OSCAR wizard and make a new image called myimage-OFED-source using /opt/oscar/oscarsamples/your.rpmlist and /opt/oscar/oscarsamples/scsi.disk
  • Copy the OFED source to the image directory (cp OFED-1.2.5.tgz /var/lib/systemimager/images/myimage-OFED-source/tmp)
  • chroot /var/lib/systemimager/images/myimage-OFED-source
  • untar source
  • go into source directory and run ./install.sh
  • Select option 3) Install everything
  • I skipped IBoIP or whatever, since I don’t care about making another TCP/IP stack to confuse things
  • Set head node 10.0.0.1 as gateway, I think
  • Had a spinner lock up in chroot and had to escape (ctl+c) out of that part of the install, doesn’t seem to have broken anything important (to me)
  • go back and redefine all the nodes (delete and make them again)
  • reimage
  • Test

       * cexec "ibv_devinfo | grep ACTIVE"
              o state: PORT_ACTIVE (4)
    
  • Repeat source installation on head node if desired

  • SGE queue problem with open mpi and infiniband (RHEL4): http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=17060
  • Make a new switcher module for open mpi