It is set up to suit the needs of
a research group doing computational studies of biomolecules in solution,
mainly molecular dynamics.
Configuration (September 1999)
Hardware
44x 450 MHz Pentium II (512 KB cache) processors on 22 dual-processor
MS6120 motherboards with a 100 MHz bus.
One FastEthernet switch, D-Link, with 24 ports.
One keyboard, mouse, monitor, floppy disk drive and graphics card for the whole system.
Most of this (18 boards) is mounted in a standard 19" rack, on simple aluminum shelves with holes to attach the motherboard, disk, power supply and fan.
Software
OS: Linux Red Hat 6.0, with kernel version 2.2.9 SMP, using NFS, NIS and autofs to integrate the system into the user and file spaces at CSB.
Scheduling: NQS 3.50.5 (a VERY IMPORTANT tool!). There are two patches that have to be applied.
Compiler: Absoft ProFortran 6.0
Parallel library: MPICH
Main application: CHARMM
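The autofs side of the NFS/NIS integration mentioned above amounts to a pair of map files. A minimal sketch, assuming a single NFS file server and NIS-distributed maps — the server name, paths and timeout here are made up for illustration, not our actual configuration:

```
# /etc/auto.master -- mount user home directories on demand
/home   auto.home   --timeout 60

# auto.home (distributed via NIS): wildcard entry, one mount per user
*   -rw,hard,intr   fileserver:/export/home/&
```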
Current NQS configuration
Six queues: four with access to 8 CPUs and two with 4 CPUs, with a CPU-time limit of 24 hours (48 for the 4-CPU queues), enforced by the machine running the NQS scheduler. (In addition there are 8 DEC Alpha machines for single-CPU jobs, also under NQS control.) We run CHARMM using its own socket-based parallel communication library. The two remaining Linux boxes are used for compilation, testing and as on-line spare machines (we also keep a cardboard box with spare parts, from fans to memory and CPUs, right next to the rack).
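Jobs enter these queues through a plain NQS submission; something like the following, where the queue name and job script are hypothetical examples:

```
# send an 8-CPU CHARMM job to one of the 8-CPU queues;
# NQS enforces the 24 h CPU-time limit on the request
qsub -q para8 run_charmm.sh
```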
Installation & configuration procedures
First node:
Install Linux in a minimal configuration.
Make sure the amount of memory is correctly set up in /etc/lilo.conf.
Configure NIS and autofs.
Install NQS.
Install MPICH.
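The lilo.conf step matters because 2.2-era kernels do not always detect more than 64 MB of RAM from the BIOS; the usual fix is an explicit mem= parameter. A sketch — the memory size, kernel image and partition names are examples, not our actual values:

```
# /etc/lilo.conf -- tell the kernel how much RAM is installed
image=/boot/vmlinuz-2.2.9
    label=linux
    root=/dev/hda1
    read-only
    append="mem=256M"
```

Remember to rerun /sbin/lilo after editing the file.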
For the following nodes we copied the newly created disk to a new disk,
then made a handful of changes to
transform the copy into a bootable disk for a system with a new
identity (name and IP number). This procedure is run by a script, and so
is very easy to repeat for all the system disks of your machine, both at
the initial setup
and later on in case of disk problems.
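The identity change itself boils down to a few sed edits on the copied disk's configuration files. A minimal sketch, assuming the copy (made for instance with dd) is already mounted, and that the files live where Red Hat 6.0 puts them; every name, path and address below is an example, not our real value:

```shell
#!/bin/sh
# Give a freshly copied system disk a new identity (hostname + IP number).
# Assumes the copied disk is mounted under $TARGET.

TARGET=${TARGET:-/mnt/newdisk}      # mount point of the copied disk
NEWNAME=${NEWNAME:-node05}          # example node name
NEWIP=${NEWIP:-192.168.1.5}         # example IP number

set_identity() {
    net="$TARGET/etc/sysconfig/network"
    eth="$TARGET/etc/sysconfig/network-scripts/ifcfg-eth0"

    # hostname, as stored by Red Hat 6.0
    sed "s/^HOSTNAME=.*/HOSTNAME=$NEWNAME/" "$net" > "$net.new" \
        && mv "$net.new" "$net"

    # IP number of the node's FastEthernet interface
    sed "s/^IPADDR=.*/IPADDR=$NEWIP/" "$eth" > "$eth.new" \
        && mv "$eth.new" "$eth"
}
```

After set_identity has run, the disk can be moved to the new node and booted; the same script is reused whenever a disk has to be replaced.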
System administration and reliability
The system has been very stable, and requires essentially no extra administration.
Performance
The standard CHARMM benchmark, 1000 steps of MD on myoglobin in water (14000 atoms), runs in about 10 minutes on 8 nodes (approximately the same as on 16 nodes of a T3E).
As long as the users can fill the queues with jobs we get 100% utilization of the machine. The parallel overhead in each job of course reduces the actual yield, but on 8 CPUs our typical MD simulation jobs reach around 65-70% efficiency (meaning that they run about 5 times faster than on a single CPU).
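That efficiency figure is simply the measured speedup divided by the number of CPUs; for example, a speedup of 5.4 (within the "about 5 times" quoted above) on 8 CPUs gives:

```shell
# parallel efficiency = speedup / CPU count
awk 'BEGIN { printf "%.1f%%\n", 100 * 5.4 / 8 }'
```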