Users of UGAMP's HPC allocation currently include:
- directly funded UGAMPers at CGAM, ACMSU and UGAMP sites
- UGAMP PIs, PDRAs and students
- UGAMP-related NERC contracts: Special Topic I and II, Ocean-Atmosphere Coupling
- commissioned research contracts: EEC Framework III (and IV) and HCM; DOE; Met. Office
- others: visitors, collaborators
In all, about 130 users are registered for UGAMP HPC accounts.
UGAMP currently has allocations on the following National High Performance Computers (periods are given as month.year, so 4.93-3.94 means April 1993 to March 1994):
RAL YMP8
Status: oversubscribed and I/O limited; decommissioning date not yet known
Period      Allocation      Used                 Notes
4.93-3.94   10000 hours     10000 hours
4.94-3.95   12500 hours     12500 hours
4.95-3.96   15000 hours     12000 hours (7.95)   OVER USED WITH 8 MONTHS LEFT!
1.96-3.96   1000 hours*                          extra allocation
4.95-3.97   16000 hours
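As a rough check on the warning in the 4.95-3.96 row, the arithmetic below assumes that the 12000 hours were used between the start of April and the end of July 1995 (about 4 months, which leaves the 8 months stated). It also ignores the extra allocation that has been applied for:

```latex
\begin{aligned}
\text{remaining} &= 15000 - 12000 = 3000\ \text{hours} \\
\text{rate so far} &\approx 12000 / 4 = 3000\ \text{hours/month} \\
\text{sustainable rate} &= 3000 / 8 = 375\ \text{hours/month}
\end{aligned}
```

At the rate used so far, the remaining hours would last about one month rather than eight.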
ULCC Convex
Status: spare capacity, limited user support in 1995/1996, decommissioning date March 1996
Period      Allocation      Used                 Notes
4.93-3.96   3000 hours      3000 hours (6.95)
6.95-3.96   1000 hours      700 hours (8.95)     extra allocation
9.95-3.96   1000 hours*                          extra allocation
EPCC T3D
Status: oversubscribed; time sharing makes for a difficult queuing strategy; decommissioning date not yet known
Period       Allocation        Used
7.94-10.94   6000 node hrs     3 node hrs
11.94-3.95   18000 node hrs    170 node hrs
4.95-10.95   8000 node hrs     9302 node hrs (8.95)
11.95-3.96   (applied for)*
[The T3D allocation strategy does not allow any unused allocation to be transferred from one period to the next. The allocation in each 6-month period is now also divided into 2-month chunks, and again unused allocation in each chunk is not transferable. This makes it difficult to use all of our allocation in each 6-month period.]
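The simplified shell sketch below shows why the 2-month chunks make it hard to use a full allocation. It rests on hypothetical assumptions:
- each 6-month allocation is split equally between the chunks, with any remainder dropped;
- hours left over in a chunk expire;
- demand above a chunk's share cannot be met from later chunks.

The function name chunk_usable is made up for illustration.

```shell
#!/bin/sh
# Hypothetical model of non-transferable allocation chunks.
# usage: chunk_usable <allocation> <demand-chunk-1> [<demand-chunk-2> ...]
# Prints "<hours usable> <hours lost>".
chunk_usable() {
  alloc=$1; shift
  if [ $# -eq 0 ]; then
    echo "usage: chunk_usable <allocation> <demand>..." >&2
    return 2
  fi
  per_chunk=$(( alloc / $# ))   # ASSUMPTION: equal split, remainder dropped
  used=0; lost=0
  for demand in "$@"; do
    if [ "$demand" -lt "$per_chunk" ]; then
      used=$(( used + demand ))
      lost=$(( lost + per_chunk - demand ))   # unused hours expire with the chunk
    else
      used=$(( used + per_chunk ))            # excess demand is not met
    fi
  done
  echo "$used $lost"
}
```

For example, `chunk_usable 8000 1000 4000 3000` prints `6332 1666`. Total demand matches the 8000-hour allocation. Under these assumptions, however, 1666 node hours from the quiet first chunk are lost and only 6332 can be used.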
MCC Fujitsu
Status: spare capacity, decommissioning date March 1997
Period       Allocation
7.95-10.95   1 hour (pump priming)
3.96-3.97    1000 hours*
* Allocation applied for but not yet in place.
From these figures we can see that we have very little HPC allocation left, especially on the RAL YMP8. We have a large number of registered UGAMP users, so we will need to introduce quotas to ensure that each UGAMP group gets a fair share of the remaining resources. These quotas will be put in place after the UGAMP PIs' meeting at the beginning of October. In the meantime we must conserve our HPC allocation on the RAL YMP8, because many of our codes are still Cray-specific and so cannot be transferred to ULCC or MCC.
Please take steps now, before it is too late:
- Use the UGAMP YMP8 allocation for UGAMP work only. If you have other HPC grants then you must use these instead of the UGAMP allocation.
- Use the other National HPC machines where possible and leave the YMP8 for those codes that only run on Crays.
- Delay any experiments that can wait until we have weathered the current crisis.
- Think before you submit a job to the YMP8!
With such limited HPC resources, we should look at which models UGAMP uses, what HPC resources they currently require and what they might require in the future. The UGAMP models currently in use are:
Atmospheric Models: UGCM/USMM, IFS, UM, Single column
Simple Diagnostic Programs: UMAP/SMAP, Ferret
Chemical/Transport Models: TOMCAT, SLIMCAT, TOPCAT
Coupled Models: IFS+OASIS+MOMA, UM
Others: Contour advection
What are the computing issues that affect the resources these models need?
a) Portability
At the moment most of the models we run use Cray-specific code. However, the atmospheric (and hopefully the coupled and chemical/transport) models will be more or less portable by 1996. The exception is UGCM/USMM, which will need some work to make it portable. These models should be able to run on massively parallel computers such as the Cray T3D, on conventional vector machines and on workstations. This portability may, however, make the codes more demanding in terms of memory and CPU requirements.
Different models have taken different solutions to achieve portability. Some will use Fortran 90, which is not yet fully implemented on all machines from the T3D to the Sun on the desk. Parallel versions of these codes will use different message passing mechanisms and libraries. Full portability will take some time and so we may have to live with slightly different versions on different computers.
b) Code Maintenance
If we are running different versions of the same models on different computers, code maintenance will be an important issue. It will also be important that any modifications to model code are interchangeable between versions. Ideally we need the same code maintenance tool running on each of the computers we use. At the moment the only suitable tool that satisfies this criterion is NUPDATE, which Cray have released into the public domain. While it is not the best code maintenance tool, it is one we know, and I hope it will tide us over until we have found something better.
c) Data Handling
Most of the models and diagnostic programs we run need large amounts of disk space for their data and, for most experiments, we need to be able to archive this data. The volumes of data produced by the models, and subsequently used by the diagnostic programs, can be very large. If the models are run on different computers from the diagnostic programs, then any data storage facility must be accessible from each of those computers, because the data may be too large to transfer between them in practice. Some experiments use the simple atmospheric models on the EPCC T3D and are diagnosed on the RAL YMP8. For these we have already reached the stage where it is becoming too difficult to transfer the data to RAL.
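A back-of-the-envelope estimate shows why moving model output between sites quickly becomes impractical. The transfer time t is the data volume V divided by the effective network throughput B. The figures used here, a 10 Gbyte experiment over an effective 1 Mbit/s, are purely illustrative assumptions and not measurements of the EPCC to RAL link:

```latex
t = \frac{V}{B}
  = \frac{10\ \text{Gbyte} \times 8 \times 10^{9}\ \text{bit/Gbyte}}{10^{6}\ \text{bit/s}}
  = 8 \times 10^{4}\ \text{s} \approx 22\ \text{hours}
```

Because t scales linearly with V and inversely with B, doubling the volume or halving the throughput doubles the transfer time.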
With all these points in mind, what can we expect to help solve our current allocation crisis, and what can we expect after March 1997?
At present, government money for the National HPC programme is top-sliced from all the research councils' budgets, and this money is managed by the EPSRC to provide our current HPC resources. However, the government wants to change this system, and discussions have been held with the research councils to devise a new scheme. Under the new scheme the government will give each research council a computing budget. Each council will then decide whether to pool this budget to support a National HPC programme or to distribute it to its research projects to provide local resources.
NERC are discussing the division of their computer budget (size as yet unknown!) with their large computer users (that includes UGAMP!). They want to ensure that:
- there are adequate top-end HPC resources, on which, for example, we can run our coupled models and our high-resolution atmospheric and chemical/transport models.
- there is access to mid-range HPC resources, on which, for example, we could run our lower-resolution models and diagnostic programs, and which could also serve as a development/testbed system.
- there is adequate large-capacity storage for both short-term and long-term data.
- there are suitable local facilities, for example for visualisation.
The details of the new NERC HPC strategy are not yet known. New HPC procurements typically take some 18 months from start to installation, so we know that there will not be a new National HPC computer before the beginning of 1997. Suppose NERC does manage to allocate funds for each of the above facilities. Mid-range HPC computers could then be procured more quickly, and I would hope that they could give UGAMP the HPC resources needed to solve our current crisis and later supplement the new National HPC computers once they are in place.
Lois Steenman-Clark, UGAMP Supercomputing Coordinator
The RAL IBM front-end has an accounting system, oats, that can be used to monitor the amount of CPU time used by all UGAMP Cray users. With Lois' help, I have used this to keep track of the usage of all UGAMP groups and our group in particular.
Using a program provided by Lois, I maintain a WWW page, updated weekly, that shows the weekly usage to date of each UGAMP group. Everyone is free to check this page at: http://www.atm.ch.cam.ac.uk/~glenn/cray_usage/ugamp_usage.html.
There are two sets of graphs. The first graph shows the total CPU time used by UGAMP in each week of the period 4th June to 3rd Sept. The value plotted at each date on the x-axis is the total CPU time, in hours, used in the week ending on that date.
The second set, of four graphs, shows the weekly usage of each of the UGAMP groups as classified above. Each line on these graphs corresponds to a different UGAMP group.
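The weekly per-group totals behind these graphs come from the program provided by Lois, whose details are not described here. The shell function below is a hypothetical sketch only. It assumes that usage records have already been extracted into lines of the form `<week-ending> <group> <cpu-hours>`, which is not the real oats output format. Given that input, it uses awk to produce the same kind of summary:

```shell
#!/bin/sh
# Hypothetical sketch: total CPU hours per (week, group).
# ASSUMED input format, one record per line (not the real oats format):
#   <week-ending-date> <group> <cpu-hours>
weekly_usage() {
  awk '
    NF == 3 { total[$1 " " $2] += $3 }    # accumulate hours for each week/group pair
    END {
      for (key in total)                  # awk visits keys in no fixed order...
        printf "%s %.1f\n", key, total[key]
    }
  ' "$@" | sort                           # ...so sort for a stable listing
}
```

It reads the files named as arguments, or standard input if none are given, e.g. `weekly_usage usage.txt`.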
Glenn Carver, ACMSU
The way tapes are handled and distributed at RAL has changed. First and foremost, users now need the environment variable TAPE_OWNER set to UGAMP in order to use UGAMP tapes. If this variable is not set, you will not be able to see any of UGAMP's tapes. If you don't know how to set it, see your local system administrator or email me for details.
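The snippet below is a minimal sketch of setting the variable. It assumes the variable only needs to be present in your login environment on the machine from which you use the tape command. Please still check with your system administrator, since the right start-up file depends on your local set-up:

```shell
# Bourne-style shells (sh, ksh, bash), e.g. in ~/.profile:
TAPE_OWNER=UGAMP
export TAPE_OWNER        # make it visible to commands run from this shell, such as tape

# C-shell users would put the equivalent line in ~/.cshrc instead:
#   setenv TAPE_OWNER UGAMP
```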
In the new system the IBM is no longer used in handling tapes, and Lois and I have been given the ability to create tapes and sets of rules relating to those tapes. Tapes are generally issued in two sizes, 256 Mb and 1 Gb. RAL recommends that we use the 256 Mb tapes, as they optimise the throughput of the tape silo. In the first instance, please contact me if you need tapes or have any problems related to them.
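When asking for tapes it helps to know how many you need. The hypothetical helper below rounds up the number of tapes of a given size needed to hold a data volume. It assumes that a tape's nominal capacity is fully usable, with no allowance for blocking or file overheads, and that 1 Gb is taken as 1024 Mb:

```shell
# Hypothetical helper: number of tapes needed to hold a given volume.
# usage: tapes_needed <data-Mb> <tape-size-Mb>   (tape size 256 or 1024)
tapes_needed() {
  data=$1
  size=$2
  # integer ceiling division: (data + size - 1) / size
  echo $(( (data + size - 1) / size ))
}
```

For example, `tapes_needed 1000 256` prints 4, whereas the same 1000 Mb fits on a single 1 Gb tape (`tapes_needed 1000 1024` prints 1).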
Use the tape command as before to read/write your tapes.
Each user now usually has two rules that apply to their tapes:
ug%w$LOGNAME to enable write access
ug%r$LOGNAME to disable write access
To write protect a tape use
ds set -v <number> -r ug%r$LOGNAME
and to list tapes that are write protected use
ds query -r ug%r$LOGNAME
To enable writing to a tape again simply replace the ug%r$LOGNAME with ug%w$LOGNAME.
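Protecting and unprotecting a tape differ only in the rule name, so the two ds set commands above can be wrapped in a small shell function. This is a hypothetical convenience and not part of the RAL software. The name tape_protect and the DS dry-run variable are made up for illustration, and the function uses only the ds set form shown above:

```shell
# Hypothetical helper wrapping the ds set commands shown above.
# usage: tape_protect on|off <tape-number>
#   on  -> apply ug%r$LOGNAME (write access disabled)
#   off -> apply ug%w$LOGNAME (write access enabled)
# Set DS=echo to print the ds command instead of running it (dry run).
tape_protect() {
  if [ $# -ne 2 ]; then
    echo "usage: tape_protect on|off <tape-number>" >&2
    return 2
  fi
  case "$1" in
    on)  rule="ug%r$LOGNAME" ;;
    off) rule="ug%w$LOGNAME" ;;
    *)   echo "usage: tape_protect on|off <tape-number>" >&2; return 2 ;;
  esac
  ${DS:-ds} set -v "$2" -r "$rule"
}
```

For example, with a hypothetical tape number, `tape_protect on 12345` runs `ds set -v 12345 -r ug%r$LOGNAME`, and `tape_protect off 12345` makes the tape writable again.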
The tape command works over the internet using VTP, a protocol designed at RAL. RAL have released the software for the tape command, so users can read and write their tapes directly from their local machines if these are Unix boxes (VAXes as well, if you like that sort of thing...). I have this working on the Reading Suns under Solaris 2.4 and have found it very useful. If you are interested in reading or writing tapes from your own machines, please let me know, as I will need to create a special rule for you.
Again, if you have any problems with your tapes or any queries related to them please contact me.
Andy Heaps (andy@met.rdg.ac.uk)