System Notices

FASTER and Grace Cluster Maintenance, March 10-13 — UPDATED

UPDATE: (03/24/2025 11:37a):

The Grace OOD portal is available but we are working on restoring various portal apps.

UPDATE: (03/20/2025 11:36a):

We are currently working on reinstalling many software packages. Please submit your software requests to help@hprc.tamu.edu until the OOD portal is online.

The Grace OOD and Galaxy portals will be made available at a later date.

UPDATE: (03/17/2025 11:35a):

The Grace cluster is now available. Because we are still reinstalling various software packages in a new software tree, all jobs that were pending before the maintenance were removed. We expect that your job scripts may need updates to use newer software and/or toolchain versions.

The Grace OOD and Galaxy portals will be made available at a later date. Please submit your software requests to help@hprc.tamu.edu until the OOD portal is online.

Our apologies for the inconvenience and the extended downtime to redeploy the Grace cluster.

UPDATE: (03/15/2025 4:36p):

The FASTER cluster is currently available with about 80% of its compute nodes. We are still investigating issues with FASTER's OOD portals.

The Grace cluster redeployment is still in progress.

UPDATE: (03/14/2025 11:55p):

The FASTER cluster may be available tomorrow morning at 75% capacity after overnight testing. Some GPU nodes will remain offline due to composability fabric issues that will be remediated next week.

The Grace cluster remains unavailable because its redeployment with a new OS is taking much longer than anticipated. We will continue working through the weekend to complete the remaining maintenance and make the Grace cluster available ASAP.

UPDATE: (03/13/2025 10:06p):

The maintenance for the shared storage and the Liqid composability fabrics was completed successfully but took more time than anticipated. A failed disk (which needed replacement) contributed to delays in the shared storage maintenance. We will provide more updates as we continue work on the FASTER and Grace cluster maintenance.

Posted at 03/04/2025 10:28a

The FASTER and Grace clusters will be unavailable from 9am March 10 to 8pm March 13. Software maintenance will be performed on FASTER's nodes and the Liqid fabrics. The Grace cluster will be redeployed with the same OS (RHEL 8.10) as FASTER. The software on the shared storage will be updated as well.