Server W-29 Emergency Maintenance – June 9

The server will be unavailable for about one hour while we replace hardware.

Background:

The server has experienced an increasing number of outages in recent weeks. When we first noticed problems in April, we replaced the memory sticks. After subsequent stability issues, we suspected system software problems, and focussed our attention on more extensive service monitoring and recovery. An outage today changed our direction again, with indications of possible CPU overheating. The data centre have also indicated a relatively high incidence of reported problems with AlmaLinux 8 (our current Linux version) and their Intel-based Dell hardware.

Solution:

We have asked the data centre to move the hard drives to a brand-new AMD-based server chassis. With this, we hope to solve the issues once and for all.

15:45 UTC Update
Hardware replacement will commence shortly with anticipated downtime of up to one hour.

17:10 UTC Update
The hardware replacement is complete, and all services are running normally. We hope this takes care of the issue for good.

One Response

  1. Chris Reeler, 10 June 2024 at 05:10

    Thanks for the updates, Stephen
