Server WWW-19 Problem – 1 April

This is to confirm that we have been experiencing severe problems with Linux server WWW-19 today. On two occasions, once this morning (SA time) and once this afternoon, the server froze and we had to perform emergency reboots. We are currently investigating a possible hardware fault and have requested an inspection by a data centre technician. The current problem is unrelated to the network routing problem we experienced over the weekend.

While we are unable to provide any definitive information at this time, we want to assure you that we are doing everything we can to restore service to normal. Information on the next step will follow in due course.

Update at 14:30

The server will be taken offline for a hardware replacement at 17:00 SA time. It should be back online by 18:00. Initial performance may be a bit slow while the RAID array synchronises to the new hard drive.

Please accept our apology for the inconvenience. We trust that things will run smoothly from this point on.

Update at 15:53

Replacement of the server hardware has been completed and all services are back to normal. We will perform system software updates (to match the new hardware) shortly and reboot the server one more time. The reboot will occur after midnight SA time to minimise further inconvenience.

Update at 18:06

We regret that our problems with Linux server WWW-19 continue. After the hardware chassis was swapped, everything appeared normal for about 90 minutes before the server froze again. The root cause is still unclear, but it is most likely hardware-related. We are currently swapping the chassis again and hope to have the issue resolved soon.

Update on 3 April at 05:16

We are pleased to report that all known issues with Linux server WWW-19 have now been addressed:

  • The network routing bug will be corrected with a patch shortly after midnight (SA time) on Friday morning. We expect only a short interruption in service while this is done.
  • The new server hardware is stable, although it took two chassis replacements to get there. The RAID array was partly replaced and has rebuilt itself without errors.
  • We worked on Wednesday night to upgrade and reconfigure the SMTP and POP3/IMAP services, together with their spam and virus filtering sub-systems.

Everything is in perfect working condition at this time and we do not foresee any further problems. In the unlikely event that you encounter problems, please do not hesitate to contact us.

A side note: since we retired the old "POP before SMTP" fail-over authentication mechanism two weeks ago, some clients have reported that they can no longer send email, although receiving still works. If you run into this issue, please check that SMTP authentication is enabled in your email software, i.e. that it logs in to send email in the same way that it logs in to receive.

We regret the inconvenience the recent problems with the servers may have caused you and your clients.