W-30 Server Website Problems – 29 April

Websites on server W-30 in Cape Town loaded slowly and returned unexpected errors for most of the day. The problem is now resolved.

01:00 UTC Original report
Apache connections are not being released, causing all available Apache server slots to fill up and incoming requests to fail.

05:15 UTC update
A couple of configuration changes later, all seems to be stable again. We are seeing a higher than usual volume of WordPress XML-RPC and wp-login attacks, which likely explains the initial problem.
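
For readers who want to check their own access logs for this kind of traffic, here is a minimal sketch in Python. It assumes the log uses Apache's Common or Combined Log Format; the function name `count_wordpress_hits` and the list of watched endpoints are our own illustrative choices, and this is not the tooling used during the incident.

```python
import re
from collections import Counter

# Matches the quoted request part of a Common/Combined Log Format line,
# for example: "POST /xmlrpc.php HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[0-9.]+"')

# WordPress endpoints to watch (an assumption for this sketch).
WATCHED = ("/xmlrpc.php", "/wp-login.php")


def count_wordpress_hits(lines):
    """Count requests per watched endpoint in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like a request entry
        # Ignore any query string when comparing paths.
        path = match.group("path").split("?", 1)[0]
        for endpoint in WATCHED:
            if path.endswith(endpoint):
                counts[endpoint] += 1
    return counts
```

Passing an open log file to the function (for example `count_wordpress_hits(open("access.log"))`, where the file name is a placeholder) returns a per-endpoint tally. A sudden jump compared with a normal day would be consistent with the kind of attack traffic described above.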

05:30 UTC update
The attack on WordPress websites continues. ModSecurity and our firewall are doing their jobs to keep the attacks at bay, but the Apache web server is struggling to keep up. To prevent attacked scripts from holding Apache connections open for extended periods, we have reduced the maximum execution time for PHP scripts. This is helping with handling the attacks, but with the server still under higher than usual load, it is causing some innocent scripts to fail as well. Please bear with us.

14:24 UTC update
After hours (literally) of software debugging, we are now shifting our focus to the server hardware. There is evidence suggesting that we may be dealing with a faulty memory chip. We have requested that data centre technicians have a look at the server hardware. The server will be taken offline soon; the exact time is not known yet. Please bear with us a bit longer.

16:40 UTC update
The server has new RAM and we have rolled back the (many!) software configuration changes we tried. All is back to normal and we hope it stays that way.

A word of reassurance: We now realise that the signs of WordPress attacks were symptoms of the problem, rather than the whole cause of the problem. With Apache not able to handle its requests in a timely fashion, a lot of these malicious requests became visible. But all along, our web application firewall (ModSecurity) and the network firewall were working to fend off attacks and block the bad guys (botnets).