Server downtime completed
[UPDATE 30/12] The server is back to normal and operating well.
[UPDATE 0630 28/12] When you are beholden to third parties, you are at their mercy. Unfortunately, when the data centre built the 123host server they wired the hard drives incorrectly, so a specific drive wasn’t in the reported location. As a result, the drive that was reported as faulty and then replaced was not the actual faulty drive…sigh.
The wiring has been rectified; however, it means that the faulty drive is still in the server. It is scheduled to be replaced at 0200 on 29/12.
Hardware failures happen, but this has been an unfortunate series of events at the worst time of the year. I sincerely apologise for any hassle. Hopefully the late-night server outages have minimised any impact.
[UPDATE 0510 28/12] The hard drive has been successfully replaced. Now the server has to replicate the working drive to the new one so there is a constant real-time copy. This is putting a bit of a strain on the machine, but it is all up and running.
[UPDATE 0630 27/12]
Sigh…It seems I was given inaccurate information and the hard drive hasn’t been replaced yet. I have requested it be done at 0200 AEST tomorrow. The server will be down for approximately 30 minutes and then a little slow for the next 6 to 12 hours while all the data is replicated from the existing drive to the new one. Being the holidays makes it tricky to schedule this sort of thing due to staffing, but the upside is that it is a quiet time for server activity too.
[UPDATE 0640 26/12] The data has now been copied onto the new drive and that process has completed. The server is back to a fully operational state.
The technician will be back tomorrow to check this again and do maintenance to ensure all is in order. This will require shutting down the server briefly, but as far as we can see the data has rebuilt onto the new drive that was put into the machine yesterday.
We’re 50% of the way through this maintenance and all looks healthy so far.
[UPDATE 0910] You do understand that I am not in the data centre and working on this, right? I have been advised that this morning’s efforts have concentrated on getting to the source of the issue, and they have found it. The hard drive is now returning a status of failed, but there is a second hard drive which, as I said in the original post, has everything mirrored and is designed to kick into action if the other fails – it worked! A physical replacement of the failed device is now scheduled for 0200hrs AEST 25/12/2016.
[UPDATE 0700] It hasn’t been quite as straightforward as expected, but I understand that the job has been done. The server may reboot once or twice this morning. Before this, it had been up for about 120 days straight.
I’ll leave the status as ‘underway’ until I have confirmation it is complete.
There have been some minor hiccups on the server over the last week or so. Investigations have finally revealed that a hard drive may have some problems. Of course it happens a couple of days before Christmas when staffing is limited 🙁
The hard drive is going to be replaced. This will not affect data at all: there is a mirrored hard drive that makes copies in real time, as well as nightly backups.
However, it will mean up to 30 minutes of server downtime. I am arranging this to happen at 0200 tonight if possible, or at 0200 some other night in the next few days.
I apologise for any hassle, but it is important to do preventative maintenance.