Admin Posted January 17, 2013

Hi All,

Well, we had a bit of a dilemma for about 14 hours, from 11:30pm Wednesday night until now (Thursday afternoon), with the site out of action for that whole period. For the non-technical: a problem with the format of the data in the database suddenly appeared, and none of the support levels could fix it until we got top-level technicians in.

We have our own dedicated server; that is, it is OUR server and it is not shared with anyone else... it's ALL ours (and costly). Because I have to sleep sometimes and I don't have full server admin knowledge, I pay a third-party company to manage and monitor the server for us... they check every 15 minutes that the server is still up and running. I have been getting some intermittent database errors of late and, after several weeks of investigation without finding the root cause, I had the admin company have a look at it. For some reason they knocked over a glass or something while they were in there and all hell broke loose. Anyway, they have now fixed it and we are up and running.

For the technical-minded... our server comprises an Intel Xeon E3-1230 processor (4 cores, 8 threads), 16 GB of RAM, 4 x 73 GB 15k RPM SAS hard disks set up in RAID 10, and 1 x 320 GB SATA II hard disk that we use for daily, weekly and monthly backups. Loaded onto the server is MySQL, which serves many databases across many sites. We use InnoDB tables in MySQL, not MyISAM.

What happened is that, for some reason, the InnoDB plugin for MySQL wouldn't start. Level 1 and 2 support tried and tried to fix it but couldn't, so it was escalated to their CTO (Chief Technical Officer), and I had to wait for him to be free to fix it. We started investigating and rebooted the server. Working in conjunction with the CTO in India and the Data Centre in the US, we found what looked like a partial disk corruption that was causing havoc with many of the services and other things.
We had to completely clean out the backup disk and remount it, copy the data across to it, reset and clean up the array, get MySQL and all the other services running properly, move the data back to the array and set the backup disk up again. Touch wood, it's all fixed now, but a lot of investigation will now be needed to ascertain why InnoDB failed and what led up to its failure... Was it just a random failure? Was it the start of a larger disk failure to come? Is there a config issue with MySQL, RAID, PHP, WHM, cPanel or the OS causing corruption? Are services terminating cleanly? There seem to be many reports coming out about InnoDB failing when MySQL is restarted, but be assured there will be extensive analysis of the cause of this specific failure.

Sorry, everyone, about the inconvenience... never a dull moment when you have a heavily hammered interactive website hosted on a server in a data centre on the West Coast of the USA, a server admin team based in India, and me sitting here in 39-degree heat at my little old desk typing this, hoping that when I click the submit button we don't have World War 3 happening on the server again.
storchy neil Posted January 17, 2013

:ban me please: After all that it was only buggered :augie: neil
Admin (Author) Posted January 17, 2013

Quoting storchy neil: ":ban me please: After all that it was only buggered :augie: neil"

Now listen here Neil, you old bugger, it surely was buggered for a while... to put it mildly, but it is back going again... it will take a bigger buggerer than that to hold this bugger back.
damkia Posted January 17, 2013

I thought that someone may have "taken offence" and had the site taken down. When it disappeared off the DNS servers I was convinced. Thankfully, "IT LIVES!!!!!" (apologies to Mary Shelley)
planedriver Posted January 17, 2013

Well done, Ian. I had every faith in you when I realised the site was up the sierra hotel india tango.

Don't take offence, advice, or betting tips from that Neil :no no no:, he's been known to stir the pot a bit before today :stirrer:.
Guernsey Posted January 17, 2013

Quoting Admin: "Now listen here Neil, you old bugger, it surely was buggered for a while... to put it mildly, but it is back going again... it will take a bigger buggerer than that to hold this bugger back."

It is a bit of a bugger when the bigger buggerer is buggered thro' overwork. I'm buggered if I would like such a bugger of a job, so I'll bugger off.

Alan, the 'old bugger'.
storchy neil Posted January 17, 2013

Quoting planedriver: "he's been known to stir the pot a bit before today :stirrer:"

Who, me? neil
pylon500 Posted January 17, 2013

OK, can't resist myself, just have to post this, even though everyone has probably seen it...
Captain Posted January 18, 2013

Ian,

Wouldn't it have been easier to just turn it off, wait 30 secs, then turn it back on again?

Hope that helps.

Regards
Geoff
Admin (Author) Posted January 18, 2013

Quoting Captain: "Ian, Wouldn't it have been easier to just turn it off, wait 30 secs, then turn it back on again? Hope that helps. Regards Geoff"

Tried that, Cap't... even went and made a bloody coffee, pulled my hair out, accused everyone of everything under the sun, came back and it was still down... It happened again this morning for a completely different reason; this time the Data Centre says it needed a driver update. WOW, you can't imagine my fury when, after 3.5 hours of going back and forth between the Data Centre and my little Indian mates who administer the server from India, the problem was fixed within seconds. At one stage I was told that the network cable wasn't plugged into the server, and I was even presented with a screenshot report saying so.

Anyway, I have just sent emails to both the admin people and the Data Centre asking them to "please explain": how these two issues happened, why they happened, why they took so long to fix, why they said they were working on it when I know from the security system that they hadn't even logged into the server, why they led me up the garden path on solutions that had nothing to do with the issue, and why we should bloody well stay with either of them. I am paying $240 USD a month to them to get it right, so give me a "PLEASE EXPLAIN!"

This site is so busy that a normal basic hosting account would destroy it (have you noticed that sometimes there are 150 concurrent users all hitting the site at the same time?), so I have no choice but to play in the big leagues with it, and that brings its associated issues.
Captain Posted January 18, 2013

Thanks Ian,

If you need any more technical assistance, I'm just a phone or Skype call away. Now I feel bad about being flippant... and I must add that when the site was down I realised (again) how often I click on it and how much I enjoy it. It has also become clear to many that without this site, RAA Members wouldn't have a clue, and would have no way to become clueful, about the issues plaguing RAA at the moment. So keep up the good work, and I'll come back to you for a Goldie Membership again.

Regards
Geoff
fly_tornado Posted January 18, 2013

Seriously, you should switch to a generic install of CentOS; it's stable as and bulletproof. Easy as to administer. 150 users isn't a lot.
DGL Fox Posted January 18, 2013

Yes, you can certainly sack them, but then you have all the time and trouble of getting someone else and setting it up again... blah, blah, blah... Technology is wonderful when it's working, but extremely frustrating when it's not, and a lot worse when you are paying good money for it to be run properly. I hope you get it all sorted out without too much loss of hair, Ian.

David
Guest Maj Millard Posted January 19, 2013

Well done, Ian... I'll also have a gold membership coming your way shortly...

PS: Have you seen that photo showing the power and telephone overhead lines in India???

Maj...
Admin (Author) Posted January 19, 2013

Quoting Captain: "...without this site, RAA Members wouldn't have a clue, and would have no way to become clueful, about the issues plaguing RAA at the moment. So keep up the good work, and I'll come back to you for a Goldie Membership again."

Thanks, mate... always remember that this site is unbiased and gives you the freedom to say and discuss what you feel without the worry of RAAus or CASA breathing down your necks...
nomadpete Posted January 22, 2013

Freedom of speech means that you must allow lots of biased views to be aired, and the reader must remember to fit his/her sh*t filters before taking any views as truth. And sometimes wear a suit of armour.

So, I'd say, thanks Ian for all your efforts that allow freedom of speech in our community. I too was worried when the site went down, and relieved that it was 'only' an IT problem. Incidentally, you were not the only aviation website down at that time.