
Posted

Hi All

 

Well, we had a bit of a dilemma for about 14 hours...11:30pm Wed night to now (Thur afternoon) with the site being out of action for that period.

 

For the non-technical, we had a problem with the format of the data in the database that suddenly happened and all levels of support were not able to fix it until we got top level technicians in.

 

We have our own dedicated server; that is, it is OUR server and it is not shared with anyone else...it's ALL ours (and costly). Because I have to sleep sometimes and I don't have full server admin knowledge, I pay a 3rd party company to manage and monitor the server for us...they check the server is still up and running every 15 mins. I have been getting some intermittent database errors of late and, not being able to find the root cause after several weeks of investigation, I had the admin company have a look at it. For some reason they knocked over a glass or something when they were in there and all hell broke loose. Anyway, they have now fixed it and we are up and running.

 

For the technically minded...Our server comprises 8 processor cores using Intel E3-1230 processors, 16GB of RAM, 4 x 73GB 15k rpm SAS hard disks set in RAID 10 and 1 x 320GB SATA II hard disk that we use for daily, weekly and monthly backups. Loaded on to the server is MySQL, which is used for many databases over many sites. We use InnoDB tables in MySQL and not MyISAM. What happened is that for some reason the InnoDB plugin for MySQL wouldn't fire up. Level 1 and 2 support tried and tried to fix it but couldn't, so it was bumped up to their CTO (Chief Technical Officer) and I had to wait for him to be free to fix it.
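For anyone curious how those disk numbers work out, here is a rough sketch of the arithmetic. It is not anything running on the server. It assumes a standard RAID 10 layout (striped mirrored pairs) and nominal disk sizes, ignoring filesystem and formatting overhead; the function name is made up for illustration.

```python
def raid10_usable_gb(disk_count: int, disk_size_gb: float) -> float:
    """Nominal usable capacity of a RAID 10 array.

    Every disk is mirrored, so only half of the raw capacity is usable.
    Assumes equal-sized disks and ignores filesystem overhead.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_size_gb / 2


# Figures taken from the post: 4 x 73GB SAS disks, 1 x 320GB SATA backup disk.
array_gb = raid10_usable_gb(4, 73)
backup_gb = 320

print(f"RAID 10 usable capacity: {array_gb:.0f} GB")
print(f"Backup disk is {backup_gb / array_gb:.1f}x the size of the array")
```

On those assumptions the array offers roughly 146GB of usable space, so the 320GB backup disk can hold a little over two full copies of it.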

 

We started investigating and rebooted the server and, working in conjunction with the CTO in India and the Data Centre in the US, it looked like we had a partial disk corruption that was causing havoc with many of the services and other things. We had to completely clean out the backup disk and remount it, copy data across to it, reset and clean up the array, get MySQL and all other services going properly, move the data back to the array and set the backup disk up again.

 

Touch wood it's all fixed now but a lot of investigation will now need to be done to ascertain why InnoDB failed and what led up to its failure...was it just some random failure, was it the commencement of a larger potential future disk failure, is there a config issue with MySQL or RAID or PHP or WHM or cPanel or the OS causing corruptions, are services terminating cleanly etc...there seem to be many reports coming out about issues with InnoDB failing when restarting MySQL etc but be assured there will be extensive analysis on what has been the cause of this specific failure.

 

Sorry everyone about the inconvenience...never a dull moment when you have a highly hammered interactive web site hosted on a server in a data centre on the West Coast USA and a Server Admin team based in India and me sitting here in 39 degree heat at my little old desk typing this hoping that when I click the submit button we don't have world war 3 happening on the server again

 

 

Posted
:ban me please: after all that it was only buggard :augie: neil

Now listen here Neil, you old bugger, it surely was buggered for a while...to put it mildly but it is back going again...will take a bigger buggerer than that to hold this bugger back

 

 

Posted

I thought that someone may have "taken offence" and had the site taken down. When it disappeared off the DNS servers I was convinced.

 

Thankfully, "IT LIVES!!!!!" (apologies to Mary Shelley)

 

 

Posted

Well done Ian. I had every faith in you, when I realised the site was up the sierra hotel india tango

 

Don't take offence, advice, or betting tips from that Neil :no no no:, he's been known to stir the pot a bit before today :stirrer:.

 

 

Posted

It is a bit of a bugger when the bigger buggerer is buggered thro' overwork, I'm buggered if I would like such a bugger of a job so I'll bugger off.

 

Alan the 'old bugger'.

 

 

Posted

OK, can't resist myself, just have to post this, even though everyone has probably seen it.....

 

 

 

Posted

Ian,

 

Wouldn't it have been easier to just turn it off, wait 30 secs, then back on again?

 

Hope that helps.

 

Regards Geoff

 

 

Posted

Tried that Cap't...even went and made a bloody coffee, pulled my hair out, accused everyone of everything under the sun, came back and it was still down. It happened again this morning for a completely different reason; this time the Data Centre says it needed a driver update. WOW, you can't imagine my fury when, after 3.5 hours of going back and forth between the Data Centre and my little Indian mates, who administer the server from India, the problem was fixed within seconds.

At one stage I was being told that the network cable wasn't plugged into the server, was even presented with a screen shot report saying this.

 

Anyway I have just sent emails to both the admin people and the Data Centre asking for a please explain: how did these two issues happen, why did they happen, why did they take so long to fix, why say they are working on it when I know from the security system that they haven't even logged into the server, why lead me up the garden path on solutions that have nothing to do with the issue, and bloody well why should we stay with both of them. I am paying $240 USD a month to them to get it right so give me a "PLEASE EXPLAIN!"

 

This site is so busy that a normal basic hosting account would not cope with it (have you seen that sometimes there are 150 concurrent users all hitting the site at the same time?) so I have no choice but to play in the big leagues with it and that brings its associated issues.

 

 

Posted

Thanks Ian,

 

If you need any more technical assistance, I'm just a phone or Skype call away.

 

Now I feel bad about being flippant, .......... and I must add that when the site was down I realised (again) how often I click on and how much I enjoy it.

 

And it has also become clear to many that without this site, RAA Members wouldn't have a clue, and would have no way to become clueful, about the issues plaguing RAA at the moment.

 

So keep up the good work and I'll come back to you for a Goldie Membership again.

 

Regards Geoff

 

 

Posted

Seriously, you should switch to a generic install of CentOS, it's stable as and bulletproof. Easy as to administer. 150 users isn't a lot.

 

 

Posted

Yes you can certainly sack them but then you have all the time and trouble of getting someone else and setting it up again...blah blah blah...technology is wonderful when it's working but extremely frustrating when it's not and a lot worse when you are paying good money for it to be run properly...I hope you get it all sorted out without too much loss of hair Ian.

 

David

 

 

Guest Maj Millard
Posted

Well done Ian...I'll also have a gold membership coming your way shortly...PS...have you seen that photo showing the power and telephone overhead lines in India???...Maj...

 

 

Posted

Thanks Mate...always remember that this site is unbiased and gives you the freedom to say and discuss what you feel without the worry of RAAus or CASA breathing down your necks...

 

 

Posted

Freedom of speech means that you must allow lots of biased views to be aired and the reader must remember to fit his/her sh*t filters prior to taking any views as being truth. And sometimes wear a suit of armour.

 

So, I'd say, thanks Ian for all your efforts that allow freedom of speech to our community. I too was worried when the site went down, and relieved that it was 'only' an IT problem. Incidentally you were not the only aviation website down at that time.

 

 
