Blech - Outage Update

Hang in there. We’ve had some sort of weird failure and a couple of corrupted backups. Things are spotty right now and I’m trying to figure out what is up, but I did have to roll back to a backup a couple of days old, which means we may have lost a few posts. Don’t worry, I’ll be happy to recap my political whining.

Stay tuned. Is this the part where I blame 2020?

I’m warning you–if any of my posts are lost I’m going to have a field day about it when Festivus rolls around.

(Seriously, thanks for fixing this. And yes, it’s a perfect 2020 thing.)

So, that was part of the 504 gateway error? Sorry that it happened. I’m glad you guys know what you’re doing. I sure don’t know much about the software side of things.

That’s a vast overstatement.

1 Like

Damn! Those were my most profound posts ever. :stuck_out_tongue_winking_eye:

1 Like

It actually looks like I caused a screw up because I misidentified a DoS attack, but things cascaded from there.

A week or two ago I did an update that turned out to be problematic: it caused the 502 and 504 errors and put a big strain on the server, but I got it fixed. Then yesterday I did what I thought was a minor upgrade, and that evening we had problems which, without looking under the hood, looked eerily similar to the earlier ones.

Because I have been pretty slammed with work, I assumed the minor upgrade was resurrecting the past problem, so without investigating further I rolled back to a backup - but that backup was corrupted. It wasn’t until I got to a three-day-old backup that I found one that was valid (I do them daily).

But while I was going through that process I thought I had nuked the entire site, which was not a happy feeling - I have some older server images so I guess all was not lost, but at 3am it sure felt that way.

So, once I got the three-day-old backup restored, the 504 errors began and server load was REALLY high, so I investigated further. Tons of hits on the site from Russia. I guess all of our DJT jokes caused some blowback.
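For anyone curious what that kind of check can look like, here is a rough sketch that counts requests per client address in a web server access log. It assumes a log format where the client IP is the first whitespace-separated field on each line (as in the common and combined log formats); the thread does not say which web server or log format is in use, and the function name `top_clients` is made up for illustration. Mapping addresses to countries would need a separate GeoIP lookup, which is not shown.

```python
from collections import Counter


def top_clients(log_lines, limit=10):
    """Return the `limit` busiest client addresses as (address, hits) pairs.

    Assumes the client address is the first whitespace-separated field
    of each log line; blank lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(limit)
```

It could be run against a log with something like `with open(path) as log: print(top_clients(log))`, where `path` is wherever the server writes its access log. A handful of addresses far ahead of everyone else is a hint that the load is coming from outside rather than from a recent upgrade.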

So… I could probably retrieve the past couple of days’ posts with a little work, but honestly I don’t think it is worth it given the pay. I’ve also done a few things I should have done from the get-go but didn’t because this is mostly a hobby to me, although I feel an obligation to all you donors to keep things running smoothly here.

I’m running our domain through Cloudflare now to help with attacks - I should have done that from the get-go. I’m changing how often backups run and where they are stored, and I will run tests on the backups periodically. I’ll also schedule some server images through Amazon Web Services as a backup to the backup. And in the future I’ll look at the logs before I jump to conclusions and take action.
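As an illustration of what a periodic backup test might involve, here is a minimal sketch that checks whether a backup archive can be opened and every file in it read through to the end. It assumes the backups are gzip-compressed tar archives, which the thread does not actually state, and `verify_backup` is a hypothetical name. This is only a readability check; it is not a substitute for an occasional real test restore.

```python
import tarfile
import zlib


def verify_backup(path):
    """Return True if the archive opens and every file member reads fully.

    Assumes `path` points at a gzip-compressed tar archive.
    """
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Reading each member to the end forces decompression,
                # so damaged data raises an error instead of going unnoticed.
                while stream.read(1024 * 1024):
                    pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
```

Running something like this against each new backup right after it is written would flag an unreadable archive the same day, rather than at 3am during a rollback.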

In short, it was a comedy of errors that in the professional world might have resulted in my termination (which is why I typically leave the development to my developers), but I’ll do better next time. My apologies.

6 Likes

The shutdown here forced me to visit Utehub yesterday to see if there was any discussion of the hoops game, which was highly traumatic. I hope that doesn’t happen again.

2 Likes

You can’t taste the sweet were it not for the bitter…

3 Likes

@RockerUte Do you need a donation to help out? I am sure the subscriptions for all these hardware and software solutions run a bit of coin. Happy to contribute if you need.

2 Likes

It is a good thing this is not The Apprentice. :smiley: