Friday, February 27, 2009

Fud Buster Friday #29 - One Outage isn't so bad.. is it?

Naturally, it depends on who has the outage, how long it lasts and, well, a million other things.

As we know, Google had an outage this week. Ho hum. Welcome to IT, guys. The world keeps spinning, we get blamed, we fix it, life moves on. Or does it?

I just spent over a week on a server that was crashing daily, and it took that long because of my laziness and lack of attention to the client's plight. Well, to be fair, I was at 2 new clients this week, plus demoing for a 3rd and rewiring my office, but the bottom line is I took a casual attitude toward a client's business and that's not good.

The client is extremely happy I have resolved the problem, but will wait 72 hours before total acceptance.

But this taught me, and should teach you too, that one outage is too many for ANY client. Repeated ones spell disaster for any IT credibility or expectations. It is VERY hard to recover from this, and in some cases it leads the customer to think about alternative email solutions.

The client had turned down clustering, and even a backup replicated server (I am sure some PC could have been found, but I digress). Their backup procedures are appalling and they have no idea why they were running LDAP, POP3, DECS, DIIOP or IMAP on the server. (Note to jr admins: just because you see a check box does NOT mean it must be checked.)
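As a rough illustration of that audit, here is a minimal Python sketch that scans the text of a notes.ini for the ServerTasks line and flags the protocol tasks named above unless someone has stated a reason to keep them. The function name, the `needed` parameter and the sample line are made up for this example, and it only looks at ServerTasks; tasks loaded from the console or a program document would not show up.

```python
# Tasks the post names as running without anyone knowing why.
QUESTIONABLE = {"LDAP", "POP3", "DECS", "DIIOP", "IMAP"}


def questionable_tasks(notes_ini_text, needed=()):
    """Return ServerTasks entries that are questionable and not declared needed.

    `needed` is a hypothetical allow-list of tasks the business has
    confirmed it actually uses.
    """
    needed_upper = {name.upper() for name in needed}
    flagged = []
    for line in notes_ini_text.splitlines():
        key, sep, value = line.partition("=")
        # Only the ServerTasks line is inspected; everything else is skipped.
        if not sep or key.strip().lower() != "servertasks":
            continue
        for task in value.split(","):
            name = task.strip().upper()
            if name in QUESTIONABLE and name not in needed_upper and name not in flagged:
                flagged.append(name)
    return flagged


# Made-up sample line, not taken from the client's server.
sample = "ServerTasks=Update,Replica,Router,AMgr,AdminP,CalConn,Sched,HTTP,LDAP,POP3,DECS,DIIOP,IMAP"
print(questionable_tasks(sample, needed=["IMAP"]))
```

The point is less the script than the habit: every task on that line should have an owner who can say why it is there.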

The server was throwing out the usual errors about calendar profiles, users not found in the NAB and a bunch of agent signing errors, from agents that run their apps but were signed by a previous admin/developer, of course. And they had small quotas, like 150MB. And of course you can guess which mail files had these errors. Funny how nonexistent people can fill up a mail file, waste disk space, routing, backup time and space and even have agents running! Also, if you have a donotreply@corp.com address, be prepared to have many replies to it. The fix is to just have an entry in the NAB and a mail file for it that purges every day, or every 7 days, or whatever you want. Otherwise, like the client, you could have hundreds of dead mails in your 4 mail.box files.

I suggested the client take care of these, only to realize they could have done so all along and never did. So, as any good consultant would, we billed them for the mini clean up and are planning a bigger clean up under a new project.

I saw the errors upfront, but we were working on what we thought was a bigger problem and figured we could fix these later. Remember, Eat the Frog: do the stuff you don't like or don't want to do first, and move forward. Taking the easy way out will never help you as an admin.

The other thing is delegation. The problem was bumped to me after others had already tried to tackle it. I was not happy that none of them had resolved the errors or the small things.
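One way to find the "nonexistent people" is to compare mail file owners against the names still in the directory. The sketch below assumes you have already exported both lists by whatever means you prefer; the function name, file paths and user names are all invented for illustration.

```python
def orphaned_mail_files(mail_file_owners, directory_names):
    """Return mail file paths whose owner is not in the directory.

    mail_file_owners: dict of mail file path -> owner name (hypothetical export)
    directory_names:  iterable of names currently in the NAB (hypothetical export)
    """
    known = {name.strip().lower() for name in directory_names}
    return sorted(
        path
        for path, owner in mail_file_owners.items()
        if owner.strip().lower() not in known
    )


# Made-up sample data.
owners = {
    "mail/jsmith.nsf": "John Smith/Corp",
    "mail/oldguy.nsf": "Old Guy/Corp",
    "mail/asales.nsf": "Amy Sales/Corp",
}
directory = ["John Smith/Corp", "amy sales/corp"]
print(orphaned_mail_files(owners, directory))
```

Anything it lists is a candidate for review, not automatic deletion; a shared or service mailbox such as the donotreply one may be there on purpose.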

Training is important, but walk the talk. As the Worst Practices session at LS08 showed, even the great ones don't always follow their own guidelines.

So I apologized to the client and, oddly enough, they were just happy anyone could resolve the problem. It had been going on, off and on, for a month or so.

The downside is that their clients now just expect the server to be down, and won't care even if it stays up. The worst part, for me, was that after it all, they told me they are going to move off Domino for mail. Working on that problem next and may blog it if it gets interesting.

4 comments:

  1. Very timely for us. We're going through the same sort of thing. I'm right now sorting through a list of 45 tickets over the past 4 years related to BlackBerry handhelds. What's the fix for the calendar profile issue? That's been annoying me for a while.

  2. Keith,

    In general the calendar profile error is because an Admin opened the mail file or checked on calendar preferences and now the client's name is no longer registered to the user.
    Will blog the options to resolve it.
    Email me if you have more specific ones too.

  3. Wait..

    they had a poorly configured server and are switching platforms because of it?

    I teach Exchange and I always try to stress that small issues matter. End users' cries for help are important, and even if you are somehow restricted from fixing them due to corporate policy, at least have a dialogue with the users.
    Also, don't spend ages ignoring errors and warnings whose meaning you don't know. You should know why you get errors and warnings.

    This applies to any platform or solution.

  4. @3 No, I never said they were switching platforms because of a poorly configured server.
    That was laid on me after I was working on it.
    I guess they figured like a car on lease return they would try to get by without any damage.

    I concur with your thoughts about clients' problems and discussing them. And yes, it's usually the small things that cause the biggest problems, like misplaced periods or an extra blank space in an email name.
