Now, as you probably already know, I’m not an experienced incident responder. Sure, it interests me, but until recently I’ve never really had the cause, or chance, to be on the front line handling a live incident. I know, I know, I must be the only person in IT Security that DIDN’T get into it because they got hacked. Anyway, I’m getting off-topic.
Back on 28th August the Apache Software Foundation suffered a hack. Most people know this already; it’s not front-page news. Why isn’t it front-page news? Because of their communications. Almost from the moment they took down their websites they had an open dialog with the public, telling everything they knew about the problem. Even though their services were unavailable, and even though, as a group, it was embarrassing to suffer this kind of attack, they still did the right thing in communicating the full information to everybody who needed or wanted to know. As a result I’ve seen nothing but praise for the way the Apache Software Foundation and its engineers dealt with the hack and recovered. Not only did they put ALL the information out there, they were also good enough to publish an open review of what worked and didn’t work during their own incident response. Nobody could have asked for more. As a result, it isn’t only the Apache Software Foundation that has benefited from reviewing its IR process. Everybody with an interest can review their notes and adapt their own in-house IR plans to avoid the problems Apache saw. Sure, we all wish they didn’t get hacked, but considering the issues you could almost say it was a good PR exercise for them. It would have been nice to see a post on their official Twitter account as well, though. With a follower list more than 1,300 strong, and a presence (although a little thin on the ground) since last December, it would have been a good additional source of micro-updates. Still, we can’t have it all, can we? 😉
In contrast we have COLT Telecom, a European telecom provider that suffered what appears (at this time) to be a full outage of services at 12:00:03 on the 8th September. Just before 8pm that night (after 20 hours of downtime with no communications with customers) COLT registered a Twitter account called COLToutagenews and posted the following: “We’ve network issues in 8 cities. We are going through systematic process with urgency to identify, resolve & minimise customer impact”. This falls directly into something Martin McKeay covered in his FIRST 2009 presentation “Using Social Media in Incident Response”. This Twitter account was set up by COLT to serve as the ONLY communication with its customers, and only after the event had already happened. I’m sure you can see the problem with this. To date the highest number of followers for this temporary account is 83.
That’s not many considering their network runs Europe-wide and the outage affected 8 cities. Does that make Twitter a bad communications medium for Incident Response? No. Not really. However, you can’t expect your customers to go to Twitter and search for new accounts to find your communications channel once you get around to setting one up. Social networks like Twitter will only work in this way if your company invests in a presence 24/7, and not just when it suits the company. If people aren’t following your company account because it doesn’t exist, is nothing but 100 marketing messages, or never posts anything, then your incident response isn’t going to work as you imagined it. From the looks of things COLT are back up and running (at least from the feedback I’ve seen). However, the reason for the outage still hasn’t been given, and no real communication with customers has happened. Even the link provided by the COLToutagenews Twitter account points to nothing but a list of contact phone numbers.
How can you even start to compare these outages? The services these companies offer are very different, and I feel bad comparing them as equals (apples and oranges). I’ve also no doubt that a lot of good people worked (and are probably still working) around the clock on the COLT network issues. However, the fault isn’t with the hack, or whatever has caused COLT’s downtime. The big issue here is the response, communication and open flow of information.
We can learn a lot from these two examples. I can see both of them appearing in IR training courses for a long while to come.