Welcome to The Perfect Curve.

Make a plan for the system going down

simon gray 2023-08-30, 11:41:52

So it's looking almost certain that what caused the UK National Air Traffic Services' computer system to go down for several hours was some dodgy data in an inbound flight plan.

[Mrs. Roberts receives a call from her son's school on her wireless phone. She is standing with a cup of hot coffee or tea (shown with a small line above the cup) facing a small round three-legged table to the right. The voice of the caller is indicated to come from the phone with a zigzag line.]
Voice over the phone: Hi, this is your son's school. We're having some computer trouble.
[In this frame-less panel Mrs. Roberts has put the cup down on the table turned facing out.]
Mrs. Roberts: Oh, dear – did he break something?
Voice over the phone: In a way –
[Mrs. Roberts is now drinking from the cup again looking right. The table is not shown.]
Voice over the phone: Did you really name your son Robert'); DROP TABLE Students;-- ?
Mrs. Roberts: Oh, yes. Little Bobby Tables, we call him.
[Mrs. Roberts holds the cup down.]
Voice over the phone: Well, we've lost this year's student records. I hope you're happy.
Mrs. Roberts: And I hope you've learned to sanitize your database inputs.

When the GDS Service Standard was first launched, Point 11 in the acceptance criteria was 'Make a plan for being offline'. For reasons I don't know or understand, this point was removed when the Service Standard was revised in 2019.

Whilst the NATS outage has caused quite a lot of very expensive disruption, the airspace around the UK did not catastrophically collapse, because NATS by default has a plan for being offline anyway - they go to manual.

Many organisations may well have 'go to manual' specified in their Business Continuity Plan for when The System goes down. Many organisations will state 'go to manual' and nothing else. This is not a plan for being offline! A plan for being offline should not just specify what you're going to do, but also how you're going to do it. Will you have a manual system - whether that's literally pieces of paper, or Excel documents with appropriate column headings - there ready and waiting to instantly switch to? Does everybody know how to use the backup system? Do you run regular exercises to check everybody knows how to use it? Is your main system data-import-friendly so that when it's back online again it won't be too traumatic to import the data gathered offline? All these considerations should be specified in the plan for being offline beyond just the three words of 'go to manual' or whatever.
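To make the 'appropriate column headings' and 'data-import-friendly' points a bit more concrete, here's a minimal sketch in Python of what the two ends of a fallback might look like: a blank sheet with agreed headings, and an import step which sets dodgy rows aside for a human to look at rather than letting one bad row stop everything. The column names, the date format, and the function names are all made up for illustration; a real service would have its own.

```python
import csv
import io
from datetime import datetime

# Hypothetical column headings for the offline fallback sheet (an assumption,
# not taken from any real service).
OFFLINE_COLUMNS = ["reference", "recorded_at", "recorded_by", "details"]


def write_blank_template(stream):
    """Write an empty fallback sheet containing only the agreed headings."""
    csv.writer(stream).writerow(OFFLINE_COLUMNS)


def validate_row(row):
    """Return a list of problems with one offline row (empty means usable)."""
    problems = []
    for column in OFFLINE_COLUMNS:
        if not (row.get(column) or "").strip():
            problems.append(f"missing {column}")
    recorded_at = (row.get("recorded_at") or "").strip()
    if recorded_at:
        try:
            # Assumed convention for timestamps typed in by hand.
            datetime.strptime(recorded_at, "%Y-%m-%d %H:%M")
        except ValueError:
            problems.append("recorded_at is not YYYY-MM-DD HH:MM")
    return problems


def read_offline_rows(stream):
    """Split offline rows into importable ones and ones needing a human.

    A bad row is set aside with its line number instead of aborting the import.
    """
    reader = csv.DictReader(stream)
    if reader.fieldnames != OFFLINE_COLUMNS:
        raise ValueError(f"unexpected headings: {reader.fieldnames}")
    accepted, rejected = [], []
    for row in reader:
        problems = validate_row(row)
        if problems:
            rejected.append((reader.line_num, problems))
        else:
            accepted.append(row)
    return accepted, rejected


if __name__ == "__main__":
    # Example: one good row, one row with a hand-written date.
    sheet = io.StringIO(
        "reference,recorded_at,recorded_by,details\n"
        "A1,2023-08-28 09:15,sg,Phone enquiry\n"
        "A2,28th Aug,sg,Walk-in\n"
    )
    accepted, rejected = read_offline_rows(sheet)
    print(len(accepted), "rows ready to import")
    for line_number, problems in rejected:
        print("line", line_number, "needs checking:", ", ".join(problems))
```

The point of the sketch is the shape rather than the detail: the headings are agreed before the outage, and the route back into the main system expects some of the offline data to be wrong.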

I'd say this kind of business continuity planning should not be an afterthought, if it's included at all, but should be built into the requirements and service design of any online service from the outset. And if you've got services which don't have an adequate plan for being offline, there's always scope to start doing that thinking now.

#ServiceDesign #BusinessContinuity #LocalGovDigital #Manifesto

In group Public / Third Sector Digital

Brought to you by simon gray. Also find me on Mastodon
