So it's looking almost certain that what caused the UK National Air Traffic Services' computer system to go down for several hours was some dodgy data in an inbound flight plan.
When the GDS Service Standard was first launched, Point 11 in the acceptance criteria was to 'Make a plan for being offline'; for reasons I don't know or understand, this point was removed when the Service Standard was revised in 2019.
Whilst the NATS outage has caused quite a lot of very expensive disruption, the airspace around the UK did not catastrophically collapse, because NATS by default has a plan for being offline anyway - they go to manual.
Many organisations may well have 'go to manual' specified in their Business Continuity Plan for when The System goes down, and many will state 'go to manual' and nothing else. This is not a plan for being offline! A plan for being offline should specify not just what you're going to do, but also how you're going to do it. Will you have a manual system - whether that's literally pieces of paper, or Excel documents with appropriate column headings - ready and waiting to switch to instantly? Does everybody know how to use the backup system? Do you run regular exercises to check that they do? Is your main system data-import-friendly, so that when it's back online it won't be too traumatic to import the data gathered offline? All these considerations should be set out in the plan for being offline, beyond just the three words 'go to manual' or whatever.
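To make the 'data-import-friendly' point a bit more concrete, here's a rough sketch of the kind of check you might run on a spreadsheet of offline records (saved as CSV) before loading it back into the main system. The column headings and the function name are made up for illustration; a real service would use whatever fields its own manual form captures.

```python
import csv
import io

# Hypothetical column headings for the offline fallback spreadsheet.
REQUIRED_COLUMNS = ["recorded_at", "operator", "reference", "details"]


def check_offline_records(csv_text):
    """Split offline-captured rows into importable rows and problems to fix by hand."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        # Wrong template altogether: nothing can be imported safely.
        return [], [f"missing columns: {', '.join(missing)}"]

    good, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header row
        blanks = [c for c in REQUIRED_COLUMNS if not (row[c] or "").strip()]
        if blanks:
            problems.append(f"line {line_no}: blank {', '.join(blanks)}")
        else:
            good.append(row)
    return good, problems
```

The point is less the code than the exercise: if you can't write down the expected headings ahead of time, the manual fallback probably isn't ready either.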
I'd say this kind of business continuity planning should not be an afterthought (if it's included at all), but should be built into the requirements and service design of any online service from the outset. And if you've got services which don't have an adequate plan for being offline, there's always scope to start doing that thinking now.