Disaster Recovery and Beekeeping

I lost my bees last year... don't lose your data.

In case you don’t know, I’m a backyard beekeeper. Last fall/winter I lost my hive (the bees absconded). This year I plan on getting two hives as a disaster recovery / failover plan. Here are some lessons learned that can help in your cloud / SaaS-based business.

Bees are very sensitive and have enemies coming from every direction: mites, beetles, ants, wasps, the elements, and even causes we still don’t understand.

To reduce the impact of any of these issues, my plan is to go with two hives this year. That way, if one hive is weak for any reason, I can use the second hive to help repair it. This removes a single point of failure and provides… Failover!

Many companies don't consider the possible single points of failure in their ecosystem. This is especially the case with complete reliance on 3rd party SaaS apps. Entire companies these days are built in the cloud, but there is sometimes an overreliance on a single component in the ecosystem.

Here are some things to consider that I've run into recently.

  • DynamoDB Tables

    • Is point-in-time recovery enabled? What about backups beyond the 35-day recovery window? What if someone disables backups?

  • S3 Buckets

    • Is versioning enabled? Do you have replication to another account for DR or backup purposes?

  • GitHub Repos

    • What if your repos are deleted? Do you have a backup somewhere?

  • SaaS Application X that’s core to your ecosystem

    • If the data stored with the app is no longer available for whatever reason, do you need it? Can you replace it from somewhere?

  • Tribal Knowledge

    • That one engineer who knows where everything is. What happens if they’re not available for whatever reason? (Automate and document.)
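The first two checks on this list are easy to automate. Here is a minimal sketch, not a finished tool: it assumes you pass in boto3-style DynamoDB and S3 clients, and it relies on the `describe_continuous_backups` and `get_bucket_versioning` calls. The function names (`pitr_enabled`, `versioning_enabled`, `audit`) are my own, made up for illustration.

```python
from typing import Any, Dict, Iterable, List


def pitr_enabled(response: Dict[str, Any]) -> bool:
    """True if a DescribeContinuousBackups response reports PITR as ENABLED."""
    description = response.get("ContinuousBackupsDescription", {})
    pitr = description.get("PointInTimeRecoveryDescription", {})
    return pitr.get("PointInTimeRecoveryStatus") == "ENABLED"


def versioning_enabled(response: Dict[str, Any]) -> bool:
    """True if a GetBucketVersioning response reports Status 'Enabled'.

    The Status key is missing when versioning was never turned on,
    and is 'Suspended' when it was turned on and later paused.
    """
    return response.get("Status") == "Enabled"


def audit(dynamodb, s3, tables: Iterable[str], buckets: Iterable[str]) -> List[str]:
    """Return one human-readable finding per table or bucket that lacks protection.

    `dynamodb` and `s3` are assumed to be boto3 clients (or anything
    exposing the same two methods, which makes this easy to test).
    """
    findings = []
    for table in tables:
        response = dynamodb.describe_continuous_backups(TableName=table)
        if not pitr_enabled(response):
            findings.append(f"DynamoDB table {table}: point-in-time recovery is off")
    for bucket in buckets:
        response = s3.get_bucket_versioning(Bucket=bucket)
        if not versioning_enabled(response):
            findings.append(f"S3 bucket {bucket}: versioning is off")
    return findings
```

With boto3 installed and credentials configured, you would call something like `audit(boto3.client("dynamodb"), boto3.client("s3"), ["orders"], ["my-bucket"])`, where the table and bucket names are placeholders. Running a check like this on a schedule also helps with the "what if someone disables backups?" question, since a change shows up as a new finding.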

I might have mentioned it before, but going through some threat modeling and / or tabletop exercises will help reveal cracks in your ecosystem that are not apparent at face value.

Take care,

Ayman