Cloud Security Management, When Things Break

You will find cloud solutions for just about everything today, from hosting your website to running your entire infrastructure. Many organizations find cost savings, not just in hardware and software maintenance, but also in the employees needed to manage the systems and the capital expenditure to house data centers. The same goes for hosting security tools, both in-house and COTS (commercial off-the-shelf). Here, though, I’m going to tell you about the time I wrote software that relied on the cloud, why I did that, and why I changed it. I can’t talk about the security tool itself, so I’ll gloss over a lot of details.

Why the cloud?

When I first wrote my tool and gave others access to it, there was no automatic updating; people could update when they wanted to. Not long after, I ran into an issue: people weren’t updating the tool at all. The tool ran fine, so they saw no reason to. Then the back ends the tool relied on would change, and things would either get buggy or break. Of course, people weren’t happy, and they assumed it was bad coding. And of course, the small group of people who did update didn’t have issues.

I then switched over to forced updates, so there were no more stragglers. This worked great. The tool checked for an update on startup, then polled periodically to see if there was a new version. If a new version was found, the tool prevented any new actions; users could only complete what they were already working on. This was a good compromise, not quite Windows 10 level, but it still forced them to update. Then I started shipping more frequent updates, which meant people had to restart the tool more often. Not that it was much work: the binary downloaded the new version, renamed the old one, restarted itself, and deleted the old one as long as basic checks passed.
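As a rough illustration of the forced-update check described above, here is a minimal sketch. The dotted version format and the names `isNewer`, `createUpdateGate`, and `fetchLatestVersion` are assumptions made for this example; the post does not describe the real tool's internals.

```javascript
// Returns true if `remote` is a newer dotted numeric version than `local`.
// The "1.4.2" style format is an assumption for this sketch.
function isNewer(remote, local) {
  const a = remote.split(".").map(Number);
  const b = local.split(".").map(Number);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0; // missing components count as 0
    const y = b[i] ?? 0;
    if (x !== y) return x > y;
  }
  return false;
}

// Tracks whether the user may start new actions.
// `fetchLatestVersion` is a hypothetical injected function that returns the
// latest published version string; errors it throws propagate to the caller.
function createUpdateGate(currentVersion, fetchLatestVersion) {
  let updateRequired = false;
  return {
    // Call once at startup, then again on a timer.
    check() {
      if (isNewer(fetchLatestVersion(), currentVersion)) {
        updateRequired = true; // stays locked until the tool restarts
      }
      return updateRequired;
    },
    // Work already in progress is not interrupted; only new actions are refused.
    canStartNewAction() {
      return !updateRequired;
    },
  };
}
```

The gate would be checked once at startup and then polled, for example with `setInterval(() => gate.check(), intervalMs)`. A real version lookup would be a network call and therefore asynchronous; it is kept synchronous here to keep the sketch short.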

I’m givin’ her all she’s got, Captain

I then wanted live updates, in which some code changes required a restart and others just updated in place. Some of the code was JavaScript, an interpreted language, so the JavaScript code could check a URL for versioning information. If there was a newer version, the JS code would grab the latest JS code and replace the current functions in memory. An application version update still required a restart, so the application would prompt the user while locking future searches. That was done to incentivize users to update, and to prevent outdated code from potentially changing the backend.
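To make the in-place swap concrete, here is a minimal sketch of one way it could work. Everything named here is hypothetical: the `live` registry, the `applyUpdate` function, the integer version numbers, and the shape of the update object are assumptions for illustration, not the tool's actual design.

```javascript
// Hypothetical registry of hot-swappable functions. Callers always go through
// `live.functions.name(...)` so they pick up replacements immediately.
const live = {
  appVersion: 1,
  scriptVersion: 1,
  searchesLocked: false,
  functions: {
    formatResult: (r) => `result: ${r}`,
  },
};

// Applies an update with the assumed shape
// { appVersion, scriptVersion, sources: { functionName: "source text" } }.
function applyUpdate(registry, update) {
  // An application-level update cannot be swapped in place:
  // lock new searches until the user restarts.
  if (update.appVersion > registry.appVersion) {
    registry.searchesLocked = true;
    return "restart-required";
  }
  if (update.scriptVersion <= registry.scriptVersion) return "up-to-date";

  // Compile everything first, so a syntax error leaves the old code untouched.
  // A real tool should verify where the source came from before evaluating it.
  const compiled = {};
  for (const [name, source] of Object.entries(update.sources)) {
    compiled[name] = new Function(`return (${source});`)();
  }
  Object.assign(registry.functions, compiled);
  registry.scriptVersion = update.scriptVersion;
  return "swapped";
}
```

Compiling before assigning is a deliberate choice in this sketch: a half-applied update would be worse than a skipped one.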

And it’s broken

Well, things were working great. In the back of my head, I knew I needed to account for when the cloud didn’t work; however, things worked, and I wanted to move on to other projects. Then came a situation in which cloud connectivity stopped working, as clouds do from time to time, and the application stopped working with it. On startup, the application retrieved data that it required to run… so the application couldn’t start.

The fix

I needed the ability to have all these nice features, like live updates, while also allowing the application to work with no cloud connectivity. Luckily the fix was simple: fall back to a local version. When the application starts, it checks for cloud connectivity. If a connection can’t be established, it falls back to a local copy. Otherwise, it uses the cloud connection to load the newer version of the code. Luckily the backend connections were already modular, meaning the tool collected as much information from the backends as possible, so if any of them went down, the data that was found could still be used.
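Both ideas in this fix can be sketched in a few lines. This is only an illustration under assumptions: `loadCode`, `collectFromBackends`, and the injected functions are hypothetical names, and the sketch simply attempts the cloud fetch and treats any failure as "no connectivity" rather than running a separate connectivity check.

```javascript
// Tries the cloud copy first and falls back to the local copy on any failure.
// `fetchFromCloud` and `loadLocalCopy` are hypothetical injected functions.
function loadCode(fetchFromCloud, loadLocalCopy) {
  try {
    return { source: "cloud", code: fetchFromCloud() };
  } catch (err) {
    // The local copy acts as the known good state.
    return { source: "local", code: loadLocalCopy() };
  }
}

// Queries each backend independently and keeps whatever succeeded,
// so one backend going down does not discard the other results.
// `backends` maps a name to a lookup function (an assumed shape).
function collectFromBackends(backends, query) {
  const results = {};
  const failed = [];
  for (const [name, lookup] of Object.entries(backends)) {
    try {
      results[name] = lookup(query);
    } catch (err) {
      failed.push(name);
    }
  }
  return { results, failed };
}
```

Returning the list of failed backends alongside the partial results lets the caller tell the user that the data may be incomplete instead of silently showing less.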

This setup lets me keep pushing code changes to the cloud quickly, while the local version acts as a fallback if anything goes wrong, kind of like a known good state. The local version on everyone’s machine can then be updated at a slower pace.