The Magic Number

The Magic Number
Photo by Shubham Dhage / Unsplash

It's hard to make compression sound cool. The closest thing was probably Pied Piper - but the recent attempted supply-chain attack against the widely used xz library will probably make one heck of a Darknet Diaries episode soon.

We're going to focus on something a little less cool: compression in Microsoft SQL server when using the COMPRESS and DECOMPRESS functions. Amtelco uses compression to save your scripts (and other things) into the database - without it, you'd have absolutely giant databases - even larger than you have now (I'm looking at you ScriptTracker events.)

We published a couple of scripts a few weeks back that allow you to search scripting and also find scripts that may be corrupted. Depending on how the SQL query optimizer decided to run, we were getting inconsistent results when attempting to decompress scripts (so we could search the contents of them) - the error was something like this:

Uncompressed or corrupted data passed as argument to DECOMPRESS builtin.

Why does this happen? I'm not really sure in the context of Amtelco environments. (Obstensibly, it's because the data in the column is corrupt or not using gzip compression. ) We were able to identify many of the broken scripts by opening them in IS Superivsor: typically re-saving or removing them from IS Supervisor would resolve most - but not all - of the broken scripts.

It could be a bunch of old scripts that are no longer supported, or that were created in a version of SQL not using gzip, or they errored when attempting to save (such as a SQL server unplanned crash.)

Regardless of why, we have to figure out how we can decompress the scripts that are good without running into an error using DECOMPRESS. The most obvious option (to me) was to DECOMPRESS the scripts that I could and maintain a searchable index. This approach was data intensive (copies of your scripts that had to be regularly polled and re-indexed) and not something I wanted to do - I needed a SQL statement  that could find things in realtime without erroring out due to weirdly compressed scripts.

In comes The Magic Number. While there are many magic numbers, this one is mine. It comes in the form of gzip compression headers, where anything that uses gzip compression will start with 0x1f8b - along with the next byte-sequence being 08 which designes the DEFLATE compression method.

As a result, we can CAST our Amtelco scripts to only pull those first 3 bytes and compare them to the gzip Magic Number - if they match, we can be reliably sure that the DECOMPRESS function will work on that script.

In practice, this check looks like the following WHERE clause:

where cast(p.script as varbinary(3)) = cast(0x1F8B08 as varbinary(3))

We tried a few other methods, including wrapping our query in a WITH statement and attempting to only pull out and operate on the records that were good. For whatever reason, I couldn't make that work as well as this method. If you're doing this - and I know a few of you are - what method are you using to reliably decompress the scripts so you can search them?

In the meantime, we've updated our documentation with our new queries, along with some other fun updates. I'll include links below.

Mission Control Documentation

I recently pushed out some updates to Mission Control, and one of those is better documention. You'll find a new section in the portal that covers everything you need to know about mission control, including:

  • Licensing and plans for Mission Control
  • Installation requirements
  • Features and how to use them
  • Support details and getting help
  • A quick getting started guide

You'll also find the aformentioned script search option built into Mission Control!

IS Scripting Sessions

We are continuing with our mostly-weekly training sessions for Intelligent Series scripting. This week we are focusing on the Intermediate II content, including Contact-Based Dispatch/Architecture. See you there!

Call Theory

The Intermediate assessment is up and running here - this is the content we base each class off of:

Call Theory

And finally, here are the links to the updated queries using gzip magic numbers:

Call Theory
Call Theory