It’s the data, stupid!

Continuing the Rethinking IT series, let’s take a look at data.

If one of my underlying principles is that today’s IT is yesterday’s data processing (plus communications), then data’s been fairly well acknowledged to be old hat for IT professionals, right?

We’re doing it wrong
Take a look at your document management system (that’s if you have one of those, and then only if it’s actually used) and see how much of your company’s operating data set is stored, unstructured, in Microsoft Office formats.

How well is it maintained? How does it get shared? How do updates get communicated? Are updates propagated to operational systems? Shouldn’t they be, via some workflow?

Does the status of a routine workflow require meetings to update a project manager? If so, you’re probably doing it wrong.

Some of the most important data processing the IT department should be doing is processing data about what it’s doing. Activity logging is essential to knowing how your systems are operating. Are you monitoring your systems from an enterprise monitoring solution that hits a URL or runs a synthetic transaction periodically to determine system health? If you’re polling every 5 minutes, is that really good enough? All it tells you is that the system responds sufficiently to monitoring requests that occur periodically. If you’re responsible for operating a system, you shouldn’t simply be checking for the negative (failure to respond to polling requests), but validating the positive.
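As a rough illustration of “validating the positive,” here is a minimal Python sketch that derives health from logged activity rather than from a polling probe. The Event record, the positive_health function, and every threshold in it are hypothetical names and numbers chosen for the example, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One logged unit of real work (hypothetical record shape)."""
    timestamp: datetime
    succeeded: bool


def positive_health(events, now, window=timedelta(minutes=5),
                    min_events=10, min_success_rate=0.99):
    """Classify health from real activity seen in the recent window.

    Thresholds are illustrative assumptions; tune them per system.
    """
    recent = [e for e in events if now - window <= e.timestamp <= now]
    # Too little real activity is not proof of health: report "unknown".
    if len(recent) < min_events:
        return "unknown"
    success_rate = sum(e.succeeded for e in recent) / len(recent)
    return "healthy" if success_rate >= min_success_rate else "degraded"
```

The point of the “unknown” state is that silence is treated as missing evidence, not as good news.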

Data-Driven Dashboards
We need them at all levels.

Someone “on the ground” who’s going to respond to system problems must be able to positively identify how the system is performing in order to determine whether or not a problem has been corrected. I can no longer accept, “It must be working if nobody’s complaining,” and neither should you. Your front-line guys need “operational intelligence”.

Someone who’s responsible for overseeing a routine process (one for which there’s a defined workflow) should be able to see the status of that workflow and work with individuals to resolve problems, without having representatives of all groups in the workflow attending status meetings. If I had a gigabyte for every time I’ve had to “get on the same page” for a routine, predictable, data-driven process, I could… well, I could store a lot.

Of course, as it gets up the chain, the dashboard will get more focused around business KPIs, and less about technology, but it’s the collection, correlation, and analysis of business activity data that bubbles up into those metrics.

Dr. Dobb’s has a great article about greenshifting. Short and sweet of it? Greenshift is how the “color” assigned to the status of a project gets greener as it approaches the CIO. If your organization assigns colors based upon anything other than observable metrics, you’ll never know who to trust.
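One way to resist greenshift is to compute the color from numbers anyone can inspect, so nobody gets to pick it. The sketch below is only an illustration: the status_color function, its inputs, and its thresholds are assumptions made for the example, and a real organization would choose its own observable metrics.

```python
def status_color(percent_complete, percent_schedule_elapsed, open_blockers):
    """Derive a project status color purely from observable metrics.

    All thresholds here are illustrative assumptions.
    """
    # How far delivered work lags behind elapsed schedule, in points.
    slip = percent_schedule_elapsed - percent_complete
    if open_blockers > 0 or slip > 20:
        return "red"
    if slip > 5:
        return "yellow"
    return "green"
```

Because the inputs are the same at every level, the color that reaches the CIO is the color the front line sees.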

Which one do you mean?
IT is full of overloaded terms, and business users bring their own vocabularies. An established corporate taxonomy is a must, and it’s the basis for defining the schemata of your core business entities. If you can’t agree on the fundamentals, you can’t interoperate. Unique instance identifiers must also be used globally, so that additional data from disparate sources can be associated with the right instance. Once you’ve established an instance identifier and some relationships, additional data can be joined from key-value stores or services emulating them, with each business unit maintaining the data it’s responsible for. Key-value stores can scale extraordinarily well, and lend themselves naturally to data partitioning.

If we need to exchange data, we MUST agree on a way to communicate “which one” of “which thing” unambiguously. Any text field which can be edited by a non-sysadmin DOES NOT work. If your identifier is “what you call it”, you’re doing it wrong, because “what you call it” can change, and your data must maintain its relationships despite a regime change in the naming department.
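To make that concrete, here is a minimal Python sketch in which plain dictionaries stand in for key-value stores owned by different groups. The store names (servers, cost_centers) and the helper functions are hypothetical; the only idea being illustrated is that the key is an opaque generated identifier, while “what you call it” is just another editable attribute.

```python
import uuid

# Dictionaries standing in for key-value stores (hypothetical names).
servers = {}        # instance id -> core record, owned by one team
cost_centers = {}   # instance id -> data owned by another business unit


def register(display_name):
    """Create a record keyed by an opaque identifier, not by its name."""
    instance_id = str(uuid.uuid4())
    servers[instance_id] = {"name": display_name}
    return instance_id


def rename(instance_id, new_name):
    """Change what the thing is called; its identity does not change."""
    servers[instance_id]["name"] = new_name


def describe(instance_id):
    """Join data from both stores on the shared instance identifier."""
    return {"id": instance_id,
            **servers[instance_id],
            **cost_centers.get(instance_id, {})}
```

After a rename, describe still returns the cost center data, because nothing was ever joined on the name.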

Envelope backs ARE important
Metrics are useless if they’re not sanity-checked.

If you’re doing what’s essentially key-value storage, take the average size of the objects you’ll store and multiply it by the number of objects you expect. You’ll often find that, for data that are mostly read, effective caching can keep most of your working set in memory. Even if you’re using an Oracle RDBMS for storage, database ninjas can do wonders with gobs of RAM. A Dell server with 8 cores and 256GB of RAM starts at about $21k before other hardware and software options, but that pales in comparison with the cost of the software license and support. Load up on RAM, and use it effectively.

If you’re storing user preferences for a million users, and each preferences object will be at most 1 KB in size, then before accounting for overhead, you’re talking about a gigabyte of data. If you use that type of calculation as your starting point, it cuts through quite a layer of irrelevant crud. Sanity check against fundamentals. Allow for a large margin of safety if you’re required to spec anything out early: either you’ll have been optimistic about overhead, or someone will have found another function that needs your resources because they’re “available”.
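The arithmetic fits on the back of an envelope, but a small Python sketch makes the inputs explicit. The working_set_bytes function is hypothetical, and the overhead factor and safety margin in the second call are made-up illustrative values, not measurements.

```python
def working_set_bytes(object_count, max_object_bytes,
                      overhead_factor=1.0, safety_margin=1.0):
    """Back-of-envelope size estimate: count x size x fudge factors."""
    return int(object_count * max_object_bytes
               * overhead_factor * safety_margin)


# A million users at no more than 1 KB each, before any overhead.
raw = working_set_bytes(1_000_000, 1024)

# Same data with assumed 50% storage overhead and a 2x safety margin.
padded = working_set_bytes(1_000_000, 1024,
                           overhead_factor=1.5, safety_margin=2.0)

print(f"raw: {raw / 1e9:.2f} GB, padded: {padded / 1e9:.2f} GB")
```

Even the padded figure is a small fraction of the RAM in the server described earlier, which is the kind of conclusion the sanity check is meant to surface.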

Good, fast, and cheap ARE all possible
The economics of IT have fundamentally shifted in significant ways. Software and software support costs can make up the majority of the costs of maintaining a cheap, powerful server.

Open source alternatives are available for many of the commercial products you use. Commercially-supported “respins” of “Community Editions” can offer significant savings — and potentially better support for your use cases. Creative systems architecture folks can help you assess how you can meet the appropriate SLAs for your services.

To do this effectively, however, they’ll need that with which we started this discussion. Data.

Data about availability requirements. Data about access requirements. Data about access patterns. Data about consistency or transaction support requirements.

Recap for the impatient who scroll to the end to get to the point
Data’s got to be explicitly identifiable. IT activity data’s got to be stored and analyzed, both in near-real-time as operational intelligence and over the long term for trend analysis, “live-data replays” for accurate performance testing, and resource and requirements planning. Proper data-driven dashboards at all levels indicate how things are working, as opposed to “no news is good news.”