What's wrong with configuration management?

Published: Fri, 09 Aug 2019 00:00:00 GMT by David Chan

Configuration management isn't a sexy topic. It's not AI. It's not machine learning. It's not the cloud or blockchain. But it's an essential part of what we do every day. When we write code, we almost always know that it will have to support several different behaviors, or "configurations." In machine learning, these can be different hyperparameters and optimization settings, input files, output artifacts, and so on.

Why Configuration Management Is Broken

For a long time I've thought that configuration management is broken. Current configuration tools suffer from a number of problems:

1. They're one-dimensional (and pretty opinionated)

Most configuration tools focus on a single mode of configuration and let you configure things in only one way. You can use the command line OR you can use a database. You can access your variables through a namespace object, or you can have them automatically assigned to variables in a function scope, but a given tool usually picks one.

A good configuration management tool should handle many kinds of configuration: configuration files, the command line, and even databases or HTTP requests. It should let you load and combine configurations from different sources, defining the defaults and the priority order among them.
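To make the idea of layered sources concrete, here is a minimal sketch using only the standard library. It assumes three sources (a JSON file, environment variables with a hypothetical `APP_` prefix, and command-line flags) and hypothetical setting names; a database or HTTP source would simply be one more function that returns a dict.

```python
import argparse
import json
import os
from typing import Any, Dict, Mapping, Optional, Sequence

# Hypothetical defaults for an ML experiment; the setting names are illustrative.
DEFAULTS: Dict[str, Any] = {"learning_rate": 0.001, "epochs": 10, "data_path": "data/train.csv"}


def from_file(path: str) -> Dict[str, Any]:
    """Load settings from a JSON file; a missing file contributes nothing."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def from_env(prefix: str = "APP_", environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read APP_* variables, e.g. APP_EPOCHS=20 becomes {"epochs": 20}."""
    environ = os.environ if environ is None else environ
    result: Dict[str, Any] = {}
    for key, raw in environ.items():
        if key.startswith(prefix):
            name = key[len(prefix):].lower()
            # Environment values are strings; cast using the default's type (str if unknown).
            result[name] = type(DEFAULTS.get(name, ""))(raw)
    return result


def from_cli(argv: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """Parse flags; anything the user didn't pass is omitted rather than set to None."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float)
    parser.add_argument("--epochs", type=int)
    parser.add_argument("--data-path")
    args = parser.parse_args(argv)
    return {k: v for k, v in vars(args).items() if v is not None}


def merge(*layers: Mapping[str, Any]) -> Dict[str, Any]:
    """Combine layers from lowest to highest priority; later layers win."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result.update(layer)
    return result
```

Calling `merge(DEFAULTS, from_file("config.json"), from_env(), from_cli())` gives the command line the final say, then the environment, then the file, then the defaults. Changing the priority order is just a matter of reordering the arguments.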

2. They don't leave a trace

I have written the following lines of code a thousand times:

import json
# args is the argparse.Namespace returned by parser.parse_args()
with open('config.json', 'w') as jf:
    json.dump(vars(args), jf)

Unfortunately, this means I also have to write matching code to load the arguments back later, and I have to handle all of the bookkeeping myself. That isn't exactly an appealing use of my time.
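One way to stop rewriting that code is to write the save and load halves once, and have the load half hand back the same kind of object argparse gives you. A minimal sketch, assuming argparse and JSON; `save_config` and `load_config` are hypothetical helper names, not part of any library.

```python
import argparse
import json
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def save_config(args: argparse.Namespace, path: PathLike) -> None:
    """Record the exact arguments a run used (values must be JSON-serializable)."""
    # open() defaults to read-only mode 'r', so 'w' is required for writing.
    with open(path, "w") as f:
        json.dump(vars(args), f, indent=2, sort_keys=True)


def load_config(path: PathLike) -> argparse.Namespace:
    """Rebuild a Namespace, the same type that parse_args() returns."""
    with open(path) as f:
        return argparse.Namespace(**json.load(f))
```

Calling `save_config(args, "config.json")` at the start of every run leaves the footprint automatically. The catch is that values such as `Path` objects aren't JSON-serializable and would need converting first, and JSON turns tuples into lists on the way back.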

3. They don't scale

When you start a project, you don't really think about scaling. You think about how quickly you can get an MVP up and running. Unfortunately, many of today's configuration tools are built for exactly that stage, and they don't scale to larger projects or to scenarios you never imagined would exist. On top of that, they're hard to extend yourself. A good configuration management tool lets you extend it, for example by adding new configuration sources, as your project grows.
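One way a tool could allow for expansion is a small plug-in registry: each source registers itself under a name, and the core only dispatches. This is a minimal sketch of that pattern, not the design of any existing library; `register_source`, `load`, and the `"scheme:location"` string format are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping a scheme name to a loader function.
Loader = Callable[[str], Dict[str, Any]]
_LOADERS: Dict[str, Loader] = {}


def register_source(scheme: str) -> Callable[[Loader], Loader]:
    """Decorator that adds a configuration source without touching the core."""
    def decorator(func: Loader) -> Loader:
        _LOADERS[scheme] = func
        return func
    return decorator


def load(uri: str) -> Dict[str, Any]:
    """Dispatch 'scheme:location' to whichever loader registered that scheme."""
    scheme, _, location = uri.partition(":")
    if scheme not in _LOADERS:
        raise ValueError(f"no loader registered for {scheme!r}")
    return _LOADERS[scheme](location)


@register_source("json")
def _load_json_file(location: str) -> Dict[str, Any]:
    with open(location) as f:
        return json.load(f)


@register_source("inline")
def _load_inline(location: str) -> Dict[str, Any]:
    # "inline:epochs=5,lr=0.01" -> {"epochs": "5", "lr": "0.01"} (values stay strings)
    return dict(pair.split("=", 1) for pair in location.split(",") if pair)
```

Supporting a database later would mean registering one more function, with no changes to `load` itself.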

4. They require too much code to use

You shouldn't have to write hundreds of lines of code to get your code to respect your configurable variables. Configurations should be available where you need them, when you need them, without requiring a user to bend over backwards.

A good configuration management tool should be invisible in your code: making something configurable shouldn't add code that is hard to read or write, and running your code in different configurations shouldn't require convoluted workarounds. A good configuration tool has a "magical" interface that gives you what you need, when you need it.

A good example of an invisible configuration tool is Sacred, which silently injects the values of configuration variables into your functions' scopes. (If this sounds dangerous, you're right: it is! It's not a perfect tool, but it's just fun to use.)
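The injection idea can be illustrated in a few lines of plain Python. This is a simplified sketch of the general technique, not how Sacred implements it: a hypothetical `inject` decorator fills in any parameter the caller leaves out with the config value of the same name.

```python
import functools
import inspect
from typing import Any, Callable, Mapping

_INJECTABLE = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def inject(config: Mapping[str, Any]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator: omitted parameters are filled from `config` by name."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Bind what the caller actually passed; explicit arguments always win.
            bound = sig.bind_partial(*args, **kwargs)
            for name, param in sig.parameters.items():
                if name not in bound.arguments and param.kind in _INJECTABLE and name in config:
                    bound.arguments[name] = config[name]
            # Parameters found in neither place fall back to their own defaults.
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


CONFIG = {"learning_rate": 0.01, "epochs": 5}


@inject(CONFIG)
def train(data: str, learning_rate: float, epochs: int = 1) -> str:
    return f"{data}: lr={learning_rate}, epochs={epochs}"


print(train("mnist"))  # mnist: lr=0.01, epochs=5
```

It also shows why injection is risky: rename a parameter or mistype a config key, and the function silently falls back to its default (or raises a `TypeError` for a required argument), with nothing at the call site to tell you where a value came from.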

Designing the Perfect Configuration Tool

Over the next few weeks, I'm going to outline a configuration management tool. Who knows, I might even build it. Either way, this will be an exercise in exploring the world of configuration management, with the goal of designing a tool I can use in my everyday work: a powerful, flexible solution to our configuration woes.

From the problems above, I can already derive a set of overall design goals:

  • Be flexible
  • Seamlessly scale from small projects to large codebases
  • Leave footprints
  • Be "magical" to use, but safe enough to bet the farm on

There are already some awesome tools out there, such as argparse, Sacred, dynaconf, click, docopt, gflags, PyInquirer, Clint, Cliff, Cement, Plac, and many more.