On Version Numbers

Posted by Balazs Nagy on Mon, Aug 6, 2018
Tags: semver, software architecture

The other day I got a question from a principal software architect: can they open a single Merge Request (Pull Request in GitHub terms) for multiple git repositories?

He outlined two scenarios:

  1. When there is a fundamental change in their Gradle configs, they have to apply it in every repository.
  2. When there is an API change, they have to modify the code in multiple places at the same time.

This is a very large repo coming from ClearCase, and it compiles for several hours. It produces several executables, which are meant to run on different machines. A real legacy system.

They have already made heroic efforts to separate components, but interdependency is still very high.

Bad news

Obviously, what they want is not possible. Or, to be more precise, what they think they want.

I think what they need is proper versioning and better interoperability. They won’t be able to save on the number of merge requests, for sure. However, leveraging good practices with a proper versioning system can turn the tables.

Semantic versioning

Versioning is easy: there are three to five identifiers:

  • MAJOR version: increment when there are incompatible API changes
  • MINOR version: increment when adding functionality in a backwards-compatible manner
  • PATCH version: increment when fixing bugs in a backwards-compatible manner
  • Pre-release: optional, denoting a pre-release version
  • Build metadata: optional, extra build information appended to the version number

Example: 2.0.0-1.3.9+20130313144700: this could be a pre-release of 2.0.0, based on 1.3.9, with build metadata marking a release on March 13, 2013, at 2:47 PM UTC.
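A minimal Ruby sketch of splitting a version string into these five parts (a simplified regex for illustration, not the official semver grammar):

## simplified pattern: MAJOR.MINOR.PATCH, optional -pre-release, optional +metadata
SEMVER = /\A(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.\-]+))?(?:\+([0-9A-Za-z.\-]+))?\z/

def parse_version(string)
  major, minor, patch, pre, meta = string.match(SEMVER).captures
  { major: major.to_i, minor: minor.to_i, patch: patch.to_i,
    pre_release: pre, build_metadata: meta }
end

parse_version('2.0.0-1.3.9+20130313144700')
# => {:major=>2, :minor=>0, :patch=>0, :pre_release=>"1.3.9",
#     :build_metadata=>"20130313144700"}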

It is well documented at http://semver.org.

However, learning from Lean principles (this time, one specific principle: limit work in progress), instead of jumping into implementing incompatible changes, take a less invasive approach.

API versioning and deprecations

This makes versioning a bit more interesting: if an API needs to change, add a new call with the extended functionality, and mark the old one deprecated:

## helper: log a deprecation warning pointing at the replacement call
def deprecated_call(replacement)
  warn "DEPRECATED: use #{replacement} instead"
end

## send announcement to users
def send_colored_announcement_to_users(users, message, color)
  users.each do |user|
    user.send_announcement(message, color)
  end
end

## send announcement to users
### deprecated
def send_announcement_to_users(users, message)
  deprecated_call('send_colored_announcement_to_users')
  send_colored_announcement_to_users(users, message, 'black')
end

There are a lot of options: add a version number to the API call’s name, or use a (slightly) different name; one thing is common: when an old API endpoint is superseded, deprecate it.
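For instance, the version-number-in-the-name option could look like this (a sketch, as an alternative to the renaming above; the names are illustrative):

## hypothetical: encode the version in the call name instead of renaming it
def send_announcement_to_users_v2(users, message, color)
  users.each { |user| user.send_announcement(message, color) }
end

### deprecated
def send_announcement_to_users(users, message)
  deprecated_call('send_announcement_to_users_v2')
  send_announcement_to_users_v2(users, message, 'black')
end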

Then, the notices can be consumed at multiple levels: static code analysis (which can feed on the deprecation notices in inline documentation), or deprecation warnings logged during testing.

Identifying and deleting dead code

Test runs can count the number of deprecated calls to show whether we are ready for a version upgrade.
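A minimal sketch of that counting, extending the deprecated_call helper above with a per-call counter and an end-of-run summary:

## tally how many times each deprecated call is hit during the test run
DEPRECATION_COUNTS = Hash.new(0)

def deprecated_call(replacement)
  DEPRECATION_COUNTS[replacement] += 1
  warn "DEPRECATED: use #{replacement} instead"
end

## report the totals once the test process exits
at_exit do
  DEPRECATION_COUNTS.each do |replacement, count|
    puts "#{count} call(s) still need to migrate to #{replacement}"
  end
end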

If we do this faithfully, a new major version is just the removal of deprecated code. If you don’t get any deprecation warnings using the old version, it’s safe to say you can use the new major version.

I understand it’s a steam engine, but how does it work?

This all works nicely in theory, but there are a lot of missing pieces you have to put in place:

  • Tracking outdated versions: if a dependency upgrades, how can we tell it’s safe (let alone possible) to upgrade?
  • Enforcing upgrades: how can we get modules to upgrade?
  • Is there a way to use multiple versions in a single build output? If different parts of the same executable require different versions, odds are only one can be built in.
  • How to set up a development strategy for subcomponents? When there are multiple consumers, it’s often not enough to directly modify the code (let’s call these tactical changes); it might require a higher goal, a strategy.
  • What to collect in deprecation measurement? With multiple modules, it’s a good idea to collect the module name, the deprecated and suggested call names, and at least the version in which the deprecation took place (see the sketch after this list).
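A sketch of such a deprecation record, covering the fields the last item suggests; the struct, helper, and argument names are illustrative, not a real library API:

## hypothetical structured record of a single deprecation
DeprecationRecord = Struct.new(:module_name, :deprecated_call_name,
                               :suggested_call_name, :deprecated_since)

def record_deprecation(module_name, deprecated, suggested, since)
  record = DeprecationRecord.new(module_name, deprecated, suggested, since)
  warn "DEPRECATED since #{record.deprecated_since}: " \
       "#{record.module_name}##{record.deprecated_call_name}, " \
       "use #{record.suggested_call_name} instead"
  record
end

record_deprecation('Announcements', 'send_announcement_to_users',
                   'send_colored_announcement_to_users', '1.4.0')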

Then, if the compiler allows it, it can be a good idea to automatically build a pre-release version of the code. This way we can predict whether dependent modules are ready for the upgrade.
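A rough sketch of that rehearsal, assuming a Gradle build like theirs; the version properties, module names, and tasks are illustrative, not taken from a real project:

## hypothetical: publish a pre-release of the next major version locally,
## then run each dependent module's tests against it
NEXT_MAJOR = '3.0.0-pre'
DEPENDENT_MODULES = %w[billing reporting scheduler]

system('./gradlew', 'publishToMavenLocal', "-Pversion=#{NEXT_MAJOR}") or
  abort 'pre-release build failed'

DEPENDENT_MODULES.each do |mod|
  ok = system('./gradlew', ":#{mod}:test", "-PdependencyVersion=#{NEXT_MAJOR}")
  puts "#{mod}: #{ok ? 'ready for' : 'NOT ready for'} #{NEXT_MAJOR}"
end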

Obviously, it doesn’t save us from opening a lot of MRs, but it makes each MR independent of the others, and it also uncovers a lot of previously invisible integration work.
