PIMPL, Stability and C++ Libraries

2023-04-13

general

development

So, day five! A lot has been rattling around my head these past few years, and I have missed the more informal style of personal blog posts. Today I updated my system, Qt was upgraded from 6.4 to 6.5, and the porting I have been working on in Avogadro was completely unaffected! This is one of the things I admire most about the Qt project: for as long as I can remember (going back to the 3.x days for me) they have offered very good API and ABI stability in a C++ library, where many say it is not possible. They also have what I consider to be very well thought out APIs that feel intuitive to use. Within a major release series I can compile against an older Qt version and the result will work with any later version.

Gentoo Packaging

Some of my formative years as an open source contributor were spent packaging software in Gentoo Linux. Over the years I grew to really admire some libraries and hate others. You begin to notice patterns as you try to keep a Linux distribution working: certain libraries you could update to the latest version and everything “just worked” (Qt was generally one of them). Others, if they were widely used, broke everything that linked to them even on a bug fix release or yet another release in the eternal 0.x series. That meant rebuilding all dependent packages and setting up hard dependencies between versions, which made maintenance more difficult.

At the time I didn’t appreciate what was being done in those projects, but I very much appreciated the outcome. Some minor releases not only required a recompile of everything that depended upon them but also a patch to each dependent project. Those were the libraries I grew to dislike the most; when time was short they were updated less frequently, or we developed library slotting strategies to offer multiple versions of a library side by side. If you are familiar with these concepts, I am referring to Application Binary Interface (ABI) and Application Programming Interface (API) stability respectively.

Binary Compatibility

The ABI is often referred to as binary compatibility, and it represents the highest bar in stability for anything you might use. It means that the binary interface provided by the library remains stable and compatible. This does not actually preclude you from making changes as you might think, but you do have to be careful about which changes you make. The linker is pretty clever, but once your calling code has been compiled and linked against a shared library, the symbol names, member offsets and class sizes it uses are baked into the resulting binary. The KDE community has a great article (it took some searching, I had read previous iterations) with tips on what you can and cannot do if you want to preserve ABI stability.

Keeping a stable ABI is a huge commitment, and it can be very limiting for library developers. It is much more difficult in general than source compatibility, which is itself quite difficult. If no-one is using your library, or you are tightly coupled with most of its downstream consumers, it is likely not worth the effort. As a library gains popularity and widespread use, though, forcing recompilation (or the maintenance of many versions of the library) makes it more difficult to use. If you are the only choice it may not matter, but I have witnessed projects migrate to more stable libraries once easier options emerged.

PIMPL

The Qt community calls it a d-pointer; it is more widely referred to as pointer to implementation (PIMPL) or an opaque pointer. In essence all you need to do is forward declare a class inside your class, declare a pointer to that class as a member (private in the example below, though it can also be protected), and then define and instantiate that class in your C++ implementation file. I have not done this in a while and had to remind myself of it, but once you see an example it becomes clear.

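class MyClass {
public:
  MyClass();
  ~MyClass();  // defined in the .cpp file, where PIMPL is a complete type
  MyClass(const MyClass&) = delete;             // a shallow copy would delete d twice
  MyClass& operator=(const MyClass&) = delete;
private:
  class PIMPL;         // forward declaration only
  PIMPL* d = nullptr;  // the d-pointer
};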

The main point here is that the d-pointer only ever occupies the space of a single pointer in your class layout. Initially you can just leave it at that (well, maybe initialize it to nullptr to be safe) and reserve it for future use. It has some drawbacks too, which is why you don’t want to simply put everything in there. It is opaque to everything, including inherited classes; there are patterns you can use to get around that, but they increase code complexity. It adds an extra pointer dereference, and the private object is allocated on the heap. That said, it means you can add more member variables without affecting the ABI of your class. A simple implementation might look like:

class MyClass::PIMPL {
public:
  int precision = 4;
  float pi = 3.14f;  // float literal, avoiding a double-to-float conversion
};

You can then initialize the d-pointer in your class constructor. Be sure to delete it in the destructor, or use a smart pointer such as std::unique_ptr; in that case the destructor must still be declared in the header and defined in the implementation file, where PIMPL is a complete type. Everything your class declares directly in its header is effectively frozen until the next major release, whereas the d-pointer stays flexible because, as far as the binary interface is concerned, it is just a single pointer member. This is the core of how Qt and KDE manage to freeze their ABIs for so long.

Beyond that, you can add entire new classes, add non-virtual member functions, export a class that wasn’t exported originally, change default arguments, etc. As you will see if you read the article, there is still a lot you cannot do! You can also use symbol visibility to limit the exported surface of your library to the API you wish to support, and there are middle grounds, such as exporting some API that you document as private for users willing to take a bumpier road while supporting the bulk of your more stable API.

Source Compatibility

This is much easier, but still more difficult than you might expect. The bar here is not to break code that was written against any previous version of your interface. You have to resist the urge to fix typos in function names or to change signatures entirely (although they can be tweaked, e.g. adding an extra argument with a default value). The goal is to never require a code change in consumers of your library, although they will likely need to recompile their code if you are not also offering ABI stability.

Versioning

Traditionally, breaking compatibility is the line that marks a major release in what has become known as semantic versioning, a practice that has been around for a lot longer than the term. One downside of semantic versioning in recent years is that projects make a lot of major releases. The Qt libraries were first released in 1995, with a 2.0 release in 1999 according to the Wikipedia article. Fast forward to April 2023 and they are at version 6.5. With major releases spanning many years, I would assert that the level of API and ABI stability they offer is to be admired, especially considering that it holds across multiple operating systems too.

Don’t get me wrong: having worked on Qt-based projects personally, professionally and in the open source world, I have seen issues here and there even with API and ABI stability. They invest a great deal of effort into maintaining that stability, and it has the cost of slowing progress at times, but on the whole the Qt API has evolved and improved along with the C++ language, at times even deprecating its own APIs in favor of more standardized approaches.

Forever 0.x

I have been guilty of this, as I think many library developers have. You want to get the API right before offering stability over multiple years, so you reach version 0.58 and realize that maybe you just need to call it at some point! I think this is the perfectionist who doesn’t want to make a 1.0 and then immediately discover their fatal mistake, but by avoiding that line in the sand you never offer any guarantee of stability.

Many Major Releases

The other side of this is the two-year-old project already on version 7.0! If you make a major release multiple times a year and deprecate previous major releases rapidly, it is tough for others to build on your work. This sets off alarm bells for me when looking at libraries, and it seems to be a common trait of recent projects using semantic versioning. You are encouraged to take the emotion out of it and just make a major release whenever the API breaks.

Happy Medium?

I guess this is the age-old question, and it really depends upon what you are doing. In my current role I manage a lot of production rollouts, and we value stability there too, hence the purchase of RHEL, where a major release will be around for a while! The downside is that it is boring and I can’t develop on RHEL; I personally prefer a rolling release such as Gentoo, or more recently Arch, where I get the latest Qt library from the package manager within hours or days of its release.

I am not trying to claim I have the answer, but rather to offer some perspectives from the different sides of this I have occupied throughout my time in software development. As a user of a library I want stability, but I also want new features. As the developer of several libraries I want the freedom to innovate, evolve my API and add new features, but I also want people to use what I develop. I know one way to achieve that is with a stable API that is easy to build and reuse.

Final Thoughts

Dependencies, reuse, and providing a nice reusable SDK versus saying good luck with that are a discussion for another day! I am looking at how CMake has changed since I developed the Avogadro build system and what the best ways to modernize it would be. We are well past due for a stable release there…