CodeBork | Tales from the Codeface

The coding blog of Alastair Smith, a software developer based in Cambridge, UK. Interested in DevOps, Azure, Kubernetes, .NET Core, and VueJS.


This post represents the second instalment taken from chapter 6 of Code Complete, entitled “Working Classes”. This post covers the issues to consider when designing class interfaces, illustrated with code samples. An important piece of information to keep in mind when reading this post is that McConnell is talking in terms of the public interface exposed by a class through its public members. While an interface (as defined in Java or C#) also fits this bill, some of the advice given here is specific to the idea of a class interface, and not a standalone interface.

If you’re after the executive summary (this is quite a long post, after all), there are only two things you must build into your class interfaces: good abstraction and good encapsulation. Read on to find out more.

Good Abstraction

A good abstraction is one that is self-consistent, programmatic and clearly-defined. As such, a class’s interface should offer a group of routines that clearly belong together. For example, an Employee class should describe an employee’s personal details and provide services to initialise and use an Employee object. Listing 1 below provides an example of a class interface with good abstraction, while Listing 2 provides an example of poor abstraction.

public class Employee
{
    public string Name { get; set; }
    public string Address { get; set; }
    public string HomePhoneNumber { get; set; }
    public string WorkPhoneNumber { get; set; }
    public TaxId TaxId { get; set; }
    public JobClassification JobClassification { get; set; }

    public Employee() { }

    public Employee(string name, string address, string homePhone,
                    string workPhone, TaxId taxId, JobClassification jobClass)
    {
        Name = name;
        Address = address;
        HomePhoneNumber = homePhone;
        WorkPhoneNumber = workPhone;
        TaxId = taxId;
        JobClassification = jobClass;
    }
}

Listing 1: A Class Interface with Good Abstraction

Internally of course the class may have additional routines, data, etc. to support the publicly-available methods and data, but the user of the class doesn’t need to care or even know about these. Compare this with the following class interface providing a poor abstraction; there is so much wrong with this, McConnell awards it a Coding Horror badge:

public class Program
{
    public void InitialiseCommandStack();
    public void PushCommand(Command command);
    public Command PopCommand();
    public void InitialiseReportFormatting();
    public void FormatReport(Report report);
    // ... etc.
}

Listing 2: A Class Interface with Bad Abstraction

To improve Listing 2, the routines should be refactored into separate classes, and the Program class should have a consistent abstraction with high cohesion, such as in Listing 3:

public class Program
{
    public void InitialiseUserInterface();
    public void ShutdownUserInterface();
    public void InitialiseReports();
    public void ShutdownReports();
    // ... etc.
}

Listing 3: A Class Interface with Better Abstraction

An extension to this idea is to use the Single Responsibility Principle (SRP), which ensures each class has internal cohesion in addition to good interface cohesion.

The class interface should present a consistent level of abstraction. For example, each class should implement one and only one Abstract Data Type. Mixing levels of abstraction in a class, such as providing methods to read a file and process the data read in, is poor design: it reduces cohesion, weakens the class’s own abstraction, increases coupling, and results in a maintenance nightmare.
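As a rough sketch of the file-reading example above, the two levels of abstraction can be kept in separate classes. The names SalesFileReader and SalesTotaliser, and the one-amount-per-line input format, are hypothetical and invented purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical low-level class: knows about text input and parsing, nothing else.
public class SalesFileReader
{
    // Reads one decimal amount per non-blank line (an assumed format).
    public IReadOnlyList<decimal> ReadAmounts(TextReader reader)
    {
        var amounts = new List<decimal>();
        foreach (var line in reader.ReadToEnd().Split('\n'))
        {
            var text = line.Trim();
            if (text.Length == 0) continue;
            amounts.Add(decimal.Parse(text, CultureInfo.InvariantCulture));
        }
        return amounts;
    }
}

// Hypothetical higher-level class: works on amounts, with no idea where they came from.
public class SalesTotaliser
{
    public decimal Total(IEnumerable<decimal> amounts) => amounts.Sum();
}
```

Each class can now change for its own reasons: a new file format touches only the reader, and a new calculation touches only the totaliser.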

When coding your class, be sure you understand what abstraction the class is implementing. McConnell relays the anecdote of a project he once worked on that ended up wrapping a spreadsheet control rather than a simpler grid control, because the spreadsheet abstraction was closer to what the control needed to do. The wrapper class exposed all n-hundred methods of the spreadsheet, rather than simplifying to the grid’s methods plus the n extra methods required to implement the same “super-grid” functionality. There was uproar at implementation time, but it proved to be the correct decision when it came to maintenance: the spreadsheet abstraction was a stronger one for the control, which meant the implementation was simpler to grasp than the equivalent grid implementation would have been.

Ensure that your class’s services are provided in pairs with their opposites. As an example, if your class has an operation adding an item to a list, it will likely need one removing an item from the list as well. Don’t create these pairs willy-nilly, but do always check to see if you need the complementary operation when you create one.

Don’t be afraid to move unrelated information to another class. (Again, this reflects on the SRP.) Occasionally you will come across a situation where half the class’s routines work with one half of the data, whilst the other half of the routines work with the other half of the data. In situations like this, you should split the two halves into separate classes that have their own cohesive and consistent abstractions.

Interfaces should be programmatic rather than semantic. McConnell describes how an interface definition conceptually consists of two parts. The programmatic part consists of the data types and other attributes that can be enforced by the compiler, whilst the semantic part is made from the assumptions of how the interface will be used (such as “MethodA() must be called before MethodB()”). This semantic part should be documented in the class/file comments, but it is important to keep interfaces only minimally dependent on documentation: often documentation goes unread, and comments tend to fall out of sync with the code they document. A nifty trick is to use asserts or other similar techniques to make the semantic elements of the interface programmatic.
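Here is a minimal sketch of that trick, assuming a made-up MessageChannel class whose semantic rule is “Initialise() must be called before Send()”. The class and its members are hypothetical; the point is only that the rule is checked in code rather than left in a comment.

```csharp
using System;

// Hypothetical class: the documented rule is "call Initialise() before Send()".
public class MessageChannel
{
    private bool _initialised;

    public void Initialise()
    {
        _initialised = true;
    }

    public string Send(string message)
    {
        // The guard turns the documented assumption into a check that fails loudly.
        if (!_initialised)
        {
            throw new InvalidOperationException("Initialise() must be called before Send().");
        }
        return "sent: " + message;
    }
}
```

I’ve thrown an exception here rather than using Debug.Assert, because Debug.Assert calls are compiled out of release builds. Either way, a caller who gets the order wrong finds out immediately.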

Speaking of changing code, you should beware of erosion of the interface’s abstraction under modification. Listing 4 provides an example of a class that has been modified without giving thought to the interface abstraction:

public class Employee
{
    public FullName FullName { get; set; }
    public Address Address { get; set; }
    public PhoneNumber WorkPhoneNumber { get; set; }
    // ...

    public bool IsJobClassificationValid(JobClassification jobClass)
    {
        // ...
    }

    public bool IsPostCodeValid(Address address)
    {
        // ...
    }

    public bool IsPhoneNumberValid(PhoneNumber phoneNumber)
    {
        // ...
    }

    public SQLQuery CreateNewEmployeeQuery { get; }
    public SQLQuery ModifyEmployeeQuery { get; }
    public SQLQuery RetrieveEmployeeQuery { get; }
}

Listing 4: Example of a Class Interface that’s Eroding Under Maintenance

In the real world, there is no logical connection between employees and routines that check valid post codes, etc. — unless of course you’re employing people to manually validate post codes! I think we can agree that’s sufficiently unlikely. Similarly, the database interaction methods do not belong here, because they are at a much lower level of abstraction than the employee class itself. Utilising an ORM library like NHibernate^* allows you to create an Employee from the database, abstracting away the mechanics of talking to the database.

Don’t add public members that are inconsistent with the interface abstraction! Always ask yourself whether the member you are adding is consistent with the class’s abstraction. If it’s not, find a better place to put it, creating a new class if necessary.
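One possible “better place” for the validation routines in Listing 4 is the value types themselves. The sketch below is my own assumption rather than anything from the book: PhoneNumber checks its own validity, and the rule it uses (digits and spaces only) is a deliberately simplistic placeholder, not a real phone number format.

```csharp
using System;
using System.Linq;

// Hypothetical value type: owns the validation that Listing 4 put on Employee.
public sealed class PhoneNumber
{
    public string Value { get; }

    public PhoneNumber(string value)
    {
        if (!IsValid(value))
        {
            throw new ArgumentException("Not a valid phone number.", nameof(value));
        }
        Value = value;
    }

    // Placeholder rule for illustration only: at least one digit,
    // and nothing other than digits and spaces.
    public static bool IsValid(string value) =>
        !string.IsNullOrWhiteSpace(value)
        && value.All(c => char.IsDigit(c) || c == ' ');
}

// Employee now exposes only employee-level members.
public class Employee
{
    public string Name { get; }
    public PhoneNumber WorkPhoneNumber { get; }

    public Employee(string name, PhoneNumber workPhoneNumber)
    {
        Name = name;
        WorkPhoneNumber = workPhoneNumber;
    }
}
```

An Employee can no longer hold an invalid phone number, and the Employee interface is back to describing an employee.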

It is important to consider abstraction and cohesion together. These ideas are closely related: a class interface that presents a good abstraction usually has strong cohesion, although the converse doesn’t hold as strongly. If you see a class that has weak cohesion and you can’t work out how to correct it, ask yourself whether the class presents a good abstraction.

Good Encapsulation

As we saw in a previous post, encapsulation is a stronger concept than abstraction. Whilst abstraction helps manage complexity, encapsulation enforces the abstraction by preventing you from looking at the details. McConnell takes no prisoners in pairing these together: either you have both abstraction and encapsulation or you have neither. There is no middle ground.

So, how do we practise good encapsulation? First off, we must minimise the accessibility of classes and their members. Good object-oriented languages provide differing levels of accessibility, including public, protected and private. .NET also provides two further access modifiers: internal (accessible by other members of the same assembly), and protected internal (accessible by other members of the same assembly, or derived classes in another assembly). You should know the definitions of each of these accessibility levels inside out, so that you can fully grasp the implications of using each.
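To make the accessibility levels concrete, here is a small sketch using two hypothetical classes, Account and SavingsAccount. The class names and members are assumptions of mine; the comments note who can see each member.

```csharp
using System;

public class Account
{
    // private: visible only inside Account.
    private decimal _balance;

    // public: part of the abstraction offered to every caller.
    public decimal Balance => _balance;

    public void Deposit(decimal amount)
    {
        EnsurePositive(amount);
        _balance += amount;
    }

    // protected: visible to Account and to classes derived from it.
    protected void EnsurePositive(decimal amount)
    {
        if (amount <= 0) throw new ArgumentOutOfRangeException(nameof(amount));
    }

    // internal: visible anywhere in the same assembly, but not outside it.
    internal void ResetForTesting() => _balance = 0;
}

public class SavingsAccount : Account
{
    public void AddInterest(decimal rate)
    {
        EnsurePositive(rate);       // allowed: protected member of the base class
        Deposit(Balance * rate);    // allowed: public members
        // _balance += 1;           // would not compile: _balance is private to Account
    }
}
```

Because the balance is private, even the derived class has to go through Deposit(), so the positive-amount check cannot be bypassed.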

One school of thought in utilising access modifiers is to lock down the class or member to the lowest workable level; however, this is not necessary if exposure is consistent with the abstraction. It’s worth keeping in mind that hiding more is generally better than hiding less. Certainly you should never expose member data in public: this violates encapsulation in perhaps the most heinous way.

McConnell also warns against putting private implementation details into a class’s interface. In modern languages like C# and Java this isn’t possible; the warning may in fact apply only to languages where private implementation details are visible in the class’s header file (e.g., C++ and Objective-C).

When designing and implementing your classes, you should avoid making assumptions about the class’s users. Instead you should design and code to adhere to the contract specified by the class’s interface. You should also avoid friend classes. These are classes that know about the internals of their friends. Again, they violate encapsulation, and expand the amount of code you have to think about at any one time, increasing complexity. They can very infrequently be used to manage complexity, such as in the State pattern.

Don’t put a routine into the public interface just because it uses only public routines. The fact that only public routines are used by a routine is irrelevant to whether it should be exposed in the interface. If it’s not consistent with the abstraction, don’t expose it.

You should also favour read-time convenience over write-time convenience. This was also touched on in The Pragmatic Programmer. Source code is read many, many, many more times than it is written; favouring write-time convenience is a false economy.

Be wary of semantic violations of encapsulation. Semantic violations can be quite difficult to spot, so here’s a run-down of some examples:

Not calling ClassA's Initialise() method because you know that ClassA.PerformFirstOperation() calls it automatically
Not calling database.Connect() before calling employee.Retrieve(database) because you know Retrieve() will automatically connect to the database if there isn't already a connection
Not calling ClassA.Terminate() because you know ClassA.PerformFinalOperation() calls it automatically
Using a reference to ObjectB created by ObjectA even after ObjectA has gone out of scope because you know ObjectA keeps ObjectB in static storage and ObjectB will still be valid. This to me sounds like a great route to creating memory leaks.
Using the ClassB.MAXIMUM_ELEMENTS constant instead of the ClassA.MAXIMUM_ELEMENTS constant because you know they're both set to the same value.

The problem with semantic violations of encapsulation is that they make the client code depend on the private implementations, not the public interface. What happens if, in the last example above, the value of ClassB.MAXIMUM_ELEMENTS changes? At best, you will notice an obvious bug, but more likely some subtle behaviour will have been introduced into the application that will be hard to reproduce.

Always watch for coupling that is too tight. Ensure you minimise the accessibility of your classes and their members, and avoid friend classes because they’re tightly coupled (by definition). Make data private rather than protected in a base class so that derived classes are less tightly coupled to the base class.

Finally, be sure to observe the Law of Demeter, also referred to as the Principle of Least Knowledge. This is succinctly defined as

Each unit should have only limited knowledge about other units: only units "closely" related to the current unit.

Any given class should make as few assumptions as possible in its communications with other entities. Formally, for any method M on object O, M may reference:

O
M's parameters
anything created within M
any direct component of O
global variables accessible by O in the scope of M

The result of these restrictions is that your code should only use one dot. For example a.Method() obeys the rule, but a.nother.Method() violates it.
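The one-dot rule can be sketched with three hypothetical classes, Wallet, Customer and Cashier (my own names, chosen for illustration). Cashier only ever talks to the Customer it is given, and Customer only talks to its own Wallet.

```csharp
using System;

public class Wallet
{
    public decimal Cash { get; private set; }

    public Wallet(decimal cash) { Cash = cash; }

    public void Withdraw(decimal amount)
    {
        if (amount > Cash) throw new InvalidOperationException("Insufficient cash.");
        Cash -= amount;
    }
}

public class Customer
{
    // A direct component of Customer, so Customer may call its methods.
    private readonly Wallet _wallet;

    public Customer(Wallet wallet) { _wallet = wallet; }

    // The customer talks to its own wallet, so callers never need to.
    public void Pay(decimal amount) => _wallet.Withdraw(amount);
}

public class Cashier
{
    public decimal Takings { get; private set; }

    // Uses only its parameter: customer.Pay(...), not customer.Wallet.Withdraw(...).
    public void Charge(Customer customer, decimal amount)
    {
        customer.Pay(amount);
        Takings += amount;
    }
}
```

Had Cashier reached through the customer to the wallet, it would depend on how Customer happens to store its money; as written, Customer can change that detail without Cashier noticing.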

^* Other ORM libraries are available.