Common Defects

A while back I had the opportunity to analyze a few defects up close. I saw some interesting types and I decided to write about them. Understanding the nature of your defects could prove to be useful for your day-to-day work. Most of the program defects that I have seen fall into one of the following categories:

1. Asking a component to do too much. Closely related: having only a few really big components in a large program.
2. Misunderstanding the responsibilities of a component, or never deciding on clear responsibilities in the first place.
3. Not knowing the cost or nature of operations, language features, and library or API calls, and thus negatively impacting program performance.
4. Misunderstanding what is global data and what is local data, and as a result synchronizing too much or not enough.

Let's examine each of them in more detail.

1. Asking a Component to do Too Much

It has probably happened to you before: you look at the list of methods of a component and you can't tell what the component is responsible for. As you read through the list, you get the impression that this component is universally usable for about a half or a third of what the program does. Unless this component is a facade for a subsystem, you are in trouble.

You also notice that it is increasingly difficult to debug the system. It feels like all roads lead to Rome, or translated to software: no matter what call you are tracing, you have to pass through the same giant components. After a while you don't understand where the calls go and you find yourself just trying things out.

To make sure that we are on the same page, here are the definitions for a component and for a class: A component is a concrete specification of an abstraction (either from the problem domain or from a supporting domain, commonly known as an API or library). A class is a concrete specification of an abstract data type (ADT). How do components and classes relate? A component is made of one or more classes. The key here is that a component usually has a real-world equivalent, while the class is usually a "fabrication."

This is a good time to stop, reorganize, maybe redesign, and refactor. Figure out how you can make your component cohesive again. Pare down the responsibilities to a set that makes sense and belongs together. Maybe create a few new components. As you are reworking the components, watch out for coupling between them. It makes sense to draw up a dependency graph and to get rid of any loops in it.
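If the dependency graph is too big to check by eye, a few lines of code can look for loops. What follows is only a minimal sketch: the DependencyGraph class and the component names in the usage note are hypothetical, and it assumes you can list "component A depends on component B" pairs by hand or from your build tool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: records "from depends on to" edges between components
// and reports whether the resulting graph contains a loop.
final class DependencyGraph {
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    void addDependency(String from, String to) {
        dependsOn.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        // Register the target too, so every component has an entry.
        dependsOn.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Depth-first search: a loop exists if we reach a component that is
    // still on the path we are currently exploring.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String component : dependsOn.keySet()) {
            if (visit(component, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private boolean visit(String component, Set<String> done, Set<String> onPath) {
        if (onPath.contains(component)) {
            return true;   // came back to a component on the current path
        }
        if (done.contains(component)) {
            return false;  // already fully explored, no loop through here
        }
        onPath.add(component);
        for (String next : dependsOn.get(component)) {
            if (visit(next, done, onPath)) {
                return true;
            }
        }
        onPath.remove(component);
        done.add(component);
        return false;
    }
}
```

For example, adding "ui" depends on "orders" and "orders" depends on "billing" gives a graph without loops; adding "billing" depends on "ui" closes the loop and hasCycle() starts returning true.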

Maybe the way you notice this defect is that you are not sure where to put the new functionality that you have to add to the program, and you start putting things in components pretty much randomly. This leads into the next problem:

2. Misunderstanding the Responsibilities of a Component

There is a quick test to see if the responsibilities of a component are clear: read them and then see if you can retell them. If you remember what a component does, then it is easier to add a new responsibility that is a good fit with the existing ones. If you don't remember what the responsibilities of a given component are, then you are almost guaranteed several program defects, both from you and from others who use the component you have created. The same applies to method names, requirements, etc. This test has not failed me yet.

If you organize as much of the program as you can into components that represent aspects of the customer's business, then your program structure will be more resilient to change than if you organize it along features. The customer's business fundamentals tend not to change nearly as fast as their wishes about your program. And when the business does change, your changes might be localized to the business components.

3. Not Knowing the Cost or Nature of Operations, Language Features, Library, API Calls and thus Negatively Impacting Performance

In The Design and Evolution of C++, Bjarne Stroustrup describes one of the Low-Level Programming Support Rules for C++ as: what you don't use, you don't pay for (the zero-overhead rule) (121). The interesting part of this statement is not explicit: you must know the cost of each language feature or you won't be able to decide what you are willing to pay for.

It is interesting to note that Stroustrup designed C++ to suit his needs and the needs of its users. Chapter 4 contains the C++ Language Design Rules. He had a problem with "how to analyze the network traffic [in telephone switching equipment]" and he decided that in order to solve this rather complicated problem he needed better tools. So he created Cpre, the preprocessor for C with Classes, which later evolved into C++.

The Java programmers among you may wonder how this applies to you. Quite well. Even when you program in Java you need to know what is happening behind the scenes if you are to write production-quality software. Otherwise, your program will be plagued with performance problems.

Performance problems are an ugly beast because they don't show up when you test your program with a few users or limited data. Everything seems to work fine and takes a limited amount of time. But then you turn on the firehose and everything comes to a halt. The display routine doesn't work anymore because handling 2,000 items is not the same as handling 10. The database is slow because the query that worked on a 100-record table doesn't work well on 10,000 records. I know that some of you are fanatics about this stuff and have solved this problem well. Drop me a line and share your experience with others.
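To make the point concrete, here is a small sketch of two methods that give the same answer but hide very different costs. The DuplicateCounter class is hypothetical and only meant as an illustration. The first version calls List.contains, which scans the list on every call, so the work grows roughly with the square of the number of items. The second uses a HashSet, where each lookup is expected to take constant time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example: count how many items repeat an earlier item.
final class DuplicateCounter {

    // Looks harmless with 10 items. List.contains walks the whole list,
    // so n items cost on the order of n * n comparisons in total.
    static int countDuplicatesWithList(List<Integer> items) {
        List<Integer> seen = new ArrayList<>();
        int duplicates = 0;
        for (Integer item : items) {
            if (seen.contains(item)) {
                duplicates++;
            } else {
                seen.add(item);
            }
        }
        return duplicates;
    }

    // Same result. HashSet.add returns false when the item is already
    // present, and each call is expected to take constant time.
    static int countDuplicatesWithSet(List<Integer> items) {
        Set<Integer> seen = new HashSet<>();
        int duplicates = 0;
        for (Integer item : items) {
            if (!seen.add(item)) {
                duplicates++;
            }
        }
        return duplicates;
    }
}
```

With a handful of items you will not be able to tell the two apart. The difference only shows once the input grows, which is exactly why this kind of defect slips through testing.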

OK, you say: What is the answer? Well, it depends. I think that knowing that this problem exists is half the battle. The other half is devising the right solution. Tell me about your specific problem and I will help you come up with a solution.

4. Misunderstanding What is Global Data and What is Local Data and Synchronizing Too Much or Not Enough

One of the basic rules of concurrent programming that I learned back in college that still holds true in the age of Java is that you must have protected access to all global data. This is a simple and easy to remember rule that can keep you out of a lot of trouble. Now all you have to do is keep thread local data and global data separated (don't forget about those parameters that get passed back and forth between threads) and you will have yourself a well-behaved program.
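As a minimal sketch of that rule in Java, assume a hypothetical HitCounter shared by several threads. The hits field is the global data, and every read and write goes through the same lock. The loop variables inside each worker are thread local and need no protection.

```java
// Hypothetical example of shared data with protected access.
final class HitCounter {
    private final Object lock = new Object();
    private long hits; // shared by all threads; only touched while holding lock

    void record() {
        synchronized (lock) {
            hits++;
        }
    }

    long total() {
        synchronized (lock) {
            return hits;
        }
    }

    // Starts the given number of threads, lets each record its hits,
    // waits for all of them, and returns the final count.
    static long countConcurrently(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) { // j is thread local
                    counter.record();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));
    }
}
```

Without the synchronized blocks, hits++ is a read followed by a write, and two threads can overwrite each other's update, so the total can come up short.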

This is not nearly as hard as it sounds. If you "limit" yourself to only doing things this way, you might not be writing some potentially exciting component that solves a problem in a completely unique way that nobody else can get without two PhDs. But, hey, the program you write will work. Period. Here are a few simple steps to help you figure out what global data is and how to deal with it:

1. Count the number of separate threads in the program.
2. Count the number of threads that can run at the same time.
3. Identify all pieces of data that are accessed by more than one of the threads identified in step 2. Include parameters that get passed between threads in this list. You must protect access to this data.
4. Protect access to data with locks. Use one or more locks, as necessary. Don't forget: Always attempt to lock in the same order and unlock in the reverse of the locking order.
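The locking-order advice in the last step can be sketched as follows. The Account class, its id field, and the transfer method are hypothetical. The sketch assumes every account has a unique id, which gives all threads the same order in which to take the two locks, no matter which direction the money moves.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: two locks are always taken in ascending id order
// and released in the reverse order.
final class Account {
    private final int id; // assumed unique; defines the global lock order
    private final ReentrantLock lock = new ReentrantLock();
    private long balance; // shared data; only touched while holding lock

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    static void transfer(Account from, Account to, long amount) {
        if (from.id == to.id) {
            return; // same account, nothing to move
        }
        // Pick the lock order by id, not by the direction of the transfer.
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;

        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock(); // locked last, unlocked first
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

If one thread transfers from account 1 to account 2 while another transfers from 2 to 1, both still lock account 1 first, so neither can end up holding one lock while waiting forever for the other.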

Again, this is a minimal solution, but it works. You can solve most problems this way. If you think that there is anything wrong with it, drop me a line. For those of you programming in Java, a good book on the subject is Concurrent Programming in Java: Design Principles and Patterns, 2nd Ed., by Doug Lea.

I know that both of these subjects deserve way more detail than what I have time for in this short note, but I hope that now that you are aware of these problems, you will take steps to prevent them in your next project.

How Can You Start?

A few simple rules can get you going and can help you create better components:

Component interfaces should be minimal yet usable.
Class interfaces should be primitive.
Public interfaces should be complete.
Private interfaces should be sufficient.
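Here is one way to read the "primitive" rule, as a rough sketch under my own assumptions. The IntStack and IntStacks classes are hypothetical. The idea is that the class itself keeps only the operations that need its private data, while a convenience that can be built from the public methods lives outside it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class with a primitive interface: every method here
// needs direct access to the private list.
final class IntStack {
    private final List<Integer> items = new ArrayList<>();

    void push(int value) {
        items.add(value);
    }

    int pop() {
        if (items.isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        return items.remove(items.size() - 1); // remove by index
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}

// Convenience built on the public interface alone, so it does not
// have to be part of IntStack.
final class IntStacks {
    static int sumAndClear(IntStack stack) {
        int sum = 0;
        while (!stack.isEmpty()) {
            sum += stack.pop();
        }
        return sum;
    }
}
```

The class stays small enough to pass the memory test, and new conveniences can be added later without touching it.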

Ask the question: What component would I need to have to get the job done? If I had a component that could do x, could I then implement what the customer is asking for? And so on. Group the requested features into components. Do the memory test with them. Ask a friend to also do the memory test. In no time you'll be writing great components.

For more on this, refer to the Lakos book, Large-Scale C++ Software Design. And don't worry about the title; most of the material applies nicely to Java as well. Another good source for advice and thorough theoretical explanation is Object-Oriented Software Construction, 2nd Ed., by Bertrand Meyer. And of course, you can always crack open Code Complete by Steve McConnell.

Let me know how your components are coming along.