Monthly Archives: October 2001

What do you know?

Why is it that most software developers have such a miserable track record when it comes to finishing projects on time and on budget? During my software career I have met some of the smartest people around, and still most of them were not able to finish projects by the date that they themselves set.

I have a hunch about this problem. One of the reasons developers can’t seem to make the date is that they never bother to separate the things they know how to build from the things that they don’t know how to build. I believe that if one starts paying attention to this little [say enormous] detail, then the ratio of successfully finished projects can be dramatically improved.

Read on, and see if you agree…

I suggest that you group the components on a project into three major groups:

  1. Components that you know how to do.
  2. Components that nobody knows how to do.
  3. Components that you don’t know how to do, but somebody has done something similar before and you can find out how to do it.

Let’s look at all three in more detail to see why you should care about these three classes of components.

1. Components that you know how to do. This is easy. You have done it before and you can do it again. You might even do it better than you did it the last time. It should give you no trouble at all. If you collected any data when you did this the last time, then this time you should be able to estimate the job pretty accurately. You might even be able to reuse one of the versions you already have.

2. Components that nobody knows how to do. These are the patentable ideas. There just aren’t that many patentable ideas on any given project. What I mean is that for most software applications, the patentable part is around 10% of the overall project, or less. This truly innovative part is usually a piece of the user interface or an algorithm that makes your program stand out in the marketplace.

This part constitutes the highest risk on your project. If you can’t come up with the innovation or invention, then the whole project might not be worth doing. I suggest that you focus on this part early and create a backup plan in case that great idea never comes. Figure out whether there is any way that you can still solve that hard problem without the patentable idea.

3. Components that you don’t know how to do, but somebody has done something similar and you can find out how to do it. It is easy to confuse #2 and #3: if you don’t want to take the time to find out who has done something like this before, then you will go ahead assuming that it has not been done. Alternatively, you can easily convince yourself that it has not been done satisfactorily and that you will certainly do a better job than those who came before you.

Surprisingly, I think that it is with #3 that most projects get into trouble. Most developers don’t want to reuse stuff that others wrote. Occasionally, they don’t want to reuse what they wrote either. When I asked people why this is, an interesting response surfaced: lack of trust. Most developers don’t trust other people’s code. Some don’t trust their own either; that’s why they want to rewrite it time and time again.

How do you decide if a component belongs to #1, #2, or #3? It does take some experience to put a component into the right bucket. The most important thing you can do is to ask the question: Which bucket should I put this component in? If you see that too many of them go into #2, I would encourage you to recheck them. It might be the case that the components you have there are still too big and you need to break them down further to place them properly. Then you might discover that they actually fit into #1.

What do you think?

Common Defects

A while back I had the opportunity to analyze a few defects up close. I saw some interesting types and I decided to write about them. Understanding the nature of your defects could prove to be useful for your day-to-day work.

Most of the program defects that I have seen fall into one of the following categories:

  1. Asking a component to do too much. Closely related: having only a few really big components in a large program.
  2. Misunderstanding the responsibilities of a component, or never deciding on clear responsibilities in the first place.
  3. Not knowing the cost or nature of operations, language features, and library or API calls, and thus negatively impacting program performance.
  4. Misunderstanding what is global data and what is local data, and as a result synchronizing too much or not enough.

Let’s examine each of them in more detail.

1. Asking a Component to do Too Much

It has probably happened to you before: you look at the list of methods of a component and you can’t tell what this component is responsible for. As you read through the list, you get the impression that this component is universally usable for about a half or a third of what the program does. Unless this component is a facade for a subsystem, you are in trouble.

You also notice that it is increasingly difficult to debug the system. It feels like all roads lead to Rome, or, translated to software: no matter what call you are tracing, you have to pass through the same giant components. After a while you don’t understand where the calls go and you find yourself just trying things out.

To make sure that we are on the same page, here are the definitions for a component and for a class: A component is a concrete specification of an abstraction (either from the problem domain or from a supporting domain, commonly known as API or library). A class is a concrete specification of an abstract data type (ADT). How do components and classes relate? A component is made of one or more classes. The key here is that a component usually has a real world equivalent, while the class is usually a “fabrication.”

This is a good time to stop, reorganize, maybe redesign, and refactor. Figure out how you can make your component cohesive again. Pare down the responsibilities to a set that makes sense and belongs together. Maybe create a few new components. As you are reworking the components, watch out for coupling between them. It makes sense to draw up a dependency graph and to remove any cycles from it.
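Checking a dependency graph for loops does not have to be done by eye. Here is a minimal sketch in Java, assuming you can list which component depends on which. The class name DependencyGraph, its methods, and the component names in the example are all made up for illustration; this is one simple depth-first search, not the only way to do it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: records "A depends on B" edges and reports one cycle, if any.
class DependencyGraph {
    // Component name -> names of the components it depends on.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addDependency(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Returns one cycle as a list of names (first == last), or an empty list if none exists.
    List<String> findCycle() {
        Set<String> done = new HashSet<>();     // fully explored, known to be cycle-free
        List<String> path = new ArrayList<>();  // components on the current search path
        for (String start : edges.keySet()) {
            List<String> cycle = visit(start, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private List<String> visit(String node, Set<String> done, List<String> path) {
        if (done.contains(node)) {
            return Collections.emptyList();
        }
        int seenAt = path.indexOf(node);
        if (seenAt >= 0) {
            // We came back to a component that is still on the path: that is a loop.
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(node);
            return cycle;
        }
        path.add(node);
        for (String next : edges.get(node)) {
            List<String> cycle = visit(next, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(node);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        DependencyGraph graph = new DependencyGraph();
        graph.addDependency("Billing", "Orders");
        graph.addDependency("Orders", "Inventory");
        graph.addDependency("Inventory", "Billing");
        System.out.println(graph.findCycle()); // prints [Billing, Orders, Inventory, Billing]
    }
}
```

If findCycle returns a non-empty list, one of the dependencies on that list is a candidate for removal or inversion.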

Maybe the way you notice this defect is that you are not sure where to put the new functionality that you have to add to the program, and you start putting things in components pretty much randomly. This leads into the next problem:

2. Misunderstanding the Responsibilities of a Component

There is a quick test to see if the responsibilities of a component are clear: read them and then see if you can retell them. If you remember what a component does, then it is easier to add a new responsibility that is a good fit with the existing responsibilities. If you don’t remember what the responsibilities of a given component are, then you are almost guaranteed several program defects, both by you and by others who use the component you have created. The same applies to method names, requirements, etc. This test has not failed me yet.

If you organize the program, as much of it as you can, into components that represent aspects of the customer’s business, then your program structure will be more resilient to changes than if you organize it along features. The customer’s business fundamentals tend to not change nearly as fast as their wishes about your program. And when the business changes, your changes might be localized to the business components.

3. Not Knowing the Cost or Nature of Operations, Language Features, and Library or API Calls, and thus Negatively Impacting Performance

In The Design and Evolution of C++, Bjarne Stroustrup describes one of the Low-Level Programming Support Rules for C++ as: what you don’t use, you don’t pay for (zero-overhead rule) (121). The interesting part of this statement is not explicit: you must know the cost of each language feature or you won’t be able to decide what you are willing to pay for.

The rule appears in Chapter 4, which contains the C++ Language Design Rules. It is interesting to note that Stroustrup designed C++ to suit his needs and the needs of its users. He had a problem with “how to analyze the network traffic [that would result from distributing the UNIX kernel]” and he decided that in order to solve this rather complicated problem he needed better tools. So he created C with Classes, implemented by a preprocessor called Cpre, which later became C++.

The Java programmers among you may wonder how this applies to you. Quite well. Even when you program in Java you need to know what is happening behind the scenes if you are to write production quality software. Otherwise, your program will be plagued with performance problems.

Performance problems are an ugly beast because they don’t show up when you test your program with a few users or limited data. Everything seems to work fine and takes a limited amount of time. But then you turn on the firehose and all things come to a halt. The display routine doesn’t work anymore because handling 2,000 items is not the same as handling 10. The database is slow, because the query that worked on a 100-record table doesn’t work well on a 10,000-record one. I know that some of you are fanatics about this stuff and have solved this problem well. Drop me a line and share your experience with others.
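To make the 10-versus-2,000 effect concrete, here is a small Java sketch of a hidden library cost. Both methods report the repeated items in a list, and both look equally innocent. The first one calls List.contains, which scans the list on every call, so the total work grows roughly with the square of the number of items. The second one uses a HashSet, whose add is constant time on average. The class and method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example class: two ways to collect repeated items from a list.
class DuplicateFinder {
    // Fine for 10 items. For thousands, the linear scan inside contains()
    // runs once per item, so the total cost is quadratic.
    static List<Integer> duplicatesWithList(List<Integer> items) {
        List<Integer> seen = new ArrayList<>();
        List<Integer> duplicates = new ArrayList<>();
        for (Integer item : items) {
            if (seen.contains(item)) {
                duplicates.add(item);
            } else {
                seen.add(item);
            }
        }
        return duplicates;
    }

    // Same result, but Set.add() tells us in (average) constant time
    // whether the item was already present.
    static List<Integer> duplicatesWithSet(List<Integer> items) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (Integer item : items) {
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 2, 1, 1);
        System.out.println(duplicatesWithList(items)); // prints [2, 1, 1]
        System.out.println(duplicatesWithSet(items));  // prints [2, 1, 1]
    }
}
```

With a handful of items you will not be able to measure the difference between the two, which is exactly why this kind of defect survives testing with limited data.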

OK, you say: What is the answer? Well, it depends. I think that knowing that this problem exists is half the battle. The other half is devising the right solution. Tell me about your specific problem and I will help you come up with a solution.

4. Misunderstanding What is Global Data and What is Local Data and Synchronizing Too Much or Not Enough

One of the basic rules of concurrent programming that I learned back in college, and that still holds true in the age of Java, is that you must protect access to all global data. This is a simple, easy-to-remember rule that can keep you out of a lot of trouble. Now all you have to do is keep thread-local data and global data separated (don’t forget about those parameters that get passed back and forth between threads) and you will have yourself a well-behaved program.

This is not nearly as hard as it sounds. If you “limit” yourself to only doing things this way, you might not be writing some potentially exciting component that solves a problem in a completely unique way that nobody else can get without two PhDs. But, hey, the program you write will work. Period.

Here are a few simple steps to help you figure out what global data is and how to deal with it:

  1. Count the number of separate threads in the program.
  2. Count the number of threads that can run at the same time.
  3. Identify all pieces of data that are accessed by more than one thread identified in 2. Include parameters that get passed between threads in this list. You must protect access to this data.
  4. Protect access to data with locks. Use one or more locks, as necessary. Don’t forget: Always attempt to lock in the same order and unlock in the reverse of the locking order.
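Here is how step 4 might look in Java. This is only a sketch under a few assumptions of mine: the shared data is a pair of account balances, every account has a unique id, and that id defines the one global locking order. The names Account, TransferDemo, and their methods are hypothetical. Two threads move money in opposite directions, which is the classic setup for a deadlock if each thread locks "its" account first.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical shared object: the balance may be touched only while holding the lock.
class Account {
    final int id;                                  // assumed unique; defines the lock order
    final ReentrantLock lock = new ReentrantLock();
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    static void transfer(Account from, Account to, long amount) {
        if (from == to) {
            return;
        }
        // Always lock the account with the smaller id first, whatever the direction.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();              // unlock in reverse order of locking
            }
        } finally {
            first.lock.unlock();
        }
    }
}

class TransferDemo {
    // Runs two threads that transfer in opposite directions; returns the final balances.
    static long[] runOpposingTransfers(int rounds) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) Account.transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) Account.transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for transfers", e);
        }
        return new long[] { a.balance(), b.balance() };
    }

    public static void main(String[] args) {
        long[] balances = runOpposingTransfers(10_000);
        System.out.println(balances[0] + " " + balances[1]); // prints 1000 1000
    }
}
```

Because both threads take the locks in the same order, neither can end up holding one lock while waiting forever for the other.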

Again, this is a minimal solution, but it works. You can solve most problems this way. If you think that there is anything wrong with it, drop me a line. For those of you programming in Java, a good book on the subject is Concurrent Programming in Java: Design Principles and Patterns, 2nd Ed., by Doug Lea.

I know that both of these subjects deserve way more detail than what I have time for in this short note, but I hope that now that you are aware of these problems, you will take steps to prevent them in your next project.

How Can You Start?

A few simple rules can get you going and can help you create better components:

  • Component interfaces should be minimal yet usable.
  • Class interfaces should be primitive.
  • Public interfaces should be complete.
  • Private interfaces should be sufficient.

Ask the question: What component would I need to have to get the job done? If I had a component that could do x, could I then implement what the customer is asking for? And so on. Group the requested features into components. Do the memory test with them: read the responsibilities and see if you can retell them. Ask a friend to also do the memory test. In no time you’ll be writing great components.

For more on this, refer to the Lakos book, Large-Scale C++ Software Design. And don’t worry about the title; most of the stuff applies nicely to Java as well. Another good source for advice and thorough theoretical explanation is Object-Oriented Software Construction, 2nd Ed., by Bertrand Meyer. And of course, you can always crack open Code Complete by Steve McConnell.

Let me know how your components are coming along.