Thursday, November 6, 2008

Book Review: Clean Code: A Handbook of Agile Software Craftsmanship

I recently noticed (and quickly read) another book from the prolific writer Robert C. Martin, "Uncle Bob" to people who know him in the industry. The book is titled Clean Code: A Handbook of Agile Software Craftsmanship. Despite the "Agile" in the title, it's not really a book about Agile Software Development, though it certainly describes practices you would use when applying Agile techniques. It's really much more about software craftsmanship and refactoring. I'll tell you a little more about the book in this review. Note that all of the examples are in Java, but very little of the content goes beyond "core Java", apart from some references to collection classes and concurrency (and the basics of concurrency are not really Java-specific), so a C#-er would still get a lot of benefit from this book.

Note that although I really liked the theme and content of the book, I don't necessarily agree with everything he advises. You shouldn't either, but if you disagree, do so for the right reasons. You should never follow rules and guidelines if you don't know why you should follow them or what benefit they provide. If you keep that in mind while reading this book, you will appreciate it for what it is: a very careful and thoughtful analysis of how to write clean code.

What I'm reviewing is the online Safari version of the book, published August 1st, 2008.

Although RCM has the sole author credit, several chapters say right at the beginning "by ...", with names like Tim Ottinger, Michael Feathers, James Grenning, Jeff Langr, Dr. Kevin Dean Wampler, and Brett L. Schuchert, so I imagine this is more of a collaborative work. That does not detract from the theme or style of the book.

The book has 17 chapters of content. I usually don't consider appendices part of the content of the book, but this book has a very good appendix titled "Concurrency II", which is sort of a "sequel" to the earlier "Concurrency" chapter in the book. I'll talk more about this appendix a little later.

The first chapter of the book is titled "Clean Code", which really sets the theme for the entire book (it's in the title, of course). Whether it's talking about good naming, formatting, coupling/cohesion, or testing, the goal is to produce clean code and to reap the benefits of doing so. This chapter reviews the basic principles of what "clean code" means without getting into code yet.

The next chapter, "Meaningful Names" (by Tim Ottinger), covers one of the most important intangible skills required for effective software development, being capable enough in the English language (and typing, I suppose) to define names that effectively convey the true meaning and intent of a symbol, without requiring extensive comments to explain the meaning of the name (more about that anti-pattern later). This is a fundamental skill required for writing clean code.

Following this is the chapter called "Functions", which covers function structure and high-level design goals like "Do one thing" and "One level of abstraction", along with the ways a function's interface can add to or detract from the quality of the product for the clients that call it.
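
To give a feel for those two goals, here is a small sketch of my own. It is not taken from the book, and the Invoice and OverdueReport names are made up for illustration. The top-level function reads as a short list of steps, and each helper does exactly one of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class, invented for this sketch.
class Invoice {
    final String customer;
    final int daysOverdue;

    Invoice(String customer, int daysOverdue) {
        this.customer = customer;
        this.daysOverdue = daysOverdue;
    }
}

class OverdueReport {
    // Reads as two steps at the same level of abstraction: select, then format.
    static String render(List<Invoice> invoices) {
        List<Invoice> overdue = selectOverdue(invoices);
        return formatLines(overdue);
    }

    // Does one thing: keeps only the invoices that are past due.
    static List<Invoice> selectOverdue(List<Invoice> invoices) {
        List<Invoice> overdue = new ArrayList<>();
        for (Invoice invoice : invoices) {
            if (invoice.daysOverdue > 0) {
                overdue.add(invoice);
            }
        }
        return overdue;
    }

    // Does one thing: turns invoices into one line of text each.
    static String formatLines(List<Invoice> overdue) {
        StringBuilder text = new StringBuilder();
        for (Invoice invoice : overdue) {
            text.append(invoice.customer)
                .append(": ")
                .append(invoice.daysOverdue)
                .append(" days\n");
        }
        return text.toString();
    }
}
```

The alternative would be a single function that loops, filters, and builds strings all at once. That works too, but the reader has to hold the low-level details and the overall intent in mind at the same time.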

The next chapter, titled "Comments", presents a theme that I've felt very strongly about for a long time, in that bad comments are much worse than no comments, and good understandable code with no comments at all is a good thing. The chapter explores various aspects of this.

After this is the chapter titled "Formatting", which emphasizes that formatting is very important, but that the specific formatting rules matter less than applying them consistently within a development group. Fortunately, modern tool and IDE support makes it easier to set up and follow agreed-upon formatting guidelines. This chapter also talks a bit about average file length, calling this "vertical formatting".

The next chapter on "Objects and Data Structures" starts with points about proper interface abstraction. It then makes an interesting point about the difference between objects and data structures, in that the latter exposes data, but no functions, and the former exposes functions, but no data. In addition, it points out that between procedural code (using data structures) and object-oriented code, some changes are easier in procedural code, and others are easier in object-oriented code. There is also discussion of the Law of Demeter, which helps to reduce coupling between modules.
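
Here is a minimal sketch of that distinction as I understand it. The class names are my own and the code is only an illustration, not a reproduction of the book's listing. The first half uses bare data structures with a separate procedural class; the second half uses objects that hide their data behind a function.

```java
// Data-structure style: these classes expose data and have no behavior.
class SquareData {
    double side;
}

class CircleData {
    double radius;
}

// Procedural code that operates on the data structures. Adding a new function
// (say, perimeter) only touches this class. Adding a new shape means editing
// every function in it.
class Geometry {
    static double area(Object shape) {
        if (shape instanceof SquareData) {
            SquareData square = (SquareData) shape;
            return square.side * square.side;
        }
        if (shape instanceof CircleData) {
            CircleData circle = (CircleData) shape;
            return Math.PI * circle.radius * circle.radius;
        }
        throw new IllegalArgumentException("Unknown shape: " + shape);
    }
}

// Object style: data is hidden and only functions are exposed. Adding a new
// shape is one new class. Adding a new function means editing every shape.
interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    public double area() {
        return side * side;
    }
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    public double area() {
        return Math.PI * radius * radius;
    }
}
```

Neither half is "the right one". Which is easier to live with depends on whether you expect to add more types or more operations.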

The "Error Handling" chapter after this, by Michael Feathers, gives good advice on defining and using exceptions, and on handling "null" in ways that help reduce boilerplate special-case handlers.
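
As one example of the kind of "null" advice I mean, here is a sketch of my own (EmployeeDirectory and Payroll are hypothetical names, not from the book). A lookup that returns an empty list instead of null lets every caller skip the null check.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical directory that maps a department name to its employees.
class EmployeeDirectory {
    private final Map<String, List<String>> employeesByDepartment = new HashMap<>();

    void register(String department, List<String> employees) {
        employeesByDepartment.put(department, employees);
    }

    // Returns an empty list, never null, for an unknown department,
    // so callers can iterate or call size() without a special case.
    List<String> employeesIn(String department) {
        List<String> employees = employeesByDepartment.get(department);
        return employees != null ? employees : Collections.<String>emptyList();
    }
}

class Payroll {
    // No null checks needed here because of the contract above.
    static int headCount(EmployeeDirectory directory, String... departments) {
        int total = 0;
        for (String department : departments) {
            total += directory.employeesIn(department).size();
        }
        return total;
    }
}
```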

The following chapter, "Boundaries" by James Grenning, refers to things beyond the boundaries of our code and application, mostly third-party packages that we integrate our code with. The most important point it makes is to insulate the rest of your code from fragility at that boundary by defining facades and interfaces, so that changes on the other side of the boundary do not adversely affect the rest of the application.
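
A rough sketch of that idea follows. Everything in it is invented for illustration: VendorRateClient stands in for a third-party class we don't control, and ExchangeRates is the interface the rest of the application would depend on.

```java
// Stand-in for a third-party class we do not control (hypothetical).
class VendorRateClient {
    // Imagine a vendor signature that could change between releases.
    double fetchRate(String fromIso, String toIso, boolean useCache) {
        return "USD".equals(fromIso) && "EUR".equals(toIso) ? 0.78 : 1.0;
    }
}

// The interface our own code depends on. It is phrased in our terms.
interface ExchangeRates {
    double rate(String from, String to);
}

// The only class that knows the vendor API exists. If the vendor changes
// its signature, this is the one place that has to change.
class VendorExchangeRates implements ExchangeRates {
    private final VendorRateClient client;

    VendorExchangeRates(VendorRateClient client) {
        this.client = client;
    }

    public double rate(String from, String to) {
        return client.fetchRate(from, to, true);
    }
}

// Application code: sees only ExchangeRates, never the vendor class.
class PriceConverter {
    private final ExchangeRates rates;

    PriceConverter(ExchangeRates rates) {
        this.rates = rates;
    }

    double convert(double amount, String from, String to) {
        return amount * rates.rate(from, to);
    }
}
```

A side benefit is that tests can hand PriceConverter a trivial ExchangeRates implementation instead of the real vendor client.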

Next, the "Unit Tests" chapter gets into lots of details about how to write clean tests, but the underlying point (emphasized in the conclusion) is that the code in your tests is just as important as the code being tested, and that clean and understandable tests need to be there as part of the complete package.

After this is the "Classes" chapter, with (as opposed to "by") Jeff Langr, which brings up issues in class organization and structure that help to make the resulting classes easier to understand and more amenable to changes.

Next is the "Systems" chapter, by Dr. Kevin Dean Wampler. This chapter talks about some architecture issues above classes and functions. For instance, it talks about using dependency injection or lazy initialization to separate the setup of a system from its execution. Next, we get into separation of concerns and using proxies and aspect-orientation to implement cross-cutting concerns the right way.
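
To illustrate the dependency injection part, here is a small sketch of my own (all names are hypothetical, and it uses plain constructor injection rather than any particular framework). The business class is handed its collaborator, and the wiring lives in one separate place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator the business logic needs.
interface MessageSender {
    void send(String recipient, String body);
}

// Simple implementation that just records what was sent.
class RecordingSender implements MessageSender {
    final List<String> sent = new ArrayList<>();

    public void send(String recipient, String body) {
        sent.add(recipient + ": " + body);
    }
}

// Execution: uses the sender it is given. It never constructs one
// and never looks one up.
class WelcomeService {
    private final MessageSender sender;

    WelcomeService(MessageSender sender) {
        this.sender = sender;
    }

    void welcome(String user) {
        sender.send(user, "Welcome, " + user + "!");
    }
}

// Setup: the one place that decides which implementations get wired together.
class ApplicationSetup {
    static List<String> run() {
        RecordingSender sender = new RecordingSender();
        WelcomeService service = new WelcomeService(sender);
        service.welcome("alice");
        return sender.sent;
    }
}
```

Swapping RecordingSender for a real mail or messaging implementation would only touch ApplicationSetup, which is the point of keeping setup apart from execution.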

Following this is the "Emergence" chapter, by Jeff Langr. This covers Kent Beck's four rules of "Simple Design", which the author believes contribute to the "emergence" of good design. When you read the information on the first principle, "Runs All the Tests", it really seems like it would be better named "Is Fully Testable". Whatever it's called, the section is clear on the importance of this. The next three principles, named "No Duplication", "Expressive", and "Minimal Classes and Methods", are introduced with the observation that all of them are achieved through the practice of careful refactoring, which may be an even more important lesson than the principles themselves.

The next chapter, titled "Concurrency", by Brett L. Schuchert, is one of the longest chapters in the book, and deservedly so. Code implementing concurrent algorithms can be extremely difficult to understand and maintain if principles of clean coding are not observed. Even considering the length of this chapter, this is actually only the first of two chapters on concurrency. The second part is one of the appendices, as opposed to being part of the regular content. Perhaps the authors felt it was getting too far out of scope; I don't know. The most important advice presented in this chapter is to use the "Single Responsibility Principle", particularly when you're considering writing code that implements both concurrency principles and business logic. This is where the "java.util.concurrent" package provides a lot of advantages, as it encapsulates the details of concurrency, allowing you to plug in POJOs that concern themselves only with business logic. This chapter also mentions an intriguing tool from IBM called "ConTest" that takes an aggressive approach to testing concurrent code, by introspecting and instrumenting the code to accentuate its concurrency vulnerabilities. This is the first I've heard of this tool, but if and when I need to test concurrent code, I will definitely make use of it.
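
Here is a minimal sketch of what that separation can look like with "java.util.concurrent". The WordCounter and ParallelWordCount classes are my own invention for illustration. The first contains no threading code at all, and the second is the only place that knows a thread pool is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Business logic only: counts whitespace-separated words in one text.
class WordCounter implements Callable<Integer> {
    private final String text;

    WordCounter(String text) {
        this.text = text;
    }

    @Override
    public Integer call() {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Concurrency policy only: how many threads, how results are collected.
class ParallelWordCount {
    static int countAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String document : documents) {
                results.add(pool.submit(new WordCounter(document)));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

WordCounter can be unit tested by calling call() directly on a single thread, which is a large part of why the separation is worth the extra class.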

The next chapter, "Successive Refinement", is the first of three chapters where we get to see much of the advice that came before being put to use. These chapters present several "deep dives" into refactoring exercises with specific code samples (mostly taken from real code bases, some written by the author himself). The refactoring steps are very clear and detailed, although the detail sometimes becomes hard to follow. This would be a great opportunity to work through the same steps yourself with the original code base, not just to follow the details, but to help absorb the techniques into your common practices. As is normal in refactoring, some of the later steps rework code written in previous steps. Some of the examples mention the Clover code coverage tool, which the author uses to analyze the code coverage of his tests. The second of these three chapters even examines refactoring of JUnit tests, which emphasizes the fact that tests are just as important as the production code.

The last regular content chapter of the book, "Smells and Heuristics", is essentially a dictionary of numerous principles referenced throughout the book. It is divided into sections titled "Comments", "Environment", "Functions", "General", "Java", "Names", and "Tests" (they were ordered alphabetically in the book, also). Each principle in each section is numbered, and the references to these principles throughout the book are abbreviated like "N4", being the fourth principle in the Names section. Although each reference to these principles explained why the principle was used, citing examples where applicable, this dictionary also has its own examples in each principle, further supporting the advice.

As mentioned previously, Appendix A is titled "Concurrency II", by Brett L. Schuchert (again), and it really just explores more issues with concurrency. Curiously, when you compare the two, the appendix explores some of the same concepts as the earlier chapter, but in more detail and with more examples. It is perhaps as long as or longer than the first chapter on concurrency. It also shows examples using generated bytecode, which helps to illustrate some of the issues a little better.

All in all, a concise (it is only 464 pages, after all) book on refactoring and principles to achieve clean code. It's definitely a book you should read if this is important to you. Just read it with an open mind and you will learn the things you didn't know, strengthen the principles you were already familiar with, and perhaps learn to appreciate more the principles you follow that conflict with the author's advice (few, most likely).