That’s Not My Code!

I’ve just been reading with my 8-month-old son. His favourite book at the moment is That’s not my Santa.

That’s Not My Santa

The book is part of the “That’s not my” series by Rachel Wells. Each book follows the same pattern: five examples of the subject, in this case Santa, that are not “mine” for some reason.

That’s not my Santa. His sleigh is too sparkly.

And the final page shows an example of a Santa that is mine.

That’s my Santa! His beard is so fluffy.

Each page has some textures on it to back up the statement being made about Santa: fluff on Santa’s beard, shiny paper on his sleigh, that kind of thing. Pre-school kids just love these books. They’re made of thick card, so they can withstand even the most determined of teethers!

One theme I’m sure Rachel hasn’t thought of is… code. If I were to author That’s not my code, what would I use as the examples? How about:

  • That’s not my code. You can’t see the whole function on a single screen.
  • That’s not my code. It contains magic numbers.
  • That’s not my code. It doesn’t have any unit tests.
  • That’s not my code. It’s not been code reviewed.
  • That’s not my code. It’s got duplicated logic.

And finally:

  • That’s my code. It can be read by humans as well as computers.

So, come on, get in the Christmas spirit – if you were to author That’s not my code, how would it read?

DSCM – A Move Back to Big Bang Integration

Used sensibly, the move from centralised to distributed source control management offers more benefits than drawbacks. Used unwisely, it could take us back to the days of Big Bang Integration.

Branching Privately

Irrespective of the number of meetings attended or PowerPoint slides produced, developers communicate in one language, and one language only – the language of code. In many organisations trunk, or centrally hosted branches, give visibility of the work going on. This keeps everyone in the loop as to what is being developed, how it’s being done, and by whom.

Moving away from this workflow, towards teams working in local branches away from prying eyes, cuts this information flow. The development effort in these branches, now visible to no one beyond the team working on it, misses out on vital feedback from other team members – team members who, had the work been done centrally where they could see what was happening, could have given that feedback. That feedback could have saved the team weeks or months of effort: code already exists over here, or this team needs that module to do that as well, so don’t duplicate the effort. Instead, these issues won’t be discovered until the big bang – the merge back into the mainline.

Large Changesets

Branches, by their very nature, result in larger changesets being pushed back into the mainline. The same can be true when working on the main branch, but in a local repository: there is a temptation to complete a full feature before pushing the changes to the mainline. Developers will still be committing their code in small changesets, but without that push upstream to the mainline many of the benefits are lost.

Committing a feature little by little has many benefits. I’ve already mentioned peer review when talking about branching. In addition, it allows regressions to be detected as early as possible, making them easier to locate. Putting code into the mainline in feature-sized chunks certainly makes it easier to track which changeset caused a regression, as there will be fewer of them. But finding which part of that feature-sized changeset caused the regression will not be so easy.
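As a rough sketch of what this looks like day to day, assuming a git-based workflow with a shared mainline on origin/master (the file name and commit message below are made up for illustration):

git add src/parser.cpp
git commit -m "Handle empty input in the parser"

# Push the small change upstream straight away, so the rest of the
# team can see and review it, rather than sitting on a week's worth
# of local commits.
git pull --rebase origin master
git push origin master

The point isn’t the particular commands – it’s that each small, self-contained changeset reaches the mainline while it’s still small.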

Enjoy responsibly

Distributed source control management systems give us great power, but remember: with great power comes great responsibility.

4 Steps to Painless SVN Branching

Doing some work that can’t be done in trunk in small increments? Then it’s time to branch.

Say you’re working on bug “12345 Wibble”…

Step 1: Create the branch

Create a branch of trunk, making a note in the commit message of the revision of trunk the branch is being created from:

svn copy -r 20000 \
    svn://svn/trunk \
    svn://svn/branches/trunk_wibble/ \
    -m "Bug 12345 Wibble -
Creating branch for feature work
svn copy -r 20000 svn://svn/trunk svn://svn/branches/trunk_wibble/"

There are now two types of commits you’ll be doing to this branch:

1 – Work for the feature
2 – Resyncing with trunk.

Step 2: Doing the feature work

The feature work should be tracked against a bug as normal:

svn commit -m "Bug 12345 Wibble
- Added Foo for Wibble"

Step 3: Merging changes from trunk into your branch

The commit messages for the merges from trunk are crucial for the bookkeeping of your branch. Above we branched at revision 20000; suppose trunk is now at revision 30000. To get those changes merged over into your branch, sit in trunk_wibble, with no other local changes, and run:

svn merge -r 20000:30000 svn://svn/trunk .

Once this is compiling, commit it:

svn commit -m "Bug 12345 Wibble
- svn merge -r 20000:30000 svn://svn/trunk ."

Next time you want to sync your branch with changes that have happened in trunk, all you need to do is look down the list of changesets to see what revision you last synced to.
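For example, suppose that last sync commit recorded -r 20000:30000 and trunk has since moved on to revision 35000 (a number made up for illustration). The next sync is just:

svn merge -r 30000:35000 svn://svn/trunk .
svn commit -m "Bug 12345 Wibble
- svn merge -r 30000:35000 svn://svn/trunk ."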

Step 4: Merging back into trunk

Now your feature is done and dusted, it’s time to merge back to trunk. You need to know two things here:

1 – The revision of your branch that was last synced to trunk; typically this will be the last commit you made to your branch.
2 – The revision of trunk you last synced to; this will be in the commit message of that final sync.

Suppose these are 40000 and 50000.

Sit yourself in trunk at the revision you last synced to:

svn merge \
    svn://svn/trunk/@50000 \
    svn://svn/branches/trunk_wibble/@40000 .

Depending on how many changes you’ve made this may take a while. Be sure to give the result a look over to check it matches what you think you’ve changed.

svn commit -m \
"Bug 12345 Wibble
Merging feature work into trunk
svn merge svn://svn/trunk/@50000 svn://svn/branches/trunk_wibble/@40000 ."

And you’re done!

A final note about that last bit

I’ve seen a lot of confusion about this last step. What you’re not trying to do is merge each of your feature work changesets into trunk.

You’re just after a delta between trunk and your branch, so you can apply that to trunk to make it the same as your branch – that’s it.

So, what happened exactly when we did the merge back into trunk? That merge was really a diff between trunk and your branch, followed by the application of that patch. You could actually achieve a similar result by doing

svn diff \
    svn://svn/trunk/@50000 \
    svn://svn/branches/trunk_wibble/@40000 > wibble.patch

And then applying the patch manually.

patch -p0 < wibble.patch

However, you’d end up with empty files for those that had been removed in your branch, and files that need to be svn add’ed for those that had been added in your branch. Doing it the svn merge way does all the deleting and adding for you.

I Smell Duplication, Can You?

Want to know how to find duplicate code in your code base quickly and easily? Want to be able to sniff out the most pungent of code smells in double quick time? You’ve come to the right place.

Smells

If you’re not familiar with the idea of code smells, then be sure to check out Martin Fowler’s excellent book Refactoring: Improving the Design of Existing Code. Code smells are symptoms in your source code that can indicate deeper problems – arguably the worst being code duplication, which violates the principle of DRY.

The maintenance issues duplication causes are well known, so I won’t revisit them here. What I want to do is talk about a tool I found the other day that helps find duplicate code.

Copy Paste Detection (CPD) is a great little program that can detect duplicate code in a code base. It’s available under a BSD-style licence, shipped as part of the PMD static code analyzer for Java. Although PMD is targeted at Java, CPD works using string matching, so it can be used on any language. Java, JSP, C, C++, Fortran and PHP are supported out of the box, and it is possible to add further languages.

How to use it

Running CPD is very simple:

./cpd.sh ~/path/to/source/

And that’s it. By default the output is in text format; this can be changed to XML or CSV. Example output from processing the JDK (reported to take only 4 seconds) can be seen here. The number of duplicate tokens required for code to be considered copy and pasted can also be configured; this defaults to 100.
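For example, a run tuned for a C++ code base might look something like this. The flag names are taken from the PMD/CPD documentation and may differ between PMD versions, so treat this as a sketch and check the docs for the version you have installed:

./cpd.sh --minimum-tokens 150 \
         --language cpp \
         --format xml \
         --files ~/path/to/source/ > duplication-report.xml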

My findings

I had to increase the heap size available to Java to get the code base I’m working on parsed – it’s about a million lines of C/C++ code. The results were fascinating. Sure enough, copy and pasted code was found, comments and all. Worse still, there was code that had been copy and pasted but not quite kept in sync – in most cases straight bugs.

The only real false positives I found were with auto-generated code. CPD recursively parses the directories supplied on the command line (you can supply as many as you like), without being able to ignore certain files (e.g. *_autogen.cpp). As these files are produced as part of the build process, I’m now running CPD on a clean checkout, without any build artifacts lying about.

What next?

As always with these things, I’m left with a bunch of open questions:

I can see this tool offering some real value, but how do I integrate it into my team’s workflow? It’s a command-line tool only, so there is no administration interface to allow the results of various runs to be compared and analysed.

There are plenty of other static code analyzers that do much more than just check for duplication – what are people’s experiences with these?

Are You Stuck In The Maintenance Programmer Mindset?

We’re all very good at pointing out what’s wrong with code. There are even websites dedicated to exposing and ridiculing bad code and bad design, such as The Daily WTF. We all encounter code that isn’t “right” on a daily basis, but how often do we do something about it and let our actions speak louder than our words?

It’s very easy to point out what’s wrong with code: pointing out how the design and implementation of the code we’re working with is making our lives hard, talking with fellow programmers about how our hands are tied and how we “did what we could” given that it “wasn’t our code”.

This all too familiar discussion demonstrates what I call the maintenance programmer mindset.

Are you stuck in this mindset too? It’s very easy to fall into.

I have to remind myself regularly to break out of this mindset; to be bolder, to have the confidence to change the direction the code is heading. To make a clear statement about the problem domain by introducing a new class. To factor out that bit of repeated code into its own method. To break that monolithic module into two more focused modules to regain control over dependencies. To grab the wheel and make the kind of changes that steer the design down the correct path. The requirements on an active software project change constantly, so the design must also.

It’s quite easy to see if someone is stuck in this mindset. The following is a list of actions that make a statement about the direction a code base is heading, in order of impact – actions unlikely to be performed by someone stuck in this mindset.

  • Adding a new module/package
  • Adding a new class
  • Adding a new source file
  • Adding a new function

If you’re not doing most of the above on a regular basis – irrespective of whether you’re maintaining software or doing green-field development – you’re programming with the maintenance programmer’s mindset.

Brain Teaser: Streaming An Enumeration In C++

Streaming an enumeration in C++ – what could be easier? Can you spot the bug in the following code?

typedef enum {
    SEASON_UNDEF,
    SEASON_SUMMER,
    SEASON_AUTUMN,
    SEASON_WINTER,
    SEASON_SPRING,
    SEASON_NUM_TYPES,
} SEASON;
 
 
std::ostream& operator<<(std::ostream& rOs, 
                         const SEASON & rRhs)
{
    switch (rRhs)
    {
    case SEASON_UNDEF:
        rOs << "SEASON_UNDEF";
        break;
 
    case SEASON_SUMMER:
        rOs << "SEASON_SUMMER";
        break;
 
    case SEASON_AUTUMN:
        rOs << "SEASON_AUTUMN";
        break;
 
    case SEASON_WINTER:
        rOs << "SEASON_WINTER";
        break;
 
    case SEASON_SPRING:
        rOs << "SEASON_SPRING";
        break;
 
    default:
        rOs << "Unknown SEASON: " << rRhs;
        break;
    }
 
    return rOs;
}

Scroll down for the answer….

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Nearly there… 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

You’ve got it – it’s the default case. This makes a recursive call which will never terminate. Now, how much stack space do I have….

rOs << "Unknown SEASON: " << rRhs; // Recusive call!

Overloading, Is It Really Worth It?

For a long time I’ve used overloading, but just recently I’ve been questioning its use.

Readability

Looking at the call site of an overloaded function, it’s not immediately obvious which method is being called.

find("code buddy");
find(C_PLUS_PLUS);

After a quick look through the overloads available I can always work out which one will be called – but why should I have to? If code is making me think more than is absolutely necessary then, for my money, there is an issue with it. Wouldn’t the previous code be clearer if it was written like this:

findByName("code buddy");
findByLanguage(C_PLUS_PLUS);

Tags

My poor tags get confused! I’m working in a C++ code base on Linux. Rightly or wrongly, I’m using Emacs with cscope as my chosen editing/tagging solution. For example, if I’m working with the Visitor pattern, which uses overloading for its visit method, a search for:

void visit(const NodeA& node);

also finds me:

void visit(const NodeB& node);
void visit(const NodeC& node);

This may be a shortcoming of cscope; however, I’ve seen other tagging solutions, notably the one used in SlickEdit, fall at the same hurdle. I’d much rather see this visitor interface written like this:

virtual void visitNodeA(const NodeA& node) = 0;
virtual void visitNodeB(const NodeB& node) = 0;
virtual void visitNodeC(const NodeC& node) = 0;
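For completeness, here’s a sketch of how the accept side might look with these named methods – the class and member names are my own illustration, not from any particular code base:

class NodeA;
class NodeB;

class IVisitor
{
public:
    virtual ~IVisitor() {}
    virtual void visitNodeA(const NodeA& node) = 0;
    virtual void visitNodeB(const NodeB& node) = 0;
};

// Each node dispatches to the method named after it, so a tag
// search for visitNodeA only ever finds NodeA-related code.
class NodeA
{
public:
    void accept(IVisitor& visitor) const { visitor.visitNodeA(*this); }
};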

Templates

It may be necessary to use overloading in templatised code, e.g.:

template<typename Node>
void doSomeStuffAndVisit(IVisitor& visitor,
                        const Node& node)
{
    // some code
    visitor.visit(node);
    // some more code
}

Constructors

Overloading of constructors is unavoidable.

// Creates a Foo from the xml in file file
Foo::Foo(const std::string& file);
 
// Creates a Foo from the xml node root
Foo::Foo(const xmlNode& root);

Unless of course you’re using a factory method, in which case there is no need to overload like this:

// Creates a Foo from the xml in file file
Foo* FooFactory::createFoo(const std::string& file);

// Creates a Foo from the xml node root
Foo* FooFactory::createFoo(const xmlNode& root);

The comments really are the giveaway here; take the comments away and things become so much more readable:

Foo* FooFactory::createFooFromFile(const std::string& file);

Foo* FooFactory::createFooFromXMLNode(const xmlNode& root);

Conclusion

These days, unless I’m working with code that’s using templates, I’m avoiding overloading – the pros are much outweighed by the cons.

4 Things To Consider When Fixing a Bug

So, you’ve just spent a few hours, days, or maybe weeks(!) tracking down that bug. You’ve finally found it – what do you do? The fix is trivial, it’s a one-line change; you can fix it right now, have it committed to source control, picked up in the next software release, and move on to reading Slashdot.

But before you go ahead and change that line of code, there are a few questions you should be asking yourself:

#1 How hard was it to track this bug down?

If you’ve been crawling over broken glass for a couple of days, the chances are you’ve been adding some extra diagnostics to the system – these should be committed as part of the bug fix. Do not underestimate the importance of an easily diagnosable system; it will save you lots of time in the long run, not just for bug fixing, but during general development as well.

#2 Can I prevent the bug from being reintroduced in the future?

Now that you can reproduce the bug (you have reproduced it before claiming it’s fixed, right?), you need to write some form of test. Why?

  • The tests will first expose the bug, then prove the fix.
  • The test will act as an insurance policy on your fix, as long as no one removes the test, the bug will stay fixed.

Ideally you’ll expose this via a unit test. This is the quickest form of feedback a developer can be given about the state of the system and will allow you to develop the fix with a very short feedback loop. If this is not possible, then a regression test is your next best bet.
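As a minimal sketch of the idea, using nothing but plain assert – the function under test and the bug scenario are made up for illustration; in practice the test would live in your unit test framework of choice:

#include <cassert>
#include <cstdlib>

// Hypothetical function the bug was raised against: it used to
// dereference a null pointer, the fix adds the null check.
int parseCount(const char* text)
{
    return text ? std::atoi(text) : 0;
}

int main()
{
    assert(parseCount(0) == 0);   // exposes the bug, then proves the fix
    assert(parseCount("3") == 3); // guards the existing behaviour
    return 0;
}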

#3 Have I had it code reviewed?

A code review need only take a few minutes of another developer’s time, and will double the number of people that inspect the code before you commit – not a bad return really. If the code review takes longer, then the changes must be so significant, or the problem domain so complex, that running the change by another developer really is a no-brainer.

#4 Do I have a bug number to commit the change against?

Be sure to reference the bug number when you commit your fix. You don’t need to detail the symptoms of the bug in the commit message – that is what a bug tracking system is for. Not only will this prove useful to other developers when they see your changes go in, it will also allow those that raised the bug to see that work is happening on it.
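For example, mirroring the svn style used earlier in these posts (the bug number and message are made up):

svn commit -m "Bug 12345 Wibble
- Fix off-by-one in foo(), add regression test"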

Conclusion

These are the questions I go through in my mind before committing a fix – let me know yours!