Detecting Duplicate Code

I recently stumbled across CCFinder, a tool that detects duplicate code (aka code clones).  It’s definitely not the simplest thing in the world to download and install, and excluding generated files is a little painful (it has to be done manually), but it does make finding repeated code a lot easier than a tool like Simian does.  Why?  Because you get a visual indication of where the duplicates are and can see the code itself.  Plus you have greater control over the rules used to detect duplication.

Now, before I run through things, it should be noted that I’d expect to see some level of "detected duplication" in any non-trivial code base.  Sometimes one part of the code has a similar structure to other parts even though they're performing different functions.  Sometimes you'll have generated code, and sometimes avoiding duplication adds more complexity than it is worth.  Even so, the amount of duplication in any code base should be kept as small as possible, because that keeps your code maintainable and means that fixes and changes only need to be made once.  There's nothing worse than fixing a bug in one part of the code without realising the same code exists elsewhere and needs fixing too.

To give you an idea of what CCFinder shows, have a look at the following screenshots:

This first picture shows code clone metrics per file and a visual indication of where the code clones are.

In the left panel (the file listing) various metrics are shown per file:

  • LEN: File length (in tokens – variable names, method calls, etc)
  • CLN: Number of Code Clones
  • NBR: Neighbors – Number of other files that share a code clone in this file
  • RSA: Ratio of Similarity to another file.  Lower is better.
  • RSI: Ratio of Similarity within the file. Lower is better.
  • CVR: Coverage – Percentage of tokens covered by another code clone (an indication of how much of the code is duplicated)
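
To make a few of these metrics concrete, here is a rough Python sketch that derives LEN, CLN, NBR and CVR from a list of clone pairs.  It is not how CCFinder computes them: the `clone_metrics` function, the input format (token ranges with an exclusive end) and the choice to count CLN as the number of clone pairs touching a file are all my own assumptions for illustration.

```python
from collections import defaultdict


def clone_metrics(file_lengths, clone_pairs):
    """Toy per-file metrics (an illustration, not CCFinder's implementation).

    file_lengths: {file name: length in tokens}
    clone_pairs:  [((file, start, end), (file, start, end)), ...]
                  where start/end are token offsets and end is exclusive.
    """
    covered = defaultdict(set)      # token offsets that sit inside any clone
    neighbours = defaultdict(set)   # other files sharing a clone with this file
    clone_count = defaultdict(int)  # clone pairs that touch this file

    for (file_a, start_a, end_a), (file_b, start_b, end_b) in clone_pairs:
        covered[file_a].update(range(start_a, end_a))
        covered[file_b].update(range(start_b, end_b))
        # A clone inside a single file is counted once for that file.
        for name in {file_a, file_b}:
            clone_count[name] += 1
        if file_a != file_b:
            neighbours[file_a].add(file_b)
            neighbours[file_b].add(file_a)

    return {
        name: {
            "LEN": length,
            "CLN": clone_count[name],
            "NBR": len(neighbours[name]),
            "CVR": len(covered[name]) / length if length else 0.0,
        }
        for name, length in file_lengths.items()
    }
```

Feeding it one clone shared between two files and one clone inside a single file gives the first file a CLN of 2 but an NBR of only 1, which is the distinction the CLN and NBR columns are drawing.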

In the right hand panel we have a visual indicator of where the clones are.  The long diagonal line can be thought of as a mirror line and the black marks on each side of that diagonal are the clones.  The large boxes are directory boundaries, so we can see which directories have more duplication than others.

[Screenshot: ccfinder2]
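
If the scatter plot is hard to picture, this small Python sketch builds a text version of the same idea for a single token sequence.  It is a naive stand-in, not CCFinder's detection algorithm: `clone_scatter`, `render` and the `min_run` threshold are names I've made up, and it only looks for exact runs of matching tokens.

```python
def clone_scatter(tokens, min_run=3):
    """Mark cell (i, j) when tokens i and j sit in a matching run of
    at least min_run tokens.  Naive illustration only."""
    n = len(tokens)
    grid = [[False] * n for _ in range(n)]
    # Each offset is one diagonal above the main "mirror" diagonal.
    for offset in range(1, n):
        run_start = None
        # One extra step past the end so a run touching the edge is closed.
        for i in range(n - offset + 1):
            same = i < n - offset and tokens[i] == tokens[i + offset]
            if same and run_start is None:
                run_start = i
            elif not same and run_start is not None:
                if i - run_start >= min_run:
                    for k in range(run_start, i):
                        grid[k][k + offset] = True
                        grid[k + offset][k] = True  # mirrored mark
                run_start = None
    return grid


def render(grid):
    """Draw the grid as text: '\\' is the mirror line, '#' is a clone mark."""
    n = len(grid)
    return "\n".join(
        "".join(
            "\\" if row == col else ("#" if grid[row][col] else ".")
            for col in range(n)
        )
        for row in range(n)
    )
```

Calling `print(render(clone_scatter(list("abcxabc"))))` draws the diagonal with one short run of marks on each side of it, which is the same mirrored pattern the screenshot shows at a much larger scale.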

We can also use Source view to see the duplicates between files.  For example, this shows a SetDateLabel() method in two different files where the code differs only in the parameter it uses.  It would be a great refactoring candidate.

If it's not obvious, the section between the file listing and the code windows is a visual indicator of which parts of each file the two source windows are showing, and where the duplicated code sits within those files.

[Screenshot: ccfinder5]
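
As a sketch of the kind of refactoring a clone like SetDateLabel() invites, here is a before and after in Python.  The real method bodies aren't reproduced in this post, so everything below (the function names, the `labels` dictionary and the date format) is invented purely to show the shape of the change.

```python
from datetime import date
from typing import Dict, Optional

DATE_FORMAT = "%Y-%m-%d"  # hypothetical format, chosen for the example


# Before: two clones that differ only in the value and label they touch.
def set_start_date_label(labels: Dict[str, str], start: Optional[date]) -> None:
    labels["start"] = start.strftime(DATE_FORMAT) if start else "(not set)"


def set_end_date_label(labels: Dict[str, str], end: Optional[date]) -> None:
    labels["end"] = end.strftime(DATE_FORMAT) if end else "(not set)"


# After: the part that varied becomes a parameter, so a fix happens once.
def set_date_label(labels: Dict[str, str], key: str, value: Optional[date]) -> None:
    labels[key] = value.strftime(DATE_FORMAT) if value else "(not set)"
```

Once the callers switch to `set_date_label(labels, "start", some_date)`, the two original functions can be deleted and the clone disappears from the report.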

You can also see code clones within a single file:

[Screenshot: ccfinder4]

I've only used it for a short while, but I'm finding it to be very, very useful.  If you can work through the crappy web site and the awful download/registration process, hopefully you'll find it just as useful.

If you want the software, it’s free but you will need to register to get a license.  Go to http://www.ccfinder.net/index.html and download it from there.

 

Reader Comments

Any idea on the relationship between Minimum Clone Length and Minimum TKS when configuring clone detection options?  These two together determine what counts as a clone, but I don't have a clue how they relate.

Hi,

I was trying to run CCFinder but I keep getting a "licencekey invalid" error.

I have:
1) CCFinderX ver. 10.2.5 for WinXP
2) java version "1.5.0_07"
3) python 2.5
4) licensedata.eml file in the bin directory.

Any idea why it is still giving the error?

Actually it is free for non-commercial and/or educational purposes.

Please refer to the license agreement:
http://www.ccfinder.net/ccfinderx.html
