Code duplication goes against the software engineering best practice of code reusability. Some of its major disadvantages are the following:

  1. It increases the number of lines of code (LOC), which makes the software larger and harder to understand, and can affect build time and binary size.
  2. Extra unit tests must be written to cover each duplicate method in order to maintain good coverage.
  3. A single change has to be made in multiple files, which increases the maintenance cost.
  4. It reflects poorly on the quality of the software team's work.

Different types of Duplication

Code duplication can be broadly classified into four types:
  • Different methods with identical LOC
             Example: methods m1() and m2(), which contain identical LOC.
  • The same method with identical LOC in different class files
             Example: method m1() in class A and in class B, with the same LOC.
  • An identical set of LOC in multiple methods
             Example: two methods, m1() and m2(), in the same file or in different files, each contain 30 LOC, of which 15 are identical.
  • Similar LOC
              Example: two methods, m1() and m2(), have almost the same LOC, but the parameter and attribute names differ.
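
To make the difference between "identical" and "similar" LOC concrete, here is a minimal sketch. The class name, the method area() and the arithmetic are hypothetical and exist only for illustration.

```java
// Hypothetical illustration of two of the duplication types listed above.
class DuplicationKinds {
    // Identical LOC: m1() and m2() differ only in their names.
    static int m1(int a, int b) {
        return a * b + 10;
    }

    static int m2(int a, int b) {
        return a * b + 10;
    }

    // Similar LOC: same structure and behavior, but different parameter names.
    static int area(int width, int height) {
        return width * height + 10;
    }
}
```

All three methods return the same result for the same inputs, but only m1() and m2() would be caught by a purely textual comparison of their bodies.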

This article addresses the first three categories of code duplication.

 

Duplication Elimination Procedure

Here are some common methods for removing these kinds of duplication.

 

Different methods with identical LOC, and the same method with identical LOC in different class files

Example: methods m1() and m2(), which contain identical LOC.

Method-1 : Delete and Redirect approach

  • Maintain one method (e.g. m1()) that will be used throughout the software.
  • Delete the contents of the other methods (e.g. m2()) and replace them with a call to the retained method, m1().

[code]
public void m1() {
    // LOC (the single retained copy of the shared logic)
}

public void m2() {
    // Body removed and replaced with a call to m1()
    m1();
}

public void m3() {
    // Body removed and replaced with a call to m1()
    m1();
}

[/code]

Method-2 : Delete and modify reference approach
  • Maintain one method (e.g. m1()) that will be used throughout the software.
  • Delete all other identical methods (e.g. delete m2(), m3(), etc., which are all identical copies of m1()).
  • Identify the code locations from which the deleted methods are referenced and replace those references with the retained method (e.g. all calls to m2() and m3() must be replaced with m1()).
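
The result of the delete and modify reference approach might look like the sketch below. The class names InvoiceService and InvoiceClient and the return value are hypothetical; only m1(), m2() and m3() come from the example above.

```java
// Hypothetical example: m2() and m3() were identical copies of m1()
// and have been deleted from this class.
class InvoiceService {
    // The single retained method; the body stands in for the shared LOC.
    int m1() {
        return 42;
    }
}

class InvoiceClient {
    private final InvoiceService service = new InvoiceService();

    int run() {
        // Before the cleanup this line read: service.m2() + service.m3()
        // After the cleanup every former call targets m1().
        return service.m1() + service.m1();
    }
}
```

Compared with Method-1, no thin wrapper methods remain, but every caller has to be found and updated, so this approach is easier when all call sites are under your control.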

 

Identical set of LOC in multiple methods

In this case, not all the LOC of the methods are identical, but a good percentage is. Example: two methods, m1() and m2(), in the same file or in different files, each contain 30 LOC, of which 15 are identical.

The elimination procedure for this scenario is slightly more complicated than the previous ones.

  1. Identify the least complex method that contains the identical code and make sure that it has unit tests with good coverage, e.g. m1().
  2. Create a new method, e.g. mn(), and copy all the identical LOC into it.
  3. Check whether these LOC use any parameter or attribute of the parent method; if so, add it to the new method's signature. E.g. if the LOC in mn() reference an amount parameter, define the signature as mn(int amount).
  4. Replace the LOC in the parent method with a call to the new method, passing the relevant parameters, e.g. mn(100).
  5. Run all the unit tests for the parent method (e.g. m1()) and make sure they all pass.
  6. Apply step 4 and step 5 to the other methods that share the same identical LOC. E.g. if m2() and m3() also contain the LOC that were moved from m1() to mn(int amount), delete those LOC from m2() and m3() and replace them with a call to mn().
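
The end state of the procedure above can be sketched as follows. The class name OrderCalculator and the tax, shipping and doubling logic are hypothetical stand-ins for the real LOC; only the names m1(), m2() and mn(int amount) come from the steps.

```java
// Hypothetical sketch of the end state after extracting the shared LOC.
class OrderCalculator {
    // The formerly duplicated lines now live in one place. They referenced
    // 'amount' in the parent methods, so it became a parameter of mn().
    private int mn(int amount) {
        int tax = amount / 10;   // stand-in for the shared logic: 10% tax
        return amount + tax;
    }

    // m1() calls mn() instead of repeating the shared lines.
    int m1(int amount) {
        int total = mn(amount);
        return total + 5;        // m1-specific part: flat shipping fee
    }

    // m2() receives the same treatment once m1()'s tests pass.
    int m2(int amount) {
        int total = mn(amount);
        return total * 2;        // m2-specific part: two units ordered
    }
}
```

Running the existing unit tests of m1() before touching m2() keeps each change small, so a failing test points at a single, recent edit.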

Where to create the new methods

The elimination approaches above create a new method. Where to keep this new method depends on its nature, but here are some generic guidelines.

  1. If it is a common method, such as date formatting, it can be kept in a library or utility class that all classes can use.
  2. If it is a method duplicated in derived classes, move it to their base class.
  3. If the methods are in two unrelated classes, check the feasibility of introducing a base class. If a base class is not meaningful, consider moving the method to a utility class.
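
As a rough illustration of these guidelines, the sketch below shows a utility class for date formatting and a base class that holds logic formerly duplicated in two subclasses. All class and method names (DateUtil, Account, SavingsAccount, CurrentAccount, canWithdraw) are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Guideline 1: a general-purpose helper kept in a utility class.
final class DateUtil {
    private DateUtil() { }  // static helpers only, no instances

    static String format(LocalDate date) {
        // ISO_LOCAL_DATE renders dates as yyyy-MM-dd
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }
}

// Guidelines 2 and 3: logic shared by related classes lives in a base class.
abstract class Account {
    protected final int balance;

    Account(int balance) {
        this.balance = balance;
    }

    // Formerly duplicated in both subclasses (hypothetical rule).
    boolean canWithdraw(int amount) {
        return amount > 0 && amount <= balance;
    }
}

class SavingsAccount extends Account {
    SavingsAccount(int balance) {
        super(balance);
    }
}

class CurrentAccount extends Account {
    CurrentAccount(int balance) {
        super(balance);
    }
}
```

A base class only makes sense when the classes really share an "is a" relationship; for unrelated classes a utility class such as the hypothetical DateUtil avoids forcing an artificial hierarchy.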

There is one more kind of code duplication: similar LOC. The lines are not identical, but their behavior is similar. This will be addressed in a separate article.