Wednesday, June 15, 2016

Calculating statistics on the output of EasyPMD's Copy/Paste Detector

Here is a shell script that computes the number of duplicated code segments and the total number of duplicated lines from the output of EasyPMD's Copy/Paste Detector. The output is divided into segments, each starting with a line of the form:

Found a x line (y tokens) duplication in the following files:

The script extracts these header lines, counts them, and sums the line counts they report.



#!/bin/bash

if [[ $# -eq 0 ]] ; then
    echo 'Please specify a file name.' >&2
    exit 1
fi

# Extract header lines like:
# Found a X line (Y tokens) duplication in the following files:
# Matching the full prefix avoids picking up file paths that contain "Found".
grep -E 'Found a [0-9]+ line' "$1" > tmp.txt

# define counters
sum_lines=0
occurrences=0

# parse temp file; the third word of each line is the number of lines
while read -r _ _ lines _
do
    sum_lines=$((sum_lines + lines))
    occurrences=$((occurrences + 1))
done < tmp.txt

rm tmp.txt

echo "$occurrences code duplicates, $sum_lines lines in total."
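
As an alternative sketch, the same two numbers can be computed in a single pass with awk, which avoids the temporary file. This is not the script above but a variation on it: the function name cpd_stats is made up for illustration, and it assumes every header line begins with the words "Found a" followed by the line count, as in the format shown earlier.

```shell
# cpd_stats FILE
# Hypothetical helper: counts the "Found a X line ..." header lines in FILE
# and sums their X values. Assumes each header starts at the beginning of a line.
cpd_stats() {
    awk '
        # A header line: first word "Found", second "a", third a whole number.
        $1 == "Found" && $2 == "a" && $3 ~ /^[0-9]+$/ {
            sum += $3   # add the reported number of duplicated lines
            n++         # count one more duplication
        }
        END {
            printf "%d code duplicates, %d lines in total.\n", n, sum
        }
    ' "$1"
}
```

After sourcing this definition, calling cpd_stats with the path of a saved CPD report prints the same summary line as the script. Checking the first three words, rather than searching for "Found" anywhere, keeps file paths that happen to contain that word from being counted.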
