James Raitsev

Reputation: 96521

How to detect code change frequency?

I am working on a program written by several people with widely varying skill levels. Some files in it have never changed (and probably never will, as we're afraid to touch them), while others change constantly.

I wonder, are there any tools out there that would look at the entire repo history (git) and produce analysis on how frequently a given file changes? Or package? Or project?

It would be valuable to recognize that (for example) we spent 25% of our time working on one set of packages, which would be indicative of that code's fragility compared with code that "just works".

Upvotes: 28

Views: 6691

Answers (5)

bcarlso

Reputation: 2345

I wrote something that we use to visualize this information successfully.

https://github.com/bcarlso/defect-density-heatmap

Take a look at the project and you can see what the output looks like in the readme.

You can do what you need by first getting a list of files that have changed in each commit from Git.

~ $ git log --pretty="format:" --name-only | grep -v '^$' > file-changes.txt

~ $ for i in $(cut -d"." -f1,2 file-changes.txt | sort -u); do num=$(grep -cF -- "$i" file-changes.txt); if (( num > 1 )); then echo "$num,0,$i"; fi; done | heatmap > results.html

This gives you a tag cloud in which files that churn more show up larger.
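
If you only want the raw counts without the heatmap step, the same idea can be sketched with standard tools. The function name `count_changes` is made up for this example; it just expects one changed path per line on stdin, which is what the `git log --name-only` command above produces.

```shell
#!/bin/sh
# count_changes (hypothetical helper): read one file path per line on stdin,
# one line per change, and print "<count><TAB><path>", most-changed first.
count_changes() {
    grep -v '^$' |            # drop the blank lines git log prints between commits
        sort |                # uniq only merges adjacent duplicates, so sort first
        uniq -c |             # prefix each distinct path with its occurrence count
        sort -rn |            # highest count first
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}
```

Inside a repository you could then run something like `git log --pretty="format:" --name-only | count_changes | head -n 20` to list the twenty most frequently changed files.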

Upvotes: 6

aolchik

Reputation: 63

Building on a previous answer, I suggest the following script to process every file in the project:

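#!/bin/sh
# Print "<commit count><TAB><path>" for every file, most-changed first.
cd "$1" || exit 1
# Skip .git, visit regular files only, and count the commits that touched each one.
find . -path ./.git -prune -o -type f -exec sh -c 'printf "%d\t%s\n" "$(git log --follow --format=oneline -- "$1" | wc -l)" "$1"' sh {} \; | sort -nr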

If you save the script as file_churn.sh, you can run it against your Git project directory with:

$ ./file_churn.sh project_dir

Hope it helps.

Upvotes: 2

Henrik

Reputation: 9945

I'd have a look at NChurn:

NChurn is a utility that helps assess the churn level of your files in your repository. Churn can help you detect which files are changed the most in their lifetime. This helps identify potential bug hives, and improper design. The best thing to do is to plug NChurn into your build process and store history of each run. Then, you can plot the evolution of your repository's churn.

Upvotes: 8

Cydonia7

Reputation: 3846

I suggest using a command like

git log --follow -p file

That will give you every change made to the file over its history (including renames). If you want the number of commits that changed the file, you can run this on a UNIX-based OS:

git log --follow --format=oneline Gemfile | wc -l

You can then write a bash script that applies this to multiple files and prints each file name next to its count.

Hope that helps!

Upvotes: 5

Dave Newton

Reputation: 160291

If you're looking for an OS solution, I'd probably consider starting with gitstats and looking at extending it by grabbing file logs and aggregating that data.
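
As a rough illustration of the aggregation step (plain shell, not part of gitstats or its API), file-level changes can be rolled up to directories, which approximates the per-package view asked about. The function name `churn_by_dir` and its depth argument are assumptions made for this sketch, and the percentage is each directory's share of recorded file changes, not of time spent.

```shell
#!/bin/sh
# churn_by_dir (hypothetical helper): read changed file paths on stdin and
# print "<changes><TAB><share><TAB><directory>", grouping each path by its
# first DEPTH directory components (default 1). Top-level files go under ".".
churn_by_dir() {
    depth=${1:-1}
    grep -v '^$' | awk -F/ -v depth="$depth" '
        {
            if (NF == 1) {
                key = "."                      # file in the repository root
            } else {
                # Use at most "depth" components, never the file name itself.
                n = (NF - 1 < depth) ? NF - 1 : depth
                key = $1
                for (i = 2; i <= n; i++) key = key "/" $i
            }
            count[key]++
            total++
        }
        END {
            for (k in count)
                printf "%d\t%.1f%%\t%s\n", count[k], 100 * count[k] / total, k
        }' | sort -rn
}
```

For example, `git log --pretty="format:" --name-only | churn_by_dir 2` would group changes by the first two directory levels of each path.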

Upvotes: 11
