intransigent_rocker

Reputation: 75

Bash Function to Summarize Commits by Month

I've written a Bash function that summarizes monthly Python and SAS commits over the past three years of a Git repository. It iterates over the most recent 36 months, generates a Git log for each month, and counts the commits that touch Python and SAS files within that month. It works, but the loop is slow because it runs git log twice for every month.

When I attempt to generate a single Git log for the entire three-year period and count monthly commits from that, I end up with no counts in my output. I would prefer this method if I can get it to work, as it would eliminate the need for looping and improve performance. Has anyone successfully done this, or can anyone suggest how I might revise my current function to achieve this?

Here is my current working function (with loop):

#!/bin/bash

function count_commits_by_month() {
    local start_date
    start_date=$(date -d "$(date +%Y-%m-01) -36 months" +%Y-%m-01)

    local end_date
    end_date=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01)

    local current_date="$start_date"

    echo -e "Month\tPython\tSAS"

    while [[ "$current_date" < "$end_date" ]]; do
        local next_month
        next_month=$(date -d "$current_date +1 month" +%Y-%m-01)

        # Count Python commits
        local py_commits
        py_commits=$(git log --no-merges --since="$current_date" --until="$next_month" --pretty=format:"%h" --name-only -- "*.py" | \
                     awk 'NF && !seen[$0]++' | wc -l)

        # Count SAS commits
        local sas_commits
        sas_commits=$(git log --no-merges --since="$current_date" --until="$next_month" --pretty=format:"%h" --name-only -- "*.sas" | \
                      awk 'NF && !seen[$0]++' | wc -l)

        # Print the results for the current month
        echo -e "$(date -d "$current_date" +%Y-%m)\t$py_commits\t$sas_commits"

        # Move to the next month
        current_date="$next_month"
    done
}

Here is my non-working function (without loop):


function get_commits() {
    local start_date
    start_date=$(date -d "$(date +%Y-%m-01) -36 months" +%Y-%m-01)

    local end_date
    end_date=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01)

    # Print the header with aligned columns
    printf "%-10s %-10s %-10s\n" "Month" "Python" "SAS"

    # Use a single git log call to get all commits in the date range
    git log --no-merges --since="$start_date" --until="$end_date" --pretty=format:"%ad %h" --date=format:'%Y-%m' --name-only -- "*.py" "*.sas" | \
    awk '
    BEGIN {
        OFS = "\t";
    }
    /^[0-9]{4}-[0-9]{2}/ {
        date = $1;
        commit = $2;
        seen_py[commit] = 0;
        seen_sas[commit] = 0;
    }
    /\.py$/ {
        if (!seen_py[commit]++) {
            py[date]++;
        }
    }
    /\.sas$/ {
        if (!seen_sas[commit]++) {
            sas[date]++;
        }
    }
    END {
        for (date in py) {
            if (!(date in sas)) {
                sas[date] = 0;
            }
        }
        for (date in sas) {
            if (!(date in py)) {
                py[date] = 0;
            }
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (date in py) {
            printf "%-10s %-10d %-10d\n", date, py[date], sas[date];
        }
    }'
}

Upvotes: 0

Views: 82

Answers (1)

jthill

Reputation: 60487

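git log --date=format:%Y-%m --pretty=format:%cd \
        --date-order --no-merges --name-status  \
        --   \*.py \*.sas   \
| awk -F$'\t' '
        # File line (status<TAB>path): keep only the suffix and note it for this commit.
        NF>1  { sub(/.*\./,""); suf[$0]=1; next }

        # Blank line ends a commit: count it once per suffix it touched.
        NF<1  { for ( s in suf ) ++ctouch[s]; delete suf; next }
        # Flush the final commit, which is not followed by a blank line.
        END   { for ( s in suf ) ++ctouch[s]; delete suf }

        # Print the totals for the month just finished, then start the next one.
        function endmonth() {
                for (s in ctouch)
                        printf( \
                        "%s: %7d commits touched some .%s file(s)\n",
                                last,ctouch[s],s)
                   last=$1
                   delete ctouch
        }

        # Date line (YYYY-MM): a change of month closes the previous one.
        NF==1 { if ( $1!=last ) endmonth() }
        END   { endmonth() }
'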

seems to do the trick for me.
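
If you would rather keep the Month / Python / SAS table from the question, the same single-pass idea can be sketched as below. This is only a sketch under a few assumptions: the function names `summarize_by_month` and `commits_by_month` are made up for illustration, GNU `date` is available (as in the question), and file names are plain enough that git does not quote them. It prints `%cd` rather than `%ad`, since `--since` filters on the committer date, and it spells the month pattern out with bracket expressions instead of `{4}`, because some awk implementations do not support interval expressions by default.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "YYYY-MM <hash>" header lines, each followed by
# the file names of that commit, and prints "month<TAB>python<TAB>sas" rows.
summarize_by_month() {
    awk '
    # Header line: remember which month and commit the next file names belong to.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9] / { month = $1; commit = $2; next }

    # Count each commit at most once per language.
    /\.py$/ {
        if (!((commit, "py") in seen)) { seen[commit, "py"] = 1; py[month]++; months[month] = 1 }
    }
    /\.sas$/ {
        if (!((commit, "sas") in seen)) { seen[commit, "sas"] = 1; sas[month]++; months[month] = 1 }
    }

    END {
        for (m in months) printf "%s\t%d\t%d\n", m, py[m], sas[m]
    }' | sort
}

# Hypothetical wrapper: one git log call for the whole 36-month window.
commits_by_month() {
    local start_date
    start_date=$(date -d "$(date +%Y-%m-01) -36 months" +%Y-%m-01)

    printf 'Month\tPython\tSAS\n'
    git log --no-merges --since="$start_date" \
        --date=format:%Y-%m --pretty=format:'%cd %h' --name-only \
        -- '*.py' '*.sas' | summarize_by_month
}
```

Source the file and run `commits_by_month` from inside the repository. Unlike the loop version, months without any matching commit are simply missing from the output rather than shown as zero rows.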

Upvotes: 2
