Reputation: 1954

Combine results of column one, then sum column two to list the total for each entry in column one

I am a bit of a Bash newbie, so please bear with me here.

I have a text file dumped by another piece of software (that I have no control over). It lists each user with the number of times they accessed a certain resource, and looks like this:

Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim  79
Bob  70
John 67
Sean 62
Mark 59
Richard 58
Jim  57
Bob  55
John 49
Sean 48
Mark 46
.
.
.

My goal here is to get an output like this.

Jim  [Total for Jim]
Bob  [Total for Bob]
John [Total for John]

And so on.

The names change each time I run the query in the software, so a static search for each name piped through wc does not help.

Upvotes: 3

Answers (3)

agc

Reputation: 8406

GNU datamash:

datamash -W -s -g1 sum 2 < input.txt

Output:

Bob     219
Jim     245
John    208
Mark    190
Richard 142
Sean    201

Upvotes: 1

Fritz G. Mehner

Reputation: 17188

Pure Bash:

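declare -A result                 # an associative array (requires Bash 4+)

# -r keeps backslashes literal; "$infile" holds the path of the input file
while read -r name value; do
  ((result[$name]+=value))
done < "$infile"

for name in "${!result[@]}"; do
  printf  "%-10s%10d\n"  "$name"  "${result[$name]}"
done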

If the first 'done' has no redirection from an input file, the script reads standard input and can be used with a pipe:

your_program | ./script.sh

and, to sort the output by name:

your_program | ./script.sh | sort

The output:

Bob              219
Richard          142
Jim              245
Mark             190
John             208
Sean             201
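
As a minimal sketch of the pipe-friendly variant described above, the loop can be wrapped in a function that reads standard input. The function name sum_by_name and the rule of skipping rows whose second field is not a plain integer (such as the "." filler lines) are assumptions for illustration, not part of the original answer. It needs Bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the second column per name, reading from stdin.
sum_by_name() {
  local -A totals=()          # associative array, needs Bash 4+
  local name value
  while read -r name value; do
    # Assumption: skip rows whose count is not a plain non-negative integer.
    [[ $value =~ ^[0-9]+$ ]] || continue
    # 10# forces base 10 so a count like 08 is not read as octal.
    totals[$name]=$(( ${totals[$name]:-0} + 10#$value ))
  done
  for name in "${!totals[@]}"; do
    printf '%s %d\n' "$name" "${totals[$name]}"
  done
}

# A few rows in the same shape as the question's data, sorted by name.
sum_by_name <<'EOF' | sort
Jim 109
Bob 94
Jim  79
Bob  70
.
EOF
```

With these five rows the pipeline prints "Bob 164" and "Jim 188". The iteration order of an associative array is not defined, which is why the sort is appended.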

Upvotes: 4

hek2mgl

Reputation: 157947

This sounds like a job for awk :) Pipe the output of your program to the following awk script:

your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'

Output:

Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208

The awk script itself can be explained better in this format:

# executed on each line
{
  # 'a' is an array. It will be initialized
  # as an empty array by awk on its first usage
  # '$1' contains the first column - the name
  # '$2' contains the second column - the amount
  #
  #  on every line the total score of 'name' 
  #  will be incremented  by 'amount'
  a[$1]+=$2
}
# executed at the end of input
END{
  # print every name and its score
  for(name in a)print name " " a[name]
}
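
The order in which awk's for (name in a) visits the keys is unspecified, which is why the output above does not follow the input order. If the names should appear in the order they first occur, as in the question's desired output, a second array can record that order. This is only a sketch: the function name sum_in_input_order and the NF >= 2 guard against short lines are assumptions added for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum column 2 per name, printing names in first-seen order.
# Reads the files given as arguments, or standard input if there are none.
sum_in_input_order() {
  awk '
    NF >= 2 {
      # remember each name the first time it appears
      if (!($1 in total)) order[++n] = $1
      total[$1] += $2
    }
    END {
      for (i = 1; i <= n; i++) print order[i], total[order[i]]
    }
  ' "$@"
}

# A few rows in the same shape as the question's data.
printf 'Jim 109\nBob 94\nJim  79\nJohn 92\nBob  70\n' | sum_in_input_order
```

For these rows it prints Jim 188, Bob 164 and John 92, in that order. Because the arguments are passed straight to awk, the same function can also be called with the dumped text file as its argument instead of a pipe.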

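Note: to get the output sorted by score, you can add another pipe to sort -rn -k2. -rn -k2 sorts by the second column in reverse numerical order (without -n the totals are compared as text, which only works while they all have the same number of digits):

your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -rn -k2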

Output:

Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
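
The totals in this sample all have three digits, so a textual and a numeric sort give the same order. A minimal sketch with made-up totals (the names and numbers below are not from the question) shows how sort -r -k2 and sort -rn -k2 differ once the digit counts vary:

```bash
#!/usr/bin/env bash
# Made-up totals of different lengths, for illustration only.
totals='Ann 9
Bob 100
Cy 25'

# Textual reverse sort: the second column is compared as a string,
# so "9" ranks above "25", which ranks above "100".
by_text=$(printf '%s\n' "$totals" | sort -r -k2)

# Numeric reverse sort: the second column is compared as a number.
by_number=$(printf '%s\n' "$totals" | sort -rn -k2)

printf 'textual:\n%s\n\nnumeric:\n%s\n' "$by_text" "$by_number"
```

The numeric variant lists Bob 100 first, then Cy 25, then Ann 9, whereas the textual variant puts Ann 9 at the top.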

Upvotes: 6
