Reputation: 1954
I am a bit of a Bash newbie, so please bear with me here.
I have a text file dumped by another program (which I have no control over). It lists each user along with the number of times they accessed a certain resource, and looks like this:
Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim 79
Bob 70
John 67
Sean 62
Mark 59
Richard 58
Jim 57
Bob 55
John 49
Sean 48
Mark 46
. . .
My goal is to get output like this:
Jim [Total for Jim]
Bob [Total for Bob]
John [Total for John]
And so on.
The names change each time I run the query in the software, so a static search for each name piped through wc does not help.
Upvotes: 3
Views: 1038
Reputation: 8406
GNU datamash:
datamash -W -s -g1 sum 2 < input.txt
Output:
Bob 219
Jim 245
John 208
Mark 190
Richard 142
Sean 201
Upvotes: 1
Reputation: 17188
Pure Bash:
declare -A result                # an associative array (requires Bash 4+)
while read -r name value; do
    ((result[$name]+=value))
done < "$infile"
for name in "${!result[@]}"; do
    printf "%-10s%10d\n" "$name" "${result[$name]}"
done
If the first 'done' has no redirection from an input file this script can be used with a pipe:
your_program | ./script.sh
To sort the output:
your_program | ./script.sh | sort
The output:
Bob              219
Richard          142
Jim              245
Mark             190
John             208
Sean             201
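To try the loop without creating a separate script file, it can be wrapped in a function and fed a few sample rows. This is only a minimal sketch: the function name sum_by_name is made up for illustration, the input is a shortened copy of the question's data, and it assumes Bash 4 or newer (associative arrays are not available in plain sh).

```shell
#!/usr/bin/env bash
# sum_by_name is a hypothetical helper name, not part of the answer above.
# Reads "name value" pairs on stdin and prints one total per name.
sum_by_name() {
    local -A totals=()      # associative array: name -> running total
    local name value
    while read -r name value; do
        # skip blank lines and lines whose second field is not a plain integer
        [[ -n $name && $value =~ ^[0-9]+$ ]] || continue
        # 10# forces base 10 so a value like 08 is not read as octal
        (( totals[$name] += 10#$value ))
    done
    for name in "${!totals[@]}"; do
        printf '%s %d\n' "$name" "${totals[$name]}"
    done | sort             # key order is unspecified, so sort for stable output
}

sum_by_name <<'EOF'
Jim 109
Bob 94
Jim 79
Bob 70
EOF
```

With these four rows it prints Bob 164 and then Jim 188. The sort at the end is there because Bash does not guarantee any particular order when iterating over the keys of an associative array.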
Upvotes: 4
Reputation: 157947
This sounds like a job for awk :) Pipe the output of your program to the following awk script:
your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'
Output:
Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208
The awk script itself is easier to explain in this format:
# executed on each line
{
    # 'a' is an array. It will be initialized
    # as an empty array by awk on its first usage
    # '$1' contains the first column - the name
    # '$2' contains the second column - the amount
    #
    # on every line the total score of 'name'
    # will be incremented by 'amount'
    a[$1]+=$2
}
# executed at the end of input
END{
    # print every name and its score
    for(name in a)print name " " a[name]
}
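One detail worth knowing: awk does not define the order in which for(name in a) visits the keys, which is why the output can look shuffled. If the order of first appearance matters, one option is to record each new name in a second array. The following is a minimal sketch of that idea; the function name totals_in_order and the variables order and n are introduced here for illustration, and the input is a shortened copy of the question's data.

```shell
# totals_in_order is a hypothetical wrapper name used only for this example.
totals_in_order() {
    awk '
        # remember each name the first time it is seen
        !($1 in a) { order[++n] = $1 }
        # accumulate the total for this name
        { a[$1] += $2 }
        # print the totals in first-seen order
        END { for (i = 1; i <= n; i++) print order[i], a[order[i]] }
    '
}

printf '%s\n' 'Jim 109' 'Bob 94' 'John 92' 'Jim 79' 'Bob 70' | totals_in_order
```

For this input the lines come out as Jim 188, Bob 164, John 92, matching the order in which the names first appear.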
Note, to get the output sorted by score, you can add another pipe to sort -k2,2nr. The key -k2,2 restricts the sort to the second column, n compares it numerically (a plain text comparison would put 99 ahead of 245) and r reverses the order:
your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -k2,2nr
Output:
Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
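When sorting by total, the second column should be compared as numbers rather than as text, otherwise totals with different digit counts end up in the wrong place. A small sketch with made-up rows (Ann's total and the 1000 are not from the question; sort_by_total is a hypothetical helper name):

```shell
# sort_by_total is a hypothetical helper: sort "name total" lines,
# highest total first, comparing column 2 numerically.
sort_by_total() {
    sort -k2,2nr
}

printf '%s\n' 'Ann 99' 'Jim 245' 'Bob 1000' | sort_by_total
```

This prints Bob 1000, then Jim 245, then Ann 99. Without the n modifier the keys are compared character by character, so a two-digit total such as 99 would be ranked above 245.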
Upvotes: 6