toy

Reputation: 12151

How do I sum all the numbers in the output of jq?

I have a command whose output is a list of numbers, and I would like to sum them.

The command looks like this:

$(hadoop fs -ls -R /reports/dt=2018-08-27 | grep _stats.json | awk '{print $NF}' | xargs hadoop fs -cat | jq '.duration')

It recursively lists everything under /reports/dt=2018-08-27, keeps only the _stats.json files, prints them with hadoop fs -cat, and uses jq to extract .duration from each JSON document. The result looks like this:

1211789 1211789 373585 495379 1211789

I would like the command to add all those numbers together and print 4504331.

Upvotes: 38

Views: 43831

Answers (7)

törzsmókus

Reputation: 2001

The simplest solution is the add filter:

jq '[.duration] | add'

The [ brackets ] are needed around the value because add sums the elements of an array, not a stream. (To sum a stream directly, you need something more involved, such as reduce, as detailed in other answers.)


Depending on the exact format of the input, you may need some preprocessing to get this right.

For example, for the sample input in Charles Duffy’s answer, either

  • use inputs (note that -n is needed; without it, jq consumes the first input itself and inputs only sees the rest):

    jq -n '[inputs.duration] | add' <<< "$sample_data"
    
  • or slurp the whole input into one array (-s), then iterate (.[]) or map:

    jq -s '[.[].duration] | add' <<< "$sample_data"
    jq -s 'map(.duration) | add' <<< "$sample_data"
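
As a rough illustration of why the brackets alone are not enough when the input is a stream of objects, the sketch below runs the same filter with and without -n. The sample variable is a made-up stand-in for the real pipeline output (an assumption, not data from the question), and a jq version that provides inputs (1.5 or later) is assumed.

```shell
# Hypothetical stand-in for the pipeline output: one JSON object per line.
sample='{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}'

# Without -n or -s the filter runs once per object, so each array holds a
# single value and nothing is actually summed across objects.
per_object=$(printf '%s\n' "$sample" | jq '[.duration] | add')

# With -n and inputs, every object feeds the same array, so add yields one total.
total=$(printf '%s\n' "$sample" | jq -n '[inputs.duration] | add')

echo "$per_object"   # three separate lines: 1211789, 373585, 495379
echo "$total"        # prints 2080753
```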
    

Upvotes: 63

Romain

Reputation: 21958

Combining the other answers:

$ jq -n '[inputs | .duration] | add' <<< "$sample_data"

# 4504331

I had to collect the values into an array, [inputs | .duration], before summing them with add.

Upvotes: 4

Timmmm

Reputation: 96832

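You can just use add now. Note that add sums the elements of an array, so this form assumes .duration is itself an array of numbers: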

jq '.duration | add'

Upvotes: 15

peak

Reputation: 116900

For clarity and generality, it might be worthwhile defining sigma(s) to add a stream of numbers:

... | jq -n '
  def sigma(s): reduce s as $x(0;.+$x); 
  sigma(inputs | .duration)'
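
A minimal, self-contained run of the same definition might look like the sketch below. The sample variable is an assumption that reuses the five durations from the question in place of the elided pipeline.

```shell
# Assumed sample: the five durations from the question, one JSON object per line.
sample='{"duration": 1211789}
{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}
{"duration": 1211789}'

sigma_total=$(printf '%s\n' "$sample" | jq -n '
  # Sum a stream of numbers, starting from 0.
  def sigma(s): reduce s as $x (0; . + $x);
  sigma(inputs | .duration)')

echo "$sigma_total"   # prints 4504331
```

Because the reduction starts from 0, an empty stream yields 0, whereas add on an empty array yields null.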

Upvotes: 3

karakfa

Reputation: 67527

awk to the rescue!

$ ... | awk '{sum+=$0} END{print sum}'

4504331
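
To try this without the Hadoop pipeline, here is a small sketch that feeds the numbers from the question through the same awk program. The helper name sum_lines is hypothetical, and printf stands in for the real jq output.

```shell
# sum_lines (hypothetical helper): reads one number per line on stdin and
# prints the total. "+ 0" makes empty input print 0 instead of a blank line.
sum_lines() {
  awk '{ sum += $0 } END { print sum + 0 }'
}

# Stand-in for the jq output shown in the question.
printf '%s\n' 1211789 1211789 373585 495379 1211789 | sum_lines   # prints 4504331
```

awk prints non-integer results with its default output format (%.6g), so a fractional total may come out rounded; printf with an explicit format inside the END block avoids that.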

Upvotes: 18

Charles Duffy

Reputation: 295679

Another option (and one that works even if not all your durations are integers) is to make your jq code do the work:

sample_data='{"duration": 1211789}
{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}
{"duration": 1211789}'

jq -n '[inputs | .duration] | reduce .[] as $num (0; .+$num)' <<<"$sample_data"

...properly emits as output:

4504331

Replace the <<<"$sample_data" with a pipeline on stdin as desired.

Upvotes: 24

Barmar

Reputation: 781964

Use a for loop.

total=0
for num in $(hadoop fs -ls -R /reports/dt=2018-08-27 | grep _stats.json | awk '{print $NF}' | xargs hadoop fs -cat | jq '.duration')
do
    ((total += num))
done
echo "$total"

Upvotes: -1
