Kalib Zen

Reputation: 925

How do you combine two awk outputs?

I have a text file called output.txt with the following content:

1.2.2.2 LOCAL_IP
LOCAL_IP 1.1.1.1
1.1.1.1 LOCAL_IP
233.233.233.233 LOCAL_IP
123.123.123.123 LOCAL_IP
233.233.233.233 LOCAL_IP
231.231.231.231 LOCAL_IP
123.123.123.123 LOCAL_IP
LOCAL_IP 123.111.23.2
LOCAL_IP 221.22.22.22
1.1.1.1 LOCAL_IP
LOCAL_IP 1.2.2.2
LOCAL_IP 123.123.123.123
2.2.2.2 LOCAL_IP 
LOCAL_IP 3.3.21.2
LOCAL_IP 2.2.2.2
1.2.2.2 LOCAL_IP
LOCAL_IP 123.123.123.123
LOCAL_IP 123.111.23.2
1.1.1.1 LOCAL_IP

I want to count the total occurrences of each IP above, excluding the string LOCAL_IP. This is my working code so far:

#!/bin/bash

output="output.txt"

a=$(awk '{ print $1 }' "$output" | grep -v 'LOCAL_IP' | sort | uniq -c | sed 's/^ *//' | sed -e "s/ /:/g")
b=$(awk '{ print $2 }' "$output" | grep -v 'LOCAL_IP' | sort | uniq -c | sed 's/^ *//' | sed -e "s/ /:/g")


echo "$a"
echo "-------"
echo "$b"

This is the output printed by the script above (first a, then b):

3:1.1.1.1
2:1.2.2.2
2:123.123.123.123
1:2.2.2.2
1:231.231.231.231
2:233.233.233.233
-------
1:1.1.1.1
1:1.2.2.2
2:123.111.23.2
2:123.123.123.123
1:221.22.22.22
1:2.2.2.2
1:3.3.21.2

Is there a way to combine the results in variables a and b and then recalculate the total count for each IP (the number before the colon)? The new variable c that combines a and b should then look like this:

echo "$c"

4:1.1.1.1 # added from a and b
3:1.2.2.2
2:123.111.23.2
4:123.123.123.123
1:221.22.22.22
2:2.2.2.2
1:3.3.21.2
1:231.231.231.231
2:233.233.233.233

I'm not sure whether I need to develop an algorithm for this. Can someone shed some light? Maybe there is a simpler method to achieve this.
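For reference, a minimal sketch of the literal merge being asked about: feed both "count:ip" lists into one awk, split on ":", and sum the counts per IP. Here $a and $b are hard-coded with the values printed above; in the real script they would come from the two pipelines. The final sort only makes the order stable, because awk's for-in order is unspecified.

```shell
#!/bin/bash
# Sketch: merge two "count:ip" lists by summing the count for each IP.
# $a and $b hold the values printed in the question (hard-coded here).
a='3:1.1.1.1
2:1.2.2.2
2:123.123.123.123
1:2.2.2.2
1:231.231.231.231
2:233.233.233.233'
b='1:1.1.1.1
1:1.2.2.2
2:123.111.23.2
2:123.123.123.123
1:221.22.22.22
1:2.2.2.2
1:3.3.21.2'

# With ":" as the field separator, $1 is the count and $2 is the IP,
# so sum[$2] accumulates the combined total for each IP.
c=$(printf '%s\n%s\n' "$a" "$b" |
    awk -F: '{ sum[$2] += $1 } END { for (ip in sum) print sum[ip] ":" ip }' |
    sort -t: -k2)   # order by IP; awk's for-in order is unspecified

echo "$c"
```

The exact line order depends on your sort locale, but the counts match the expected c above (for example 4:1.1.1.1 and 4:123.123.123.123).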

Upvotes: 5

Views: 960

Answers (5)

James Brown

Reputation: 37394

One for GNU awk, which uses sorted_in to sort the output (drop that line if you don't care about the output order):

$ gawk '
{
    a[$1]++                               # just count them all
    a[$2]++
}
END {                                     # and in the end
    delete a["LOCAL_IP"]                  # delete this one
    PROCINFO["sorted_in"]="@ind_str_asc"  # sorting method
    for(i in a)
        printf "%d:%s\n", a[i], i         # output
}' file                                   # | sort -t: -k2

Uncomment the | sort -t: -k2 if you are using an awk other than GNU awk; in that case you can also remove PROCINFO["sorted_in"]="@ind_str_asc". Also, I noticed a request in the comments to reformat the output; for that, replace the printf with printf "%s (%d)\n", i, a[i]
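A sketch of the portable variant described above, assuming a POSIX awk: the PROCINFO line is gone and sort handles the ordering. Three sample lines piped in through printf stand in for the real input file. The second command shows the "ip (count)" output format from the comments.

```shell
# Sketch: same counting logic without GNU-only PROCINFO; sort orders the output.
counts=$(printf '%s\n' '1.1.1.1 LOCAL_IP' 'LOCAL_IP 1.1.1.1' 'LOCAL_IP 2.2.2.2' |
awk '
{
    a[$1]++                   # count both fields on every line
    a[$2]++
}
END {
    delete a["LOCAL_IP"]      # then drop the placeholder string
    for (i in a)
        printf "%d:%s\n", a[i], i
}' | sort -t: -k2)
echo "$counts"

# Reformatted output: "ip (count)" instead of "count:ip".
pretty=$(printf '%s\n' '1.1.1.1 LOCAL_IP' 'LOCAL_IP 1.1.1.1' 'LOCAL_IP 2.2.2.2' |
awk '{ a[$1]++; a[$2]++ } END { delete a["LOCAL_IP"]; for (i in a) printf "%s (%d)\n", i, a[i] }' | sort)
echo "$pretty"
```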

Output:

4:1.1.1.1
3:1.2.2.2
2:123.111.23.2
4:123.123.123.123
2:2.2.2.2
1:221.22.22.22
1:231.231.231.231
2:233.233.233.233
1:3.3.21.2

Upvotes: 3

Oliver Gaida

Reputation: 1920

I would not use awk in this case. This way is the easiest (for me).

  1. Remove the LOCAL_IP string:
sed -E "s/ ?LOCAL_IP ?//" output.txt 
1.2.2.2
1.1.1.1
1.1.1.1
233.233.233.233
123.123.123.123
233.233.233.233
231.231.231.231
123.123.123.123
123.111.23.2
221.22.22.22
1.1.1.1
1.2.2.2
123.123.123.123
2.2.2.2
3.3.21.2
2.2.2.2
1.2.2.2
123.123.123.123
123.111.23.2
1.1.1.1
  2. Sort and count the unique lines:
sed -E "s/ ?LOCAL_IP ?//" output.txt  | sort | uniq -c
      4 1.1.1.1
      3 1.2.2.2
      2 123.111.23.2
      4 123.123.123.123
      1 221.22.22.22
      2 2.2.2.2
      1 231.231.231.231
      2 233.233.233.233
      1 3.3.21.2
  3. Format as you need and sort by count:
sed -E "s/ ?LOCAL_IP ?//" output.txt  | sort | uniq -c | sed -E 's/^ *([0-9]+) *(.*)$/\1:\2/' | sort -rn
4:123.123.123.123
4:1.1.1.1
3:1.2.2.2
2:233.233.233.233
2:2.2.2.2
2:123.111.23.2
1:3.3.21.2
1:231.231.231.231
1:221.22.22.22

All these steps are easy to follow, except perhaps the last regex with the backreferences. To learn more about it, paste the regex into https://regex101.com/ and read the explanation.
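To make the backreference step concrete, here is a small sketch that runs the last sed on one line shaped like uniq -c output. The amount of leading padding is an assumption, since it differs between uniq implementations; the leading ^ * absorbs it either way.

```shell
# Sketch: what the final sed does to a single line of uniq -c output.
#   ^ *        skip uniq's leading padding
#   ([0-9]+)   capture the count as \1
#    *         skip the separating space(s)
#   (.*)$      capture the rest of the line (the IP) as \2
line='      4 1.1.1.1'
formatted=$(printf '%s\n' "$line" | sed -E 's/^ *([0-9]+) *(.*)$/\1:\2/')
echo "$formatted"    # 4:1.1.1.1
```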

Upvotes: 2

dannysauer

Reputation: 3867

Weird; this was part of an interview question I recently did involving counting the traffic to an IP. I withdrew for other reasons, so you can have the job. ;)

In any event, there are a few ways to do this. You can remove LOCAL_IP from the input with sed or awk. For example:

awk 'BEGIN{OFS=""} $1=="LOCAL_IP"{$1=""} $2=="LOCAL_IP"{$2=""} {print $0}' $output | ...
sed 's/ *LOCAL_IP *//' $output | ...

That awk needs OFS set to the empty string; otherwise you end up with leading or trailing spaces, because awk rebuilds the line from the emptied field using OFS. (I would not use that structure; all of the other options are better.)
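A quick sketch of the OFS effect on one sample line: emptying $1 makes awk rebuild $0 as "" OFS "1.1.1.1". The brackets are only there to make the leftover space visible.

```shell
# Sketch: with the default OFS (one space), the rebuilt line keeps a leading space.
default_ofs=$(echo 'LOCAL_IP 1.1.1.1' | awk '$1=="LOCAL_IP"{$1=""} {print "[" $0 "]"}')
# With OFS set to the empty string, nothing is left between the fields.
empty_ofs=$(echo 'LOCAL_IP 1.1.1.1' | awk 'BEGIN{OFS=""} $1=="LOCAL_IP"{$1=""} {print "[" $0 "]"}')
echo "$default_ofs"   # [ 1.1.1.1]
echo "$empty_ofs"     # [1.1.1.1]
```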

Or you could just print the field you want using a regexp to match IPs. Or you could do several other things, like combining the two commands before piping to the grep:

{ awk '{ print $1 }' $output; awk '{ print $2 }' $output; } | grep ...

Though really, that would be better done with a lighter-weight command like cut. Also worth noting: when you just want one field, I'm partial to awk '$0=$1' because it's less typing and {print $0} is implied when there's no block specified. :)
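A sketch of the cut variant, assuming fields separated by a single space as in the sample; a few sample lines in a variable stand in for output.txt. The second part shows the awk '$0=$1' idiom on the same data.

```shell
# Sketch: cut pulls out each field, then the same grep/sort/uniq as the question.
sample='1.2.2.2 LOCAL_IP
LOCAL_IP 1.1.1.1
1.1.1.1 LOCAL_IP'

counts=$( { printf '%s\n' "$sample" | cut -d' ' -f1
            printf '%s\n' "$sample" | cut -d' ' -f2; } |
          grep -v 'LOCAL_IP' | sort | uniq -c )
echo "$counts"

# Short awk idiom for "print only field 1": the assignment's value is the
# field itself, so the line is printed unless that value is empty or zero.
first=$(printf '%s\n' "$sample" | awk '$0=$1')
echo "$first"
```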

Or just use tr to replace the spaces with newlines, then do your grep.

tr ' ' '\n' < "$output" | grep...
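A sketch of the full tr pipeline. tr reads only standard input, so a file has to be redirected in (tr ' ' '\n' < output.txt). One caveat: a line with trailing whitespace, as "2.2.2.2 LOCAL_IP " in the question's sample appears to have, turns into an empty line after tr. This sketch therefore also filters out empty lines. The printf stands in for the file.

```shell
# Sketch: split on spaces, drop LOCAL_IP and empty lines, then count.
# The second sample line deliberately ends with a space.
counts=$(printf '%s\n' 'LOCAL_IP 1.1.1.1' '1.1.1.1 LOCAL_IP ' |
         tr ' ' '\n' | grep -v -e 'LOCAL_IP' -e '^$' | sort | uniq -c)
echo "$counts"
```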

Edit: Building on @RavinderSingh13's solution, you could also do it all in one awk:

awk '
  $1=="LOCAL_IP" { arr[$2]++ }
  $2=="LOCAL_IP" { arr[$1]++ }
  END{ for(i in arr){print arr[i]":"i} }
' $output

Lots of options. :D

Upvotes: 5

RavinderSingh13

Reputation: 133428

This can be done with a single awk command; please try the following. Written on mobile and tested at https://ideone.com/oKFxR7

Since every line in the OP's sample contains the string LOCAL_IP, I have not added that condition to the solution. If someone needs to consider only lines that contain the string, a search condition can simply be added alongside the match function.

awk '
match($0,/([0-9]+\.){3}[0-9]+/){
  arr[substr($0,RSTART,RLENGTH)]++
}
END{
  for(i in arr){
    print arr[i]":"i
  }
}
' Input_file

Explanation: this uses awk's match function with a regex that matches IP addresses. It then increments an array named arr whose index is the matched substring. RSTART and RLENGTH are built-in variables that match() sets to the start position and length of the match.

Finally, once the program has finished reading Input_file, the END block traverses arr and prints each IP's number of occurrences (the array value) followed by the IP address (the array index).
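A sketch showing what match() sets on one sample line. The answer's ([0-9]+\.){3} interval is written out in full here, on the assumption that some older awks (older mawk, for instance) lack interval-expression support. Both regexes match the same dotted quads.

```shell
# Sketch: inspect RSTART/RLENGTH for one line; the IP starts after "LOCAL_IP ".
info=$(echo 'LOCAL_IP 123.111.23.2' |
       awk 'match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) {
                # RSTART is the 1-based start of the match, RLENGTH its length
                print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
            }')
echo "$info"   # 10 12 123.111.23.2
```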

Upvotes: 3

tso

Reputation: 4904

awk '{if ($1=="LOCAL_IP") {print $2} else if($2=="LOCAL_IP"){print $1}}' output.txt |sort|uniq -c|sed 's/^ *//'

This is what you already did, but instead of extracting the IPs into two different variables, it extracts all of them at the same time.

It checks whether the first field is LOCAL_IP and, if so, prints the second field; otherwise, if the second field is LOCAL_IP, it prints the first field.

If every line of your output file has the structure LOCAL_IP IP or IP LOCAL_IP, you don't need the second comparison:

awk '{if ($1=="LOCAL_IP") {print $2} else {print $1}}' output.txt |sort|uniq -c|sed 's/^ *//'
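The pipeline above prints "count ip" with a space. A sketch that reaches the question's "count:ip" format by folding the question's space-to-colon substitution into the final sed, written with awk's conditional expression. Sample lines via printf stand in for output.txt.

```shell
# Sketch: pick whichever field is not LOCAL_IP, count, then "count ip" -> "count:ip".
result=$(printf '%s\n' 'LOCAL_IP 1.1.1.1' '1.1.1.1 LOCAL_IP' 'LOCAL_IP 2.2.2.2' |
         awk '{ print ($1 == "LOCAL_IP" ? $2 : $1) }' |
         sort | uniq -c | sed 's/^ *//; s/ /:/')
echo "$result"
```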

Upvotes: 3
