Reputation: 486
Let's say I have the following 2 files with entries such as these (number, IP and User-agent):
30000 11.11.11.11 Dalvik/2.1.0 Linux
10000 22.22.22.22 GetintentCrawler getintent.com
5000 33.33.33.33 Mozilla/5.0 X11; Linux i686 AppleWebKit/537.36 KHTML, like Gecko Chrome/43.0.2357.130 Safari/537.36
3000 44.44.44.44 Mozilla/5.0 Macintosh; Intel Mac OS X 10_6_8 AppleWebKit/534.59.10 KHTML, like Gecko Version/5.1.9 Safari/534.59.10
1000 55.55.55.55 Dalvik/1.6.0 Linux; U; Android 4.1.2; Orange Yumo Build/OrangeYumo
and
6000 44.44.44.44 Mozilla/5.0 Macintosh; Intel Mac OS X 10_6_8 AppleWebKit/534.59.10 KHTML, like Gecko Version/5.1.9 Safari/534.59.10
3000 33.33.33.33 Mozilla/5.0 X11; Linux i686 AppleWebKit/537.36 KHTML, like Gecko Chrome/43.0.2357.130 Safari/537.36
2000 11.11.11.11 Dalvik/2.1.0 Linux
600 55.55.55.55 Dalvik/1.6.0 Linux; U; Android 4.1.2; Orange Yumo Build/OrangeYumo
500 22.22.22.22 GetintentCrawler getintent.com
I want to sum the first column for all identical IPs (the second column), while keeping all the subsequent columns with the user-agent. The final output should also be sorted by the first column in descending order.
So the result should basically look like this:
32000 11.11.11.11 Dalvik/2.1.0 Linux
10500 22.22.22.22 GetintentCrawler getintent.com
9000 44.44.44.44 Mozilla/5.0 Macintosh; Intel Mac OS X 10_6_8 AppleWebKit/534.59.10 KHTML, like Gecko Version/5.1.9 Safari/534.59.10
8000 33.33.33.33 Mozilla/5.0 X11; Linux i686 AppleWebKit/537.36 KHTML, like Gecko Chrome/43.0.2357.130 Safari/537.36
1600 55.55.55.55 Dalvik/1.6.0 Linux; U; Android 4.1.2; Orange Yumo Build/OrangeYumo
So far I came up with this, but I lose the whole user-agent string and I also feel that I'm overcomplicating things:
cat file1.txt file2.txt file3.txt | awk '{arr[$2]+=$1;} END {for (i in arr) print i, arr[i]}' | awk '{ print $2" "$1 }' | sort -rn
Upvotes: 1
Views: 78
Reputation: 785286
You can use this gnu-awk command:
awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_desc"} {
p=$1; $1=""; a[$0]+=p} END{for (i in a) print a[i] i}' file1 file2
BEGIN{PROCINFO["sorted_in"]="@val_num_desc"}
makes for (i in a) visit the array elements in descending numeric order of their values (the sums), so the output comes out sorted by the first column. Each key is the input line with its first field emptied, which is why the IP and the full user-agent are preserved.
Output:
32000 11.11.11.11 Dalvik/2.1.0 Linux
10500 22.22.22.22 GetintentCrawler getintent.com
9000 44.44.44.44 Mozilla/5.0 Macintosh; Intel Mac OS X 10_6_8 AppleWebKit/534.59.10 KHTML, like Gecko Version/5.1.9 Safari/534.59.10
8000 33.33.33.33 Mozilla/5.0 X11; Linux i686 AppleWebKit/537.36 KHTML, like Gecko Chrome/43.0.2357.130 Safari/537.36
1600 55.55.55.55 Dalvik/1.6.0 Linux; U; Android 4.1.2; Orange Yumo Build/OrangeYumo
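If gawk is not available, something along these lines might work with any POSIX awk, leaving the ordering to sort. This is only a sketch: sum_by_ip is a made-up helper name, and it assumes a given IP always appears with the same user-agent (it keeps the first one seen).

```shell
# Hypothetical helper: sum column 1 per IP (column 2), keep the rest of the
# line, and sort by the total in descending order.
# Assumption: each IP always carries the same user-agent string.
sum_by_ip() {
    awk '{
        count = $1
        ip = $2
        # Strip the leading count and the whitespace after it,
        # leaving "IP user-agent" in $0.
        sub(/^[ \t]*[^ \t]+[ \t]+/, "")
        total[ip] += count
        # Remember the first "IP user-agent" text seen for this IP.
        if (!(ip in rest)) rest[ip] = $0
    }
    END {
        for (ip in total) print total[ip], rest[ip]
    }' "$@" | sort -k1,1nr
}
```

Call it as sum_by_ip file1 file2. Keying the array on the IP alone, rather than on the whole remainder of the line, keeps the totals grouped by IP as the question describes.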
Upvotes: 2