Reputation: 9
I have 2 files and I want to sum the first columns of lines that share the same second. If a second is missing from both files, its sum should be zero; if a second appears more than once, all of its values should be added together. How can I do this?
First file:
16 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
2 /home/appuser<Apr 4, 2016 11:24:47 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:48 PM EEST
1 /home/appuser<Apr 4, 2016 11:24:50 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:51 PM EEST
7 /home/appuser<Apr 4, 2016 11:24:52 PM EEST
9 /home/appuser<Apr 4, 2016 11:24:54 PM EEST
8 /home/appuser<Apr 4, 2016 11:24:54 PM EEST
5 /home/appuser<Apr 4, 2016 11:24:55 PM EEST
Second file:
6 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
4 /home/appuser<Apr 4, 2016 11:24:49 PM EEST
7 /home/appuser<Apr 4, 2016 11:24:50 PM EEST
5 /home/appuser<Apr 4, 2016 11:24:50 PM EEST
10 /home/appuser<Apr 4, 2016 11:24:52 PM EEST
6 /home/appuser<Apr 4, 2016 11:24:52 PM EEST
10 /home/appuser<Apr 4, 2016 11:24:55 PM EEST
5 /home/appuser<Apr 4, 2016 11:24:57 PM EEST
output:
22 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
2 /home/appuser<Apr 4, 2016 11:24:47 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:48 PM EEST
4 /home/appuser<Apr 4, 2016 11:24:49 PM EEST
13 /home/appuser<Apr 4, 2016 11:24:50 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:51 PM EEST
23 /home/appuser<Apr 4, 2016 11:24:52 PM EEST
0 /home/appuser<Apr 4, 2016 11:24:53 PM EEST
17 /home/appuser<Apr 4, 2016 11:24:54 PM EEST
15 /home/appuser<Apr 4, 2016 11:24:55 PM EEST
0 /home/appuser<Apr 4, 2016 11:24:56 PM EEST
5 /home/appuser<Apr 4, 2016 11:24:57 PM EEST
Upvotes: 0
Views: 85
Reputation: 785146
This gets pretty tricky due to the requirement of inserting 0 rows for the missing seconds. Here is an awk with sort that you can use:
awk -F '<| /' '{
  # convert the timestamp in $3 to epoch seconds via GNU date
  cmd = "date -d \"" $3 "\" +%s"
  cmd | getline ts
  close(cmd)
  # if there is a gap since the previous second, insert 0 entries for it
  if (p > 0 && (ts-p) > 1) {
    for (i = p+1; i < ts; i++) {
      sums[i] = 0
      # format the missing epoch second back into the original layout
      cmd = "TZ=EET date -d @" i " \"+%b%e, %Y %r %Z\""
      cmd | getline tsi
      close(cmd)
      data[i] = "/" c2 "<" tsi
    }
  }
  sums[ts] += $1            # duplicates of the same second accumulate here
  data[ts] = "/" $2 "<" $3
  p = ts                    # remember previous timestamp and path
  c2 = $2
}
END {
  # GNU awk: traverse keys in numeric order; plain awk iterates arbitrarily
  PROCINFO["sorted_in"] = "@ind_num_asc"
  for (i in sums)
    printf "%4d%s%s\n", sums[i], OFS, data[i]
}' <(sort -t'<' -k2 file1 file2)
Output:
22 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
2 /home/appuser<Apr 4, 2016 11:24:47 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:48 PM EEST
4 /home/appuser<Apr 4, 2016 11:24:49 PM EEST
13 /home/appuser<Apr 4, 2016 11:24:50 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:51 PM EEST
23 /home/appuser<Apr 4, 2016 11:24:52 PM EEST
0 /home/appuser<Apr 4, 2016 11:24:53 PM EEST
17 /home/appuser<Apr 4, 2016 11:24:54 PM EEST
15 /home/appuser<Apr 4, 2016 11:24:55 PM EEST
0 /home/appuser<Apr 4, 2016 11:24:56 PM EEST
5 /home/appuser<Apr 4, 2016 11:24:57 PM EEST
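If forking date(1) for every input line is a concern, and given that all of the sample records fall on the same calendar day, a portable-awk sketch can key on seconds-since-midnight computed arithmetically instead. This assumption does not hold for data spanning days; a few sample lines are inlined here for demonstration:

```shell
# Sketch: sum per second and zero-fill gaps without calling date(1).
# Assumes every record is on the same calendar day (true of the sample data).
out=$(printf '%s\n' \
  '2 /home/appuser<Apr 4, 2016 11:24:46 PM EEST' \
  '3 /home/appuser<Apr 4, 2016 11:24:48 PM EEST' \
  '4 /home/appuser<Apr 4, 2016 11:24:46 PM EEST' |
awk -F'<| /' '
{
  split($3, f, " ")              # f[4]="11:24:46", f[5]="PM", f[6]="EEST"
  split(f[4], t, ":")
  h = t[1] % 12; if (f[5] == "PM") h += 12
  s = h*3600 + t[2]*60 + t[3]    # seconds since midnight as the key
  sums[s] += $1
  line[s] = "/" $2 "<" $3
  if (min == "" || s < min) min = s
  if (s > max) max = s
  dir = $2; pre = f[1] " " f[2] " " f[3] " "; tz = f[6]
}
END {
  for (s = min; s <= max; s++) {
    if (!(s in line)) {          # reconstruct the timestamp text for a gap
      hh = int(s/3600); mm = int(s%3600/60); ss = s%60
      ap = (hh >= 12) ? "PM" : "AM"
      h12 = hh % 12; if (h12 == 0) h12 = 12
      line[s] = sprintf("/%s<%s%02d:%02d:%02d %s %s", dir, pre, h12, mm, ss, ap, tz)
    }
    printf "%4d %s\n", sums[s], line[s]
  }
}')
echo "$out"
```

Because the key range is walked from min to max in the END block, the output comes out chronologically without an external sort.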
Upvotes: 1
Reputation:
In your original question you included output rows with a sum of 0; I'm not sure where those come from. Presuming that's additional data you don't have to worry about, the following will sum column one based on matching column twos. This can be expanded to as many files as you need; just add them to the file list in the input cat --> <(cat f1.txt f2.txt f3.txt ... fn.txt)
unset myarr && declare -A myarr            # associative array: line key -> sum
while read -r a; do
  col1=$(cut -d' ' -f1 <<< "${a}")         # the leading count
  col2=$(cut -d' ' -f3- <<< "${a}")        # everything from the path onward
  let myarr["${col2}"]+="${col1}"          # accumulate per unique column-two key
done < <(awk '{var=$1; $1=""; print var,$0}' <(cat f1.txt f2.txt))
for a in "${!myarr[@]}"; do echo "${myarr["$a"]} ${a}"; done
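As a side note, the two cut(1) forks per line aren't strictly needed: read can split off the leading count itself. A minimal sketch of that variant, with two sample lines inlined for demonstration:

```shell
declare -A myarr=()                        # line key -> running sum
while read -r col1 rest; do                # col1 = count, rest = everything after it
  # accumulate explicitly; missing keys default to 0
  myarr["$rest"]=$(( ${myarr["$rest"]:-0} + col1 ))
done <<'EOF'
2 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
4 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
EOF
for a in "${!myarr[@]}"; do echo "${myarr[$a]} ${a}"; done
```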
Upvotes: 0
Reputation: 77
Try using the code below; hope it helps. Note that it keys on $5 (the HH:MM:SS field), so as shown it neither sorts the output nor inserts 0 rows for missing seconds:
$ awk '{a[$5]+=$1; sub(/[0-9]+/,"",$1); line[$5]=$0}
END{for(k in a) printf "%2d %s\n",a[k],line[k]}' first second
13 /home/appuser<Apr 4, 2016 11:24:50 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:51 PM EEST
23 /home/appuser<Apr 4, 2016 11:24:52 PM EEST
17 /home/appuser<Apr 4, 2016 11:24:54 PM EEST
15 /home/appuser<Apr 4, 2016 11:24:55 PM EEST
22 /home/appuser<Apr 4, 2016 11:24:46 PM EEST
2 /home/appuser<Apr 4, 2016 11:24:47 PM EEST
5 /home/appuser<Apr 4, 2016 11:24:57 PM EEST
3 /home/appuser<Apr 4, 2016 11:24:48 PM EEST
4 /home/appuser<Apr 4, 2016 11:24:49 PM EEST
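The output above is unordered because `for (k in a)` iterates in an arbitrary order. One portable way to restore chronological order is to sort on the text after the `<` delimiter, which works for this single-day data; a sketch with a few sample lines inlined for demonstration:

```shell
# Same summing one-liner as above, piped through sort on the timestamp text.
out=$(printf '%s\n' \
  '16 /home/appuser<Apr 4, 2016 11:24:46 PM EEST' \
  '9 /home/appuser<Apr 4, 2016 11:24:54 PM EEST' \
  '8 /home/appuser<Apr 4, 2016 11:24:54 PM EEST' |
awk '{a[$5]+=$1; sub(/[0-9]+/,"",$1); line[$5]=$0}
     END{for(k in a) printf "%2d %s\n",a[k],line[k]}' |
sort -t'<' -k2)
echo "$out"
```

Sorting on the string after `<` is only chronological while the date part is constant; data spanning days would need a numeric (epoch) key as in the first answer.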
Upvotes: 0