Reputation: 35
These are some of the lines I have in a file. The goal is to count the unique blades per date.
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade2-3.mon.demandware.net-0-appserver-20201105.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade5-0.mon.demandware.net-0-appserver-20201105.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade3-9.mon.demandware.net-0-appserver-20201105.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-5.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-6.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-5.mon.demandware.net-0-appserver-20201103.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-2.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade3-9.mon.demandware.net-0-appserver-20201104.log'
This is the script I've tried.
#!/bin/bash +x
pwd
cat *.p >> test.txt
awk '{ match($0,/[0-9]{8}/);arr[substr($0,RSTART,RLENGTH)]+=1;match($0,/blade/);spoint=RSTART+RLENGTH;match($0,/\.demandware/) } END { for (i in arr) { print i" - "arr[i]} } ' test.txt >> gen_output.txt
grep "2020" gen_output.txt
This is the output I get:
20201105 - 3
20201104 - 4
20201103 - 1
Every line for a given day is counted, so a blade that appears more than once on the same day is counted more than once.
The desired output is:
20201105 - 3
20201104 - 2
20201103 - 1
On 20201104 only blade4 and blade3 appear; blade4 occurs three times, so it should be counted once. Please suggest some ideas.
Upvotes: 1
Views: 88
Reputation: 246764
grep -Eo 'blade[0-9]+|[0-9]{8}' file | paste - - | sort -u | cut -f2 | sort | uniq -c
outputs
1 20201103
2 20201104
3 20201105
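To see why the pipeline works, it can help to run it one stage at a time. The following is only a sketch: the temporary file and the variable names (`sample`, `pairs`, `unique_pairs`, `counts`) are made up for illustration, and the final `awk` step is an addition that rearranges the `uniq -c` output into the "date - count" layout asked for in the question.

```shell
#!/bin/bash
# Hypothetical sample file holding the eight lines from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade2-3.mon.demandware.net-0-appserver-20201105.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade5-0.mon.demandware.net-0-appserver-20201105.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade3-9.mon.demandware.net-0-appserver-20201105.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-5.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-6.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-5.mon.demandware.net-0-appserver-20201103.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-2.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade3-9.mon.demandware.net-0-appserver-20201104.log'
EOF

# Stage 1: grep -o prints each match on its own line (blade, then date);
# paste - - joins every two lines into one tab-separated "blade<TAB>date" pair.
pairs=$(grep -Eo 'blade[0-9]+|[0-9]{8}' "$sample" | paste - -)

# Stage 2: sort -u collapses repeated (blade, date) pairs into one.
unique_pairs=$(printf '%s\n' "$pairs" | sort -u)

# Stage 3: keep only the date column, count occurrences per date, and
# rearrange "count date" into "date - count", newest date first.
counts=$(printf '%s\n' "$unique_pairs" | cut -f2 | sort -r | uniq -c |
  awk '{ print $2 " - " $1 }')

printf '%s\n' "$counts"
rm -f "$sample"
```

Note that this relies on every line containing exactly one blade match followed by exactly one 8-digit date; a line with an extra 8-digit run would throw the `paste - -` pairing off.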
Upvotes: 2
Reputation: 785008
You can get this done in a single awk:
awk 'match($0, /-blade[0-9]+/) {
   b = substr($0, RSTART, RLENGTH)
}
match($0, /[0-9]{8}/) {
   d = substr($0, RSTART, RLENGTH)
   if (!seen[d,b]++)
      freq[d]++
}
END {
   for (i in freq)
      print i, freq[i]
}' file
20201103 1
20201104 2
20201105 3
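The key idiom here is `!seen[d,b]++`: the counter is zero before the post-increment, so the test is true only the first time a given (date, blade) pair is seen. A minimal sketch of that idiom in isolation, using made-up "date blade" pairs rather than the full log paths (the variable names `pairs` and `result` are illustrative only):

```shell
#!/bin/bash
# Made-up "date blade" pairs; the pair "20201104 blade4" appears twice.
pairs='20201104 blade4
20201104 blade4
20201104 blade3
20201103 blade4'

# seen[$1,$2]++ evaluates to 0 (false) on the first sighting only, so
# freq[$1] is incremented once per distinct (date, blade) pair.
# The order of "for (d in freq)" is unspecified in awk, hence the sort.
result=$(printf '%s\n' "$pairs" |
  awk '!seen[$1,$2]++ { freq[$1]++ }
       END { for (d in freq) print d, freq[d] }' |
  sort)

printf '%s\n' "$result"
```

Because the traversal order of `for (i in freq)` is not guaranteed, piping the result through `sort` is one way to get a stable ordering.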
Upvotes: 2
Reputation: 133458
1st solution (without sorting): Could you please try the following, written and tested with the shown samples only, in GNU awk.
awk -F"[-.]" '
match($0,/ACI_Preauth_Card-blade[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  if(!arr[val,$(NF-1)]++){
    arr1[$(NF-1)]++
  }
  val=""
}
END{
  for(key in arr1){
    print key" - "arr1[key]
  }
}' Input_file
Output will be as follows.
20201103 - 1
20201104 - 2
20201105 - 3
2nd solution (with the sorting option of gawk): Or, in case you have GNU awk and need the output in descending YYYYMMDD order, try the following.
awk -F"[-.]" '
match($0,/ACI_Preauth_Card-blade[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  if(!arr[val,$(NF-1)]++){
    arr1[$(NF-1)]++
  }
  val=""
}
END{
  PROCINFO["sorted_in"] = "@ind_num_desc"
  for(key in arr1){
    print key" - "arr1[key]
  }
}' Input_file
Output will be as follows.
20201105 - 3
20201104 - 2
20201103 - 1
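`PROCINFO["sorted_in"]` is specific to GNU awk. If only a POSIX awk is available, one option is to leave the END loop unordered and sort the result externally. The following is a sketch under that assumption; the temporary file and the names `input`, `seen`, `count` and `sorted` are made up for illustration, and the sample holds just four of the lines from the question.

```shell
#!/bin/bash
# Hypothetical sample input: four of the lines shown in the question.
input=$(mktemp)
cat > "$input" <<'EOF'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-5.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade4-6.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade3-9.mon.demandware.net-0-appserver-20201104.log'
sS'/on/demandware.servlet/webdav/Sites/Logs/service-ACI_Preauth_Card-blade2-3.mon.demandware.net-0-appserver-20201105.log'
EOF

# With "-" and "." as field separators, the date is the second-to-last field.
# Each (blade, day) pair is counted once; sort -r then orders the lines
# newest date first, since every line starts with an 8-digit date.
sorted=$(awk -F'[-.]' '
match($0, /blade[0-9]+/) {
    blade = substr($0, RSTART, RLENGTH)
    day = $(NF-1)
    if (!seen[blade, day]++) count[day]++
}
END {
    for (day in count) print day " - " count[day]
}' "$input" | sort -r)

printf '%s\n' "$sorted"
rm -f "$input"
```

A plain reverse string sort is enough here only because the dates are fixed-width YYYYMMDD values at the start of each line.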
Upvotes: 2