Reputation: 1645
I have fileA.txt and a few lines of it are shown below:
AA
BB
CC
DD
EE
I also have fileB.txt, which contains text like this:
Group col2 col3 col4
1 pp 4567 AA,BC,AB
1 qp 3428 AA
2 pp 3892 AA
3 ee 28399 AA
4 dd 3829 BB,CC
1 dd 27819 BB
5 ak 29938 CC
For every line in fileA.txt, I want to count how many times it appears in column 4 of fileB.txt, counting it at most once per group (column 1 of fileB.txt).
Sample output should look like:
AA 3
BB 2
CC 2
AA is present 4 times, but two of those are in group "1". If a value appears more than once in the same group (column 1), it should be counted only once, so the AA count in the output above is 3.
Any help using awk or any other one-liner?
Upvotes: 1
Views: 1014
Reputation: 77105
Here is an awk one-liner that should work:
awk '
NR==FNR && !seen[$4,$1]++{count[$4]++;next}
($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
Explanation:
The pattern NR==FNR && !seen[$4,$1]++ is true only while reading the first file (fileB.txt), and only the first time a given combination of column 4 and column 1 is seen. For duplicate combinations we don't increment the counter.
($1 in count) checks whether column 1 of the second file (fileA.txt) is a key in the count array. If it is, we print it along with its count.
Output:
$ awk 'NR==FNR && !seen[$4,$1]++{count[$4]++;next}($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
AA 3
BB 2
CC 1
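The !seen[...]++ idiom from the explanation can be tried in isolation. This is only a minimal sketch: the function name dedupe_pairs and the four input lines are made up for illustration and are not taken from fileB.txt.

```shell
# Print each (group, value) pair only the first time it appears.
# seen[$1,$2]++ evaluates to 0 (false) on the first sighting, so the
# negation is true exactly once per distinct pair.
dedupe_pairs() {
  awk '!seen[$1,$2]++ { print $1, $2 }'
}

# Illustrative input: the pair "1 AA" occurs twice.
printf '%s\n' '1 AA' '1 AA' '2 AA' '1 BB' | dedupe_pairs
# prints:
# 1 AA
# 2 AA
# 1 BB
```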
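The first one-liner compares column 4 as a whole, so comma-separated entries such as BB,CC never match a line of fileA.txt. This version splits column 4 on commas and counts each value once per group:
awk '
NR==FNR {
    n = split($4,tmp,/,/);
    for(x = 1; x <= n; x++) {
        if(!seen[$1,tmp[x]]++) {
            count[tmp[x]]++
        }
    }
    next
}
($1 in count) {
    print $1, count[$1]
}' fileB.txt fileA.txt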
Outputs:
AA 3
BB 2
CC 2
Upvotes: 1
Reputation: 56069
A simple awk one-liner.
awk 'NR>FNR{if($0 in a)print$0,a[$0];next}!a[$4,$1]++{a[$4]++}' fileB.txt fileA.txt
Note the order of the files: fileB.txt must come first so that the counts are built before fileA.txt is read.
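Why the file order matters can be seen with a small sketch of the NR/FNR test. The function name which_file is hypothetical and the two arguments are any two text files.

```shell
# NR counts lines across all input files; FNR restarts at 1 for each file.
# NR==FNR therefore holds only while the first file is being read
# (assuming the first file is not empty).
which_file() {
  awk 'NR==FNR { print "first:", $0; next }
       { print "second:", $0 }' "$1" "$2"
}
```

Calling which_file fileB.txt fileA.txt would tag every fileB.txt line with "first:" and every fileA.txt line with "second:". If the first file is empty, NR==FNR is also true for the second file, so the test is not reliable in that case.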
Upvotes: 0
Reputation: 295413
Pure bash (4.0 or newer):
#!/bin/bash
declare -A items=()

# read in the list of items to track
while read -r; do items[$REPLY]=0; done <fileA.txt

# read fourth column from fileB and increment for each match
while read -r _ _ _ item _; do
  [[ ${items[$item]} ]] || continue     # skip unrecognized values
  items[$item]=$(( items[$item] + 1 ))  # otherwise, increment
done <fileB.txt

# print output
for key in "${!items[@]}"; do           # iterate over keys
  value="${items[$key]}"                # look up values
  printf '%s\t%s\n' "$key" "$value"     # print them together
done
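The script above compares column 4 as a whole and increments on every matching row, so it neither splits comma-separated values nor limits the count to once per group. A possible variant that does both is sketched below. It is not the answerer's code: the function name count_items and the seen array are assumptions added for illustration, and it still needs bash 4.0 or newer.

```shell
#!/bin/bash
# Sketch: count each tracked item once per group (column 1),
# splitting comma-separated values in column 4.
count_items() {
  local list_file=$1 data_file=$2
  local -A items=() seen=()
  local -a values
  local line group item value

  # read in the list of items to track
  while IFS= read -r line; do
    [[ -n $line ]] && items[$line]=0
  done <"$list_file"

  while read -r group _ _ item _; do
    [[ -n $item ]] || continue
    IFS=, read -r -a values <<<"$item"                # split column 4 on commas
    for value in "${values[@]}"; do
      [[ -n $value ]] || continue
      [[ -n ${items[$value]+set} ]] || continue       # skip untracked values
      [[ -n ${seen[$group,$value]+set} ]] && continue # already counted in this group
      seen[$group,$value]=1
      items[$value]=$(( ${items[$value]} + 1 ))
    done
  done <"$data_file"

  # print every tracked item; key order is unspecified
  for item in "${!items[@]}"; do
    printf '%s\t%s\n' "$item" "${items[$item]}"
  done
}
```

It could be called as count_items fileA.txt fileB.txt | sort. Like the original script, it also prints items that were never matched, with a count of 0.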
Upvotes: 0