count using awk commands

I have fileA.txt and a few lines of it are shown below:

AA
BB
CC
DD  
EE

I also have fileB.txt, which contains text like this:

Group  col2   col3    col4
1    pp    4567    AA,BC,AB
1    qp    3428    AA
2    pp    3892    AA
3    ee    28399   AA
4    dd    3829    BB,CC
1    dd    27819   BB
5    ak    29938   CC

For every line in fileA.txt, I want to count how many distinct groups (column 1 of fileB.txt) it appears in, where the value itself is looked up in column 4 of fileB.txt.

Sample output should look like:

AA    3
BB    2
CC    2

AA is present 4 times, but two of those are in group "1". If a value is present more than once in the same group (column 1), it should be counted only once, so in the output above the count for AA is 3.

Can this be done with awk or another one-liner?

Upvotes: 1

Answers (3)

jaypal singh

Reputation: 77105

Here is an awk one-liner that should work:

awk '
NR==FNR && !seen[$4,$1]++{count[$4]++;next}
($1 in count){print $1,count[$1]}' fileB.txt fileA.txt

Explanation:

The pattern NR==FNR && !seen[$4,$1]++ is true only while reading the first file (fileB.txt), and only the first time a given (column 4, column 1) pair is seen. Repeated pairs do not increment the counter.
($1 in count) checks whether column 1 of the second file (fileA.txt) is a key in the count array. If it is, we print it along with its count.

Output:

$ awk 'NR==FNR && !seen[$4,$1]++{count[$4]++;next}($1 in count){print $1,count[$1]}' fileB.txt fileA.txt
AA 3
BB 2
CC 1

Update based on the modified question:

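awk '
NR==FNR {
  # First file (fileB.txt): split column 4 on commas
  n = split($4, tmp, /,/)
  for (x = 1; x <= n; x++) {
    # Count each (group, name) pair only once
    if (!seen[$1, tmp[x]]++) {
      count[tmp[x]]++
    }
  }
  next
}
# Second file (fileA.txt): print the names that were counted
($1 in count) {
  print $1, count[$1]
}' fileB.txt fileA.txt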

Outputs:

AA 3
BB 2
CC 2
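
To try the updated script end to end, the following sketch recreates the question's sample files in a scratch directory and runs the same logic against them. The scratch directory and the count_groups wrapper name are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample input from the question in a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/fileA.txt" <<'EOF'
AA
BB
CC
DD
EE
EOF

cat > "$tmp/fileB.txt" <<'EOF'
Group  col2   col3    col4
1    pp    4567    AA,BC,AB
1    qp    3428    AA
2    pp    3892    AA
3    ee    28399   AA
4    dd    3829    BB,CC
1    dd    27819   BB
5    ak    29938   CC
EOF

# count_groups is a hypothetical wrapper name.
# Arguments: the groups file (fileB) first, then the names file (fileA).
count_groups() {
  awk '
  NR == FNR {
    # Split column 4 on commas and count each (group, name) pair once.
    n = split($4, tmp, /,/)
    for (x = 1; x <= n; x++)
      if (!seen[$1, tmp[x]]++)
        count[tmp[x]]++
    next
  }
  ($1 in count) { print $1, count[$1] }
  ' "$1" "$2"
}

count_groups "$tmp/fileB.txt" "$tmp/fileA.txt"
# When finished, the scratch directory can be removed with: rm -rf "$tmp"
```

Run as is, this should print AA 3, BB 2 and CC 2, matching the output listed above.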

Upvotes: 1

Kevin

Reputation: 56069

A simple awk one-liner.

awk 'NR>FNR{if($0 in a)print$0,a[$0];next}!a[$4,$1]++{a[$4]++}' fileB.txt fileA.txt

Note the order of the files: fileB.txt must come first so that the counts are built before fileA.txt is read.
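
The one-liner is dense, so here is a spelled-out sketch of the same idea with separate array names. It assumes column 4 holds exactly one name per line; the sample data, the file names names.txt and groups.txt, and the count_simple wrapper are hypothetical and only there to make the example runnable.

```shell
#!/bin/sh
# Hypothetical sample data: column 4 holds exactly one name per line.
tmp=$(mktemp -d)
printf '%s\n' AA BB CC > "$tmp/names.txt"
printf '%s\n' \
  '1 pp 4567 AA' \
  '1 qp 3428 AA' \
  '2 pp 3892 AA' \
  '4 dd 3829 BB' \
  '1 dd 27819 BB' \
  '5 ak 29938 CC' > "$tmp/groups.txt"

# count_simple is a hypothetical wrapper name.
# Arguments: the groups file first, then the names file.
count_simple() {
  awk '
  # Second file onwards: NR keeps growing while FNR restarts at 1.
  NR > FNR {
    if ($0 in count) print $0, count[$0]   # the whole line must equal the name
    next
  }
  # First file: count each (name, group) pair only once.
  !seen[$4, $1]++ { count[$4]++ }
  ' "$1" "$2"
}

count_simple "$tmp/groups.txt" "$tmp/names.txt"
```

On this data it prints AA 2, BB 2 and CC 1. Because it compares $4 as a whole, it does not split comma-separated lists such as BB,CC.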

Upvotes: 0

Charles Duffy

Reputation: 295413

Pure bash (4.0 or newer):

#!/bin/bash

declare -A items=()

# read in the list of items to track
while read -r; do items[$REPLY]=0; done <fileA.txt

# read fourth column from fileB and increment for each match
while read -r _ _ _ item _; do
  [[ ${items[$item]} ]] || continue    # skip unrecognized values
  items[$item]=$(( items[$item] + 1 )) # otherwise, increment
done <fileB.txt

# print output
for key in "${!items[@]}"; do          # iterate over keys
  value="${items[$key]}"               # look up values
  printf '%s\t%s\n' "$key" "$value"    # print them together
done
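
The loop above increments once per matching line, so it neither collapses repeats within a group nor splits comma-separated values in column 4. Below is a sketch of one way to add both while staying in bash 4.0 or newer. The seen array, the count_groups_bash function name, the scratch directory, and the choice to print in fileA order while skipping zero counts are all assumptions for illustration, not part of the original answer.

```shell
#!/bin/bash
# Requires bash 4.0+ for associative arrays.
tmp=$(mktemp -d)
printf '%s\n' AA BB CC DD EE > "$tmp/fileA.txt"
cat > "$tmp/fileB.txt" <<'EOF'
Group  col2   col3    col4
1    pp    4567    AA,BC,AB
1    qp    3428    AA
2    pp    3892    AA
3    ee    28399   AA
4    dd    3829    BB,CC
1    dd    27819   BB
5    ak    29938   CC
EOF

# Hypothetical function; arguments: fileB (groups) first, then fileA (names).
count_groups_bash() {
  local -A items=() seen=()
  local -a parts
  local name group list item key

  # Read the names to track; a named variable trims surrounding whitespace.
  while read -r name; do
    if [[ -n $name ]]; then items[$name]=0; fi
  done <"$2"

  while read -r group _ _ list _; do
    [[ -n $list ]] || continue
    IFS=, read -r -a parts <<<"$list"             # split column 4 on commas
    for item in "${parts[@]}"; do
      [[ -n $item ]] || continue                  # ignore empty list entries
      [[ -n ${items[$item]+set} ]] || continue    # skip names not in fileA
      key="$group,$item"
      [[ -n ${seen[$key]+set} ]] && continue      # already counted for this group
      seen[$key]=1
      items[$item]=$(( ${items[$item]} + 1 ))
    done
  done <"$1"

  # Print in fileA order, leaving out names that never matched.
  while read -r name; do
    [[ -n $name ]] || continue
    if (( ${items[$name]} > 0 )); then
      printf '%s\t%s\n' "$name" "${items[$name]}"
    fi
  done <"$2"
}

count_groups_bash "$tmp/fileB.txt" "$tmp/fileA.txt"
```

On the question's sample data this prints AA 3, BB 2 and CC 2, tab-separated.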

Upvotes: 0
