Vijiy
Vijiy

Reputation: 1197

Get count of records in file with a filter in file

I have a file | delimited, File has 4 columns

123|456|789|234
456|456|789|2345
124|456|789|2345
125|456|789|2345
126|456|789|2345
128|451|782|2345

Output --> 6

How can find the count of distinct records in column1 where col2=456 and col3=789.

I know how to find the distinct of col1 for whole file, but how do we find after applying some filter on other column

Upvotes: 0

Views: 425

Answers (5)

kvantour
kvantour

Reputation: 26491

A slightly different logic than the previous awk solutions. But than also slightly:

awk -F '|' '($2!=246 || $3!=789){next}!a[$1]++{c++}END{print c}' file

Upvotes: 0

stack0114106
stack0114106

Reputation: 8711

You can try Perl

perl -lne ' /^(.+?)\|(.+?)\|(.+?)\|/ and $2==456 and $3==789 and $kv{$1}++ ; END { print scalar keys %kv } ' 

with the given inputs

$ cat vijiy.txt
123|456|789|234
456|456|789|2345
124|456|789|2345
125|456|789|2345
126|456|789|2345
128|451|782|2345
$ perl -lne ' /^(.+?)\|(.+?)\|(.+?)\|/ and $2==456 and $3==789 and $kv{$1}++ ; END { print scalar keys %kv } ' vijiy.txt
5
$

Upvotes: 0

James Brown
James Brown

Reputation: 37424

Using grep'n cut'n sort'n wc'n a bunch of pipes:

$ grep ^[^\|]*\|456\|789\| file | cut -d \| -f 1-3 | sort -u | wc -l
5

Demonstrated:

$ grep ^[^\|]*\|456\|789\| file  # search for ^...|456|789|
123|456|789|234
456|456|789|2345
124|456|789|2345
125|456|789|2345
126|456|789|2345

and pipe it to

cut -d \| -f 1-3               # to lose the last field
123|456|789
456|456|789
124|456|789
125|456|789
126|456|789

and on to

sort -u                        # sort and get unique records
123|456|789
124|456|789
125|456|789
126|456|789
456|456|789

and finally

wc -l                          # count them
5

Super Mario's daily bread:

|

Upvotes: 1

tripleee
tripleee

Reputation: 189648

Here is a simple Awk script.

awk -F '|' '$2=="456" && $3=="789" { if (!a[$1]++) count++ }
    END { print count }' file

Your sample data contains 5 distinct values in column 1 with these constraints.

If you want to pass in the values as parameters,

c2="456"
c3="789"
awk -F '|' -v col2="$c2" -v col3="$c3" '$2==col2 && $3==col3 {
        if (!a[$1]++) count++ }
    END { print count }' file

The array a collects the keys from column 1 which we have already seen. If the array doesn't contain the value of the first column already, we add 1 to count.

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133610

Could you please try following.

awk -F'|' '$2==456 && $3==789 && !a[$1,$2,$3]++{count++} END{print count}' Input_file

Upvotes: 1

Related Questions