Matching a column against multiple possible values

Question

I am trying to use awk to filter out data from a file. As of now, for a single value I am using the command like the below

hadoop fs -text file:///a/b/filename.snappy  awk -F'|'  '$11 == 655' > filter_20180705.txt

I want to pass a list of values in comparison section, instead of passing $11 == 655, I would like to pass a list like $11 IN (list). Any thoughts on this will be helpful

Sample-data:

karthick,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,655,dev
kumar,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,611,dev
Raj,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,800,dev
John,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,823,dev

As of now I am getting the result as

karthick,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,655,dev

I will need pass a list in the comparison filter. For example, I want to add 611 and 823. So the expected result would be

karthick,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,655,dev
kumar,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,611,dev
John,bangalore,software,it,bfsi,spark,hadoop,bigdata,etl,pentaho,823,dev

Inian · Accepted Answer

Define a variable in the context of Awk (using -v) to include alternates for your regex match (separated by |) and do your match as below. The ~ is your regex match operator in Awk that allows you match $11 to the string defined (see How to Use Regular Expressions).

awk -F, -v list="655|611|823" '$11 ~ list' file

With the above approach your command translates to match $11 to any of the entries defined in the variable list. To negate your regex match (i.e. opposite of what I want to match), use the negation ! operator as

awk -F, -v list="655|611|823" '$11 !~ list' file

To avoid false matches like 1182345 to be part of the match, make the regex more strict by including the start and end patterns

awk -F, -v list="655|611|823" 'BEGIN{list="^(" list ")$"} $11 ~ list' file

Matching a column against multiple possible values

Answers (2)

Related Questions