Reputation: 61

Filtering rows based on column values of csv file

I have a dataset with 1000 rows and 10 columns. Here is the sample dataset

A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,

From this dataset I want to copy the rows whose has value of column A as 'a' or 'm' to a new csv file. Also I want the header to get copied.

I have tried using awk. It copied all the rows but not the header.

awk '{$1~/a//m/ print}' inputfile.csv > outputfile.csv

How can I copy the header also into the new outputfile.csv?

Thanks in advance.

Upvotes: 4

Answers (5)

Reputation: 58558

This might work for you (GNU sed):

sed '1b;/^[am],/!d' oldFile >newFile

Always print the first line and delete any other line that does not beging a, or m,.

Alternative:

awk 'NR==1 || /^[am],/' oldFile >newFile

Upvotes: 2

Reputation: 133760

Considering that your header will be on 1st row, could you please try following.

awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' Input_file > outputfile.csv

OR as per Cyrus sir's comment adding following:

awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv

OR as per Ed sir's comment try following:

awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv

Added corrections in OP's attempt:

Added FS and OFS as , here for all lines since lines are comma delimited.
Added FNR==1 condition which means it is checking 1st line here and printing it simply, since we want to print headers in out file. It will print very first line and then next will skip all further statements from here.
Used a better regex for checking 1st field's condition $1 ~ /^a$|^m$/

Upvotes: 2

Reputation: 204548

$ awk -F, 'BEGIN{split("a,m",tmp); for (i in tmp) tgts[tmp[i]]} NR==1 || $1 in tgts' file
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,

Upvotes: 1

Reputation: 88929

With awk. Set field separator (FS) to , and output current row if it's first row or if its first column contains a or m.

awk 'NR==1 || $1=="a" || $1=="m"' FS=',' in.csv >out.csv

Output to out.csv:

A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,

Upvotes: 1

Reputation: 326

It appears that awk's default delimiter is whitespace. Link

Changing the delimiter can be denoted by using the FS variable:

awk 'BEGIN { FS = "," } ; { print $2 }'

Upvotes: -1