Reputation: 61
I have a dataset with 1000 rows and 10 columns. Here is the sample dataset
A,B,C,D,E,F,
a,b,c,d,e,f,
g,h,i,j,k,l,
m,n,o,p,q,r,
s,t,u,v,w,x,
From this dataset I want to copy the rows whose has value of column A as 'a' or 'm' to a new csv file. Also I want the header to get copied.
I have tried using awk. It copied all the rows but not the header.
awk '{$1~/a//m/ print}' inputfile.csv > outputfile.csv
How can I copy the header also into the new outputfile.csv?
Thanks in advance.
Upvotes: 4
Views: 1602
Reputation: 58558
This might work for you (GNU sed):
sed '1b;/^[am],/!d' oldFile >newFile
Always print the first line and delete any other line that does not beging a,
or m,
.
Alternative:
awk 'NR==1 || /^[am],/' oldFile >newFile
Upvotes: 2
Reputation: 133760
Considering that your header will be on 1st row, could you please try following.
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^a$|^m$/' Input_file > outputfile.csv
OR as per Cyrus sir's comment adding following:
awk 'BEGIN{FS=OFS=","} FNR==1{print;next} $1 ~ /^(a|m)$/' Input_file > outputfile.csv
OR as per Ed sir's comment try following:
awk -F, 'NR==1 || $1~/^[am]$/' Input_file > outputfile.csv
Added corrections in OP's attempt:
FS
and OFS
as ,
here for all lines since lines are comma delimited.FNR==1
condition which means it is checking 1st line here and printing it simply, since we want to print headers in out file. It will print very first line and then next
will skip all further statements from here.$1 ~ /^a$|^m$/
Upvotes: 2
Reputation: 204548
$ awk -F, 'BEGIN{split("a,m",tmp); for (i in tmp) tgts[tmp[i]]} NR==1 || $1 in tgts' file
A,B,C,D,E,F,
a,b,c,d,e,f,
m,n,o,p,q,r,
Upvotes: 1
Reputation: 88929
With awk. Set field separator (FS
) to ,
and output current row if it's first row or if its first column contains a
or m
.
awk 'NR==1 || $1=="a" || $1=="m"' FS=',' in.csv >out.csv
Output to out.csv:
A,B,C,D,E,F, a,b,c,d,e,f, m,n,o,p,q,r,
Upvotes: 1
Reputation: 326
It appears that awk's default delimiter is whitespace. Link
Changing the delimiter can be denoted by using the FS variable:
awk 'BEGIN { FS = "," } ; { print $2 }'
Upvotes: -1