Reputation: 341
I have a csv file and another text file (say file1.csv and file2.txt). The text file has one column. Now, I want to filter the csv file based on the data in the text file. For example,
file1.csv ----------- 1,a,b,c 2,d,e,f 3,g,d,g file2.txt ----------- 1 3
I want the result to be-
1,a,b,c 3,g,d,g
Upvotes: 1
Views: 2139
Reputation: 891
And for Windows Command Version:
findstr /G:file2.txt file1.csv > result.csv
Upvotes: 0
Reputation: 74
Try the below command:
grep -F -f file2.txt file1.csv
1,a,b,c
3,g,d,g
Upvotes: 0
Reputation: 753805
For a solution using awk
, see the answer by kev.
For an improved solution using grep -f
, consider using bash
process substitution:
grep -f <(sed 's/.*/^&,/' file2.txt) file1.csv
This uses sed
on the lines in file2.txt
to put a caret at the start of each line and a comma at the end, so that when treated as a regex by (GNU?) grep
, the patterns only match the exact field value at the start of the line. If you don't have bash
, you may be able to use:
sed 's/.*/^&,/' file2.txt | grep -f - file1.csv
However, not all versions of grep
read standard input when you specify -f -
(the version on Mac OS X doesn't, for example, but GNU grep
does).
Alternatively, you could use the join
command, with appropriate sorts:
join -o 1.1,1.2,1.3,1.4 -t, <(sort file1.csv) <(sort file2.txt)
If you're confident the files are already sorted, you can simplify that to just:
join -o 1.1,1.2,1.3,1.4 -t, file1.csv file2.txt
In Perl, you could use:
#!/usr/bin/env perl
use strict;
use warnings;
my $file = 0;
my %rows;
while (<>)
{
chomp;
$rows{$_}++ if ($file == 0);
if ($file == 1)
{
my($id) = split /,/;
print "$_\n" if defined $rows{$id};
}
}
continue
{
$file = 1 if eof;
}
There are probably other ways to do it too; for example, you might find a use for modules such as Text::CSV.
However, this code reads each line. If it is from the first file, then it creates an entry $rows{$_}++
to record that the number was seen. Order and repetition don't matter. In the second (and subsequent) files, it splits the first comma separated field out of the line, and checks whether that number was found in the first file; if so, it prints the whole line. The continue
block detects when the code reaches EOF on the first file (in particular) and sets $file = 1;
when it does. It is isomorphic with the awk
solution. This is a little verbose. There is the -a
mode (awk
mode), but because the two files need to be treated differently, it's a tad tricky to get it to work right.
Of these, I think the grep -f
solution is probably the neatest as long as file2.txt
is not too large (and I'm not sure what the limit would be — but probably surprisingly large).
For a general purpose CSV file manipulation tool, consider csvfix.
Upvotes: 1
Reputation: 161674
Try this command:
awk -F, 'FNR==NR{a[$0];next};$1 in a' file2.txt file1.csv
Logic is simple:
FOR each line in 'file2.txt' and 'file1.csv'
IF line is from 'file2.txt'
store it to array 'a'
CONTINUE
ENDIF
IF column 1 of line is in 'a'
PRINT line
ENDIF
ENDFOR
Upvotes: 2