Teja

Reputation: 341

Filtering CSV files

I have a CSV file and another text file (say file1.csv and file2.txt). The text file has a single column. I want to filter the CSV file based on the values in the text file. For example,

file1.csv
-----------
1,a,b,c
2,d,e,f
3,g,d,g

file2.txt
-----------
1
3

I want the result to be-

1,a,b,c
3,g,d,g

Upvotes: 1

Views: 2139

Answers (4)

Michael

Reputation: 891

And for a Windows command-line version, where the /G: option reads the search strings from a file:

findstr /G:file2.txt file1.csv > result.csv

Upvotes: 0

Purandaran

Reputation: 74

Try the following command:

grep -F -f file2.txt file1.csv

1,a,b,c
3,g,d,g
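One caveat worth noting: -F makes grep treat the patterns as fixed strings, but they still match anywhere in the line, so this happens to work on the sample data yet can over-match on real data. A quick sketch (using hypothetical files under /tmp) shows the pitfall:

```shell
# -F patterns match anywhere in the line, so the key "1"
# also hits rows whose ID is "41" or that contain "1" in another field.
printf '1,a,b,c\n41,x,y,z\n2,d,1,f\n' > /tmp/demo1.csv
printf '1\n' > /tmp/demo2.txt
grep -F -f /tmp/demo2.txt /tmp/demo1.csv   # prints all three lines
```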

Upvotes: 0

Jonathan Leffler

Reputation: 753805

For a solution using awk, see the answer by kev.

For an improved solution using grep -f, consider using bash process substitution:

grep -f <(sed 's/.*/^&,/' file2.txt) file1.csv

This uses sed on the lines in file2.txt to put a caret at the start of each line and a comma at the end, so that when treated as a regex by (GNU?) grep, the patterns only match the exact field value at the start of the line. If you don't have bash, you may be able to use:

sed 's/.*/^&,/' file2.txt | grep -f - file1.csv

However, not all versions of grep read standard input when you specify -f - (the version on Mac OS X doesn't, for example, but GNU grep does).
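In that case, a portable workaround is to write the anchored patterns to an intermediate file first (a sketch, using a hypothetical temporary file name):

```shell
# Build anchored patterns (^KEY,) in a temporary file, then filter.
sed 's/.*/^&,/' file2.txt > /tmp/patterns.txt
grep -f /tmp/patterns.txt file1.csv
rm -f /tmp/patterns.txt
```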

Alternatively, you could use the join command, with appropriate sorts:

join -o 1.1,1.2,1.3,1.4 -t, <(sort file1.csv) <(sort file2.txt)

If you're confident the files are already sorted, you can simplify that to just:

join -o 1.1,1.2,1.3,1.4 -t, file1.csv file2.txt
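As a quick check on the question's sample data (which is already sorted; file paths under /tmp are for illustration), this gives:

```shell
printf '1,a,b,c\n2,d,e,f\n3,g,d,g\n' > /tmp/file1.csv
printf '1\n3\n' > /tmp/file2.txt
join -o 1.1,1.2,1.3,1.4 -t, /tmp/file1.csv /tmp/file2.txt
# 1,a,b,c
# 3,g,d,g
```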

In Perl, you could use:

#!/usr/bin/env perl
# Usage: filter.pl file2.txt file1.csv
use strict;
use warnings;

my $file = 0;   # 0 while reading the first file, 1 afterwards
my %rows;       # keys seen in the first file

while (<>)
{
    chomp;
    $rows{$_}++ if ($file == 0);    # record each key from file2.txt
    if ($file == 1)
    {
        my($id) = split /,/;        # first comma-separated field
        print "$_\n" if defined $rows{$id};
    }
}
continue
{
    $file = 1 if eof;   # switch modes at the end of the first file
}

There are probably other ways to do it too; for example, you might find a use for modules such as Text::CSV.

To explain: the code reads each line. If the line comes from the first file, it creates an entry $rows{$_}++ to record that the number was seen; order and repetition don't matter. For the second (and subsequent) files, it splits the first comma-separated field out of the line and checks whether that number was found in the first file; if so, it prints the whole line. The continue block detects when the code reaches EOF on the first file and sets $file = 1 at that point. The script is isomorphic with the awk solution, if a little verbose. Perl's -a mode (awk mode) could shorten it, but because the two files need to be treated differently, it's a tad tricky to get it to work right.

Of these, I think the grep -f solution is probably the neatest as long as file2.txt is not too large (and I'm not sure what the limit would be — but probably surprisingly large).

For a general purpose CSV file manipulation tool, consider csvfix.

Upvotes: 1

kev

Reputation: 161674

Try this command:

awk -F, 'FNR==NR{a[$0];next};$1 in a' file2.txt file1.csv

The logic is simple:

FOR each line in 'file2.txt' and 'file1.csv'
    IF line is from 'file2.txt'
        store it to array 'a'
        CONTINUE
    ENDIF
    IF column 1 of line is in 'a'
        PRINT line
    ENDIF
ENDFOR
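Run against the question's sample data (hypothetical paths under /tmp), it produces the expected result:

```shell
printf '1,a,b,c\n2,d,e,f\n3,g,d,g\n' > /tmp/file1.csv
printf '1\n3\n' > /tmp/file2.txt
# FNR==NR is true only while reading the first file (file2.txt)
awk -F, 'FNR==NR{a[$0];next} $1 in a' /tmp/file2.txt /tmp/file1.csv
# 1,a,b,c
# 3,g,d,g
```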

Upvotes: 2
