Reputation: 4139
How to get the intersection line between two .csv file? My question seem quite difficult to understand but i will give you an example like this:
I have 2 .csv files:
+---+-------+----+ +-----------------+ | A | B | C | | A | B | C | +---+-------+----+ +----+-------+----+ | 1 | ant | 14 |* | 6 | Fan | 12 | | 2 | bird | 11 | | 7 | gun | 55 |* | 3 | cat | 21 |* | 8 | horse | 21 |* | 4 | dog | 55 |* | 9 | ice | 15 | | 5 | egg | 99 | | 10 | jar | 14 |* +---+-------+---+ +----+-------+----+ Table 1 Table 2
So, if i use Table 1 to filter Table 2 i will get the output like this:
+----------------+ | A | B | C | +---+-------+----+ | 7 | gun | 55 |* | 8 | horse | 21 |* | 10| jar | 14 |* +---+-------+----+ Table 3
Yes, i use the last column of Table 1 to Filter on Table 2
How can i filter it like this with any tool?
Upvotes: 0
Views: 1061
Reputation: 5752
As per the comments above, here is what I did:
Create table T1 (A INT, B VARCHAR(100), C INT);
Create table T2 (A INT, B VARCHAR(100), C INT);
Insert into T1 Values (1, 'ant',14);
Insert into T1 Values (2, 'bird',11);
Insert into T1 Values (3, 'cat',21);
Insert into T1 Values (4, 'dog',55);
Insert into T1 Values (5, 'egg',99);
Insert into T2 Values (6, 'fan',12);
Insert into T2 Values (7, 'gun',55);
Insert into T2 Values (8, 'horse',21);
Insert into T2 Values (9, 'ice',15);
Insert into T2 Values (10, 'jar',14);
I am not sure if you have the data already in the table or not, there are tools to import csv files into your db.
If you are going to run the query often, then building an index on column A of each table would speed up the process. I did not build any indexes for this simple case.
The select required to get your result is:
select * from t2,t1 where t2.c = t1.c order by t2.a
If you are happy with the results you can put it in a table like this (SQL Server)
SELECT T2.A, T2.B , T2.C INTO TEST FROM t2,t1 where t2.c = t1.c order by t2.a
I hope this is what you want...
Upvotes: 1
Reputation: 14768
Assuming your input files do only contain the numbers and strings, without the additional table-formatting characters (just a \t
separated file), you may do the following:
Table 1 (t1.txt):
1 ant 14
2 bird 11
3 cat 21
4 dog 55
5 egg 99
Table 2 (t2.txt):
6 Fan 12
7 gun 55
8 horse 21
9 ice 15
10 jar 14
And the execution and output:
$ join -j 3 -o '2.1,2.2,2.3' <(sort -k 3n t1.txt) <(sort -k 3n t2.txt) | sort -n
7 gun 55
8 horse 21
10 jar 14
Alternatively, if you have many columns, you may first extract only the filtering column, and later perform the selection itself. Also, you may output the join
-input files without specifying an initial pattern (all the columns, in your case). After the execution you simply select the columns you actually want to display (using cut
, for exampĺe):
$ join -2 15 <(cut -f 15 t1.txt | sort) <(sort -k 15 t2.txt) | sort -n
Notice cut
uses \t
as the default separator -- which can be redefined with flag -d <separator>
. Anyway, as pointed by @EmmadKareem, you may be better off using a proper DB for this task -- as it will have dozens of optimizations, surely required by your hundred-thousand/million lines.
Upvotes: 1
Reputation: 58400
This might work for you (GNU sed):
sed -r 's/(\S+\s?){3}/\/(^\\S+\\s){2}\1$\/p/' file1.csv | sed -nrf - file2.csv
for a file that is space or tab separated.
For a file that is comma separated:
sed -r 's/([^,]+,?){3}/\/(^[^,]+,){2}\1$\/p/' file1.csv | sed -nrf - file2.csv
This works by creating a sed script from the first table and then uses it to filter against the second table.
Upvotes: 1
Reputation: 1872
Here is a script in Perl to do the job you want.
It works by scanning the first file and keeping the values of the third column in memory. Then it scans the second file and for every line read it compares the value of the third column with those in memory, if there is a match, it prints the line.
#!/usr/bin/perl
use warnings;
use strict;
use 5.010;
my %seen;
open my $file1_fh, '<', 'file1.txt'
or die "Can't open file1.txt $!";
while (<$file1_fh>) {
chomp;
$seen{ (split)[2] } = 1; #assumes line are delimited by whitespace.
}
close $file1_fh;
open my $file2_fh, '<', 'file2.txt'
or die "Can't open file2.txt $!";
while (<$file2_fh>) {
chomp;
my $third_column_value = (split)[2]; #assumes line are delimited by whitespace.
say if $seen{ $third_column_value };
}
close $file2_fh;
__END__
#OUTPUT
7 gun 55
8 horse 21
10 jar 14
Upvotes: 0