Reputation: 5
I have two CSV files. Both have a column that contains the same data, with the difference that one file contains more data in that column than the other.
I want to print only the rows of file2 whose value in that column also appears in the other file.
For example:
file1
App_Int1 SID_bla1
App_Int2 SID_bla2
App_Int_4 SID_bla4
file2
SID_bla1 hello bye ...
SID_bla2 good bad ...
SID_bla5 hey ho ....
SID_bla4 hi cheers ...
And I want the output to be like this
SID_bla1 hello bye ...
SID_bla2 good bad ...
SID_bla4 hi cheers ...
Because file1 doesn't contain SID_bla5, the row with SID_bla5 will not be printed.
Here is my code, but it doesn't work. Can somebody give me some hints?
#!C:\Perl\bin\perl
use strict;
use warnings;

my $file = $ARGV[0] || die "Need to get CSV file on the command line\n";
my $mystring = "";
open(my $data, '<', $file) || die "Could not open '$file' $!\n";

my $newfile = "fooNew3.txt";
open(FILE2, ">", $newfile) || die "Could not open file";

my $file2 = "export.txt";
open(my $data2, '<', $file2) || die "Could not open '$file2' $!";
my $mystring2 = "";
my $line2;
my %filehash;
my @fields2 = "";

while ($line2 = <$data2>) {
    chomp $line2;
    @fields2 = split(";", $line2);
    while (my $line = <$data>) {
        chomp $line;
        my @fields = split(";", $line);
        if ($filehash{ $fields2[0] } eq $fields[1]) {
            # if the first column of file2 is identical with the second column of file1
            # then output the identical string and the second column of file2
            # which belongs to the first column of file2 (which is the identical string)
            print FILE2 join ';', "$fields[1]; $filehash{$fields2[0]} $fields2[1] \n";
        }
    }
}
What would be wrong with this?
if ($fields2[0] eq $fields[1]) {
    print "$fields2[0] $fields2[1] $fields2[2]\n";
}
Upvotes: 0
Views: 2344
Reputation:
You are over-engineering the problem.
$ awk 'NR == FNR {a[$2]; next}$1 in a' file1.txt file2.txt
SID_bla1 hello bye ...
SID_bla2 good bad ...
SID_bla4 hi cheers ...
If you want to use Perl, invoke it with -a for autosplit plus -n for an implicit loop over each line (or -p, which also prints every line).
If your data are ;-separated, such as
file1.txt
App_Int1;SID_bla1
App_Int2;SID_bla2
App_Int_4;SID_bla4
file2.txt
SID_bla1;hello;bye;...
SID_bla2;good;bad;...
SID_bla5;hey;ho;....
SID_bla4;hi;cheers;...
You could just set the field separator to ;:
$ awk -F';' 'NR == FNR {a[$2]; next}$1 in a' file1.txt file2.txt
SID_bla1;hello;bye;...
SID_bla2;good;bad;...
SID_bla4;hi;cheers;...
Upvotes: 0
Reputation: 126722
Although you haven't described it well, what I think you want is all of the lines in file2 whose first column matches any of the values in the second column of file1. This short Perl program will do that for you.
I have assumed the fields in your files are separated by any mixture of whitespace (spaces or tabs). It works by building a hash from the data in file1 that has a true value for every string appearing in the second column of each record. That is all that is needed from the first file.
Then file2 is opened and processed. The first field in each line is checked using the hash, and the line is printed if there is a corresponding hash element.
use strict;
use warnings;
use autodie;

my $fh;
my %wanted;

# Record every value found in the second column of file1
open $fh, '<', 'file1.txt';
while (<$fh>) {
    my @fields = split;
    $wanted{$fields[1]} = 1;
}

# Print each line of file2 whose first column was seen in file1
open $fh, '<', 'file2.txt';
while (<$fh>) {
    my @fields = split;
    print if $wanted{$fields[0]};
}
output
SID_bla1 hello bye ...
SID_bla2 good bad ...
SID_bla4 hi cheers ...
Upvotes: 0
Reputation: 35198
As a Perl script, your code could be simplified to the following:
#!C:\Perl\bin\perl
use strict;
use warnings;

die "Usage: $0 File1 File2\n" if @ARGV != 2;

my $file2 = pop;

my %seen;
while (<>) {
    my @F = split;
    $seen{$F[1]}++;
}

local @ARGV = $file2;
while (<>) {
    my @F = split;
    print if $seen{$F[0]};
}
Upvotes: 1