user3507732

Reputation: 5

Perl: Comparing strings from two files

I have two CSV files. Both have a column that contains the same data, with the difference that one file contains more data in that column than the other.

I want to print only the rows of file2 whose value in that column also appears in the other file.

For example:

file1

App_Int1     SID_bla1
App_Int2     SID_bla2
App_Int_4    SID_bla4

file2

SID_bla1     hello     bye    ...
SID_bla2     good      bad    ...
SID_bla5     hey       ho     ....
SID_bla4     hi        cheers ...

And I want the output to be like this

SID_bla1     hello     bye    ...
SID_bla2     good      bad    ...
SID_bla4     hi        cheers ...

Because file1 doesn't contain SID_bla5, the row with SID_bla5 will not be printed.

Here is my code, but it doesn't work. Can somebody give me some hints?

#!C:\Perl\bin\perl
use strict;
use warnings;

my $file = $ARGV[0] || die "Need to get CSV file on the command line\n";
my $mystring = "";

open(my $data, '<', $file) || die "Could not open '$file' $!\n";
my $newfile = "fooNew3.txt";
open(FILE2, ">", $newfile) || die "Could not open file";

my $file2 = "export.txt";
open(my $data2, '<', $file2) || die "Could not open '$file2' $!";

my $mystring2 = "";
my $line2;
my %filehash;
my @fields2 = "";

while ($line2 = <$data2>) {
  chomp $line2;

  @fields2 = split(";", $line2);
  while (my $line = <$data>) {
    chomp $line;

    my @fields = split(";", $line);
    if ($filehash{ $fields2[0] } eq $fields[1]) {
      # if the first column of file2 is identical with the second column of file1
      # then output the identical string and the second column of file2
      # which belongs to the first column of file2 (which is the identical string)

      print FILE2 join ';', "$fields[1]; $filehash{$fields2[0]} $fields2[1] \n";
    }
  }
}

What would be wrong with this?

  if ($fields2[0] eq $fields[1] {
    print $fields2[0] $fields2[1] $fields2[2];
  }

Upvotes: 0

Views: 2344

Answers (3)

user918938

You are over-engineering the problem.

$ awk 'NR == FNR {a[$2]; next}$1 in a' file1.txt file2.txt
SID_bla1     hello     bye    ...
SID_bla2     good      bad    ...
SID_bla4     hi        cheers ...

If you want to use Perl, invoke it with -an: -a autosplits each line into @F and -n loops over every input line, so you print only the lines you want (-p would print every line, matched or not).
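
The Perl route can be sketched as a small script. This is only an illustration, assuming whitespace-separated fields as in the example. It spells out the loop and split that perl's -n and -a switches would supply, and it relies on <> removing each file name from @ARGV as it opens that file, which is how it tells the first file from the second. The sub name matching_lines is made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns the lines of the last file whose first field appears as the
# second field of a line in the earlier file.
# (matching_lines is a hypothetical name, not from the original answer.)
sub matching_lines {
    my @files = @_;
    local @ARGV = @files;              # files for <> to read, in order
    my (%seen, @out);
    while (my $line = <>) {
        my @F = split ' ', $line;      # what -a does: split on whitespace
        next unless @F;                # skip blank lines
        if (@ARGV) {
            # A file is still queued, so this is the first file:
            # remember its second column (awk's NR == FNR branch).
            $seen{ $F[1] } = 1 if defined $F[1];
        }
        elsif ($seen{ $F[0] }) {
            push @out, $line;          # last file: keep rows whose key was seen
        }
    }
    return @out;
}

# Usage: perl script.pl file1.txt file2.txt
print matching_lines(@ARGV) if @ARGV == 2;
```

The @ARGV check plays the role of awk's NR == FNR test: it is true only while the first file is being read.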

If your data are ;-separated, such as

file1.txt

App_Int1;SID_bla1
App_Int2;SID_bla2
App_Int_4;SID_bla4

file2.txt

SID_bla1;hello;bye;...
SID_bla2;good;bad;...
SID_bla5;hey;ho;....
SID_bla4;hi;cheers;...

You could just set the field separator to be ;:

$ awk -F';' 'NR == FNR {a[$2]; next}$1 in a' file1.txt file2.txt
SID_bla1;hello;bye;...
SID_bla2;good;bad;...
SID_bla4;hi;cheers;...
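
The same ;-separated case might be handled in Perl by splitting on the semicolon explicitly. A minimal sketch, assuming plain fields with no quoting and no embedded semicolons (real CSV quoting would need a proper parser); the sub names read_keys and filter_rows are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Collect the second ;-separated field of every line read from $fh.
sub read_keys {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /;/, $line;
        $wanted{ $fields[1] } = 1 if defined $fields[1];
    }
    return \%wanted;
}

# Return the lines from $fh whose first field is a key of %$wanted.
sub filter_rows {
    my ($fh, $wanted) = @_;
    my @rows;
    while (my $line = <$fh>) {
        chomp(my $row = $line);            # compare without the newline
        my ($key) = split /;/, $row, 2;    # only the first field is needed
        push @rows, $line if defined $key && $wanted->{$key};
    }
    return @rows;
}

# Usage: perl script.pl file1.txt file2.txt
if (@ARGV == 2) {
    open my $in1, '<', $ARGV[0] or die "Could not open '$ARGV[0]': $!\n";
    open my $in2, '<', $ARGV[1] or die "Could not open '$ARGV[1]': $!\n";
    print filter_rows($in2, read_keys($in1));
}
```

Keeping the key lookup in a hash means each file is read once, rather than rereading file1 for every line of file2.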

Upvotes: 0

Borodin

Reputation: 126722

Although you haven't described it well, what I think you want is all of the lines in file2 whose first column matches any of the values in the second column of file1. This short Perl program will do that for you.

I have assumed the fields in your files are separated by any mixture of whitespace - spaces or tabs. It works by building a hash from the data in file1 that has a true value for every string appearing in the second column of each record. That is all that is needed from the first file.

Then file2 is opened and processed. The first field in each line is checked using the hash, and the line is printed if there is a corresponding hash element.

use strict;
use warnings;
use autodie;

my $fh;
my %wanted;

open $fh, '<', 'file1.txt';
while (<$fh>) {
  my @fields = split;
  $wanted{$fields[1]} = 1;
}

open $fh, '<', 'file2.txt';
while (<$fh>) {
  my @fields = split;
  print if $wanted{$fields[0]};
}

output

SID_bla1     hello     bye    ...
SID_bla2     good      bad    ...
SID_bla4     hi        cheers ...

Upvotes: 0

Miller

Reputation: 35198

As a Perl script, your code could be simplified to the following:

#!C:\Perl\bin\perl
use strict;
use warnings;

die "Usage: $0 File1 File2\n" if @ARGV != 2;

my $file2 = pop;

my %seen;
while (<>) {
    my @F = split;
    $seen{$F[1]}++;
}

local @ARGV = $file2;
while (<>) {
    my @F = split;
    print if $seen{$F[0]};
}

Upvotes: 1
