Vikas

Reputation: 327

Find common lines in multiple files according to specific value and columns

This is a common question, but my situation is a little different. I have 10 files and want to extract the rows they have in common. I found this Perl one-liner:

perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/'  file1 file2 file3 file4

or, on Linux:

comm [-1] [-2] [-3] file1 file2

But what if each file has 3 (or more) columns and I want to compare only the first 2 (or more) columns, ignoring the last one? For example:

file1:

Col1   col2   col3

A      1       0
A      2       1

file2:

Col1   col2   col3

A        2    0.5
A        1    10
B        1    10

Desired output:

Col1   col2   file1  file2

A        1      0      10
A        2      1      0.5

So with 10 files, the output should have 10 value columns (one per file) after the key columns. Can the Perl one-liner above be modified to do this, or is there another approach?

Upvotes: 0

Views: 1675

Answers (1)

user554546


use strict;
use warnings;
use Array::Utils qw(intersect);

my $first_file=shift(@ARGV);
my @common_lines=();

#Grab all of the lines in the first file.

open(my $read,"<",$first_file) or die $!;

while(<$read>)
{
    chomp;
    my @arr=split /\t/;
    @arr=@arr[0,1]; #Only take first two columns.
    push @common_lines,join("\t",@arr);
}

close($read);
foreach my $file (@ARGV)
{
    my @matched_lines=();
    open($read,"<",$file) or die $!;
    while(<$read>)
    {
        chomp;
        my @arr=split /\t/;
        @arr=@arr[0,1];
        my $to_check=join("\t",@arr);

        #If $to_check is in @common_lines, put it in @matched_lines
        if(grep{$_ eq $to_check}@common_lines)
        {
            push @matched_lines,$to_check;
        }
    }
    close($read);

    #Take out elements of @common_lines that aren't in @matched_lines
    @common_lines=intersect(@common_lines,@matched_lines);

    unless(@common_lines)
    {
        #Nothing left to match, so stop instead of reading the remaining files.
        print "No lines are common amongst the files!\n";
        exit;
    }
}

foreach(@common_lines)
{
    print "$_\n";
}
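
The script above prints only the shared key columns. To also get one value column per file, as in the desired output, something along these lines might work. It is only a minimal sketch under some assumptions that may not hold for your data: columns are separated by whitespace, each file starts with one header line, the first two columns form the key, and the third column is the value. The name merge_common is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one tab-separated row per key (columns 1 and 2)
# that occurs in every file, followed by the third-column value from each file.
# Assumptions: whitespace-separated columns, one header line per file.
sub merge_common {
    my @files = @_;
    my (%values, @order);

    for my $i (0 .. $#files) {
        open(my $fh, '<', $files[$i]) or die "$files[$i]: $!";
        my $header = <$fh>;                  # skip the header line
        while (my $line = <$fh>) {
            my @f = split ' ', $line;        # split on any run of whitespace
            next unless @f >= 3;             # skip blank or short lines
            my $key = "$f[0]\t$f[1]";
            # Remember the key order of the first file for the output.
            push @order, $key if $i == 0 && !exists $values{$key};
            # If a key repeats within one file, the last value wins.
            $values{$key}[$i] = $f[2];
        }
        close($fh);
    }

    my @rows;
    for my $key (@order) {
        my @vals = @{ $values{$key} };
        # Keep only keys that have a value from every file.
        next if grep { !defined $vals[$_] } 0 .. $#files;
        push @rows, join("\t", $key, @vals);
    }
    return @rows;
}

if (@ARGV) {
    print join("\t", 'Col1', 'col2', @ARGV), "\n";
    print "$_\n" for merge_common(@ARGV);
}
```

Saved as, say, merge.pl, it would be run as perl merge.pl file1 file2 ... file10. Rows come out in the order the keys appear in the first file, and keys missing from any file are dropped.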

Upvotes: 1
