Reputation: 5059
I have 4 files and would like to know, for each file, which elements do not appear in any of the other files.
File A
Vincy
ruby
rome
File B
Vincy
rome
Peter
File C
Vincy
Paul
alex
File D
Vincy
rocky
Willy
Any suggestions for a one-liner in Perl, Python, shell, or Bash? The expected output is:
File A: ruby
File B: Peter
File C: Paul, Alex
File D: rocky, Willy
Upvotes: 2
Views: 198
Reputation: 67900
Perl one-liner, readable version with comments:
perl -nlwe '
    $a{$_}++;                                   # count identical lines with a hash
    push @a, $_;                                # save lines in an array
    if (eof) { push @b, [$ARGV, @a]; @a = (); } # at eof, save file name and its lines
}{                                              # eskimo operator, executes rest of code at end of input files
    for (@b) {
        print shift @$_;                        # print file name
        for (@$_) { print if $a{$_} == 1 }      # print unique lines
    }
' file{A,B,C,D}.txt
Note: eof is true at the end of each individual input file, not just after the last one; that is what resets the per-file array. The Python sketch at the end of this answer mirrors the same idea.
Copy/paste version:
perl -nlwe '$a{$_}++; push @a, $_; if (eof) { push @b,[$ARGV,@a]; @a=(); } }{ for (@b) { print shift @$_; for (@$_) { print if $a{$_} == 1 } }' file{A,B,C,D}.txt
Output:
filea.txt
ruby
fileb.txt
Peter
filec.txt
Paul
alex
filed.txt
rocky
Willy
Notes: This was trickier than expected, and I'm sure there's a way to make it prettier, but I'll post this for now and see if I can clean it up.
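For comparison, here is a minimal sketch of the same per-file bookkeeping in Python, using the standard fileinput module; fileinput.filename() reports which file the current line came from, playing the role of the per-file eof test (a sketch only, not tested against the exact files above):
import fileinput

counts = {}     # line -> number of occurrences across all files
order = []      # file names in the order they are read
per_file = {}   # file name -> its lines, in input order

for line in fileinput.input():
    line = line.rstrip('\n')
    counts[line] = counts.get(line, 0) + 1
    name = fileinput.filename()
    if name not in per_file:
        per_file[name] = []
        order.append(name)
    per_file[name].append(line)

for name in order:
    print(name)                      # file name, as in the Perl version
    for line in per_file[name]:
        if counts[line] == 1:
            print(line)              # lines unique across all files
Run it the same way, e.g. python uniq_per_file.py file{A,B,C,D}.txt (the script name is made up).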
Upvotes: 1
Reputation: 5222
Edit after the question was clarified: elements that are unique across all files, together with the file each occurs in:
cat File_A File_B File_C File_D | sort | uniq -u | while read line ; do file=$(grep -lxF "$line" File_*) ; echo "$file $line" ; done
Edit: a Perl way of doing it, which will be faster if the files are large:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;

my $wordHash;
foreach my $arg (@ARGV) {
    open(my $fh, "<", $arg);
    while (<$fh>) {
        chomp;
        $wordHash->{$_}[0]++;                # count occurrences across all files
        push @{ $wordHash->{$_}[1] }, $arg;  # remember which file each word came from
    }
}
for my $word (keys %$wordHash) {
    if ($wordHash->{$word}[0] == 1) {
        print $wordHash->{$word}[1][0] . ": $word\n";
    }
}
execute as: myscript.pl filea fileb filec ... filezz
From before the clarification: this is easy enough with shell commands. Non-repeating elements across all files:
cat File_A File_B File_C File_D | sort | uniq -u
Unique elements across all files:
cat File_A File_B File_C File_D | sort | uniq
Unique elements per file (edit thanks to @Dennis Williamson):
for file in File* ; do echo "working on $file" ; sort "$file" | uniq ; done
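If you would rather stay in one language, the same per-file report can be sketched in Python (a rough equivalent of the shell loop above; per_file_uniq.py is a made-up name):
import sys

for name in sys.argv[1:]:
    print("working on %s" % name)
    with open(name) as fh:
        # sorted(set(...)) mirrors sort | uniq for this one file
        for element in sorted(set(line.strip() for line in fh)):
            print(element)
Run as: python per_file_uniq.py File_A File_B File_C File_D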
Upvotes: 10
Reputation: 88128
Here is a quick Python script that will do what you ask over an arbitrary number of files:
from sys import argv
from collections import defaultdict

filenames = argv[1:]
X = defaultdict(list)   # word -> list of files it appears in

for f in filenames:
    with open(f, 'r') as FIN:
        for word in FIN:
            X[word.strip()].append(f)

for word in X:
    if len(X[word]) == 1:
        print "Filename: %s word: %s" % (X[word][0], word)
This gives:
Filename: D word: Willy
Filename: C word: alex
Filename: D word: rocky
Filename: C word: Paul
Filename: B word: Peter
Filename: A word: ruby
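To get output grouped per file, as in the question's expected format, here is a short follow-up sketch that reuses the X mapping built by the script above (untested, same Python 2 style):
from collections import defaultdict

by_file = defaultdict(list)
for word in X:
    if len(X[word]) == 1:
        by_file[X[word][0]].append(word)   # file -> its unique words

for f in sorted(by_file):
    print "%s: %s" % (f, ", ".join(by_file[f]))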
Upvotes: 4
Reputation: 59426
Knitted with a hot needle (i.e., written quickly):
import sys

inputs = {}
for inputFileName in sys.argv[1:]:
    with open(inputFileName, 'r') as inputFile:
        inputs[inputFileName] = set(line.strip() for line in inputFile)

for inputFileName, inputSet in inputs.iteritems():
    print inputFileName
    result = set(inputSet)  # copy first: -= is in-place and would empty the stored set
    for otherInputFileName, otherInputSet in inputs.iteritems():
        if otherInputFileName != inputFileName:
            result -= otherInputSet
    print result
Didn't try it though ;-)
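An equivalent formulation, sketched under the same assumptions and reusing the inputs dict from above, subtracts the union of every other file's set in a single step:
for inputFileName, inputSet in inputs.iteritems():
    # union of all other files' sets, then one set difference
    others = set().union(*(s for n, s in inputs.iteritems() if n != inputFileName))
    print inputFileName
    print inputSet - others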
Upvotes: 1