Reputation: 542
I have three text files containing names and grades. I removed the grades and created new files with just the names. Here is what the files look like:
first.txt
Alice
Bob
Carl
Derrick
Jessica
Sarah
Zach
second.txt
Alice
Bob
Derrick
Jared
Jessica
Sarah
Zach
third.txt
Bob
Jared
Sarah
Slate
Terry
Zach
I want to compare all three files, and if there is a name in one file that is not in the others, I want to add it. At the end, all files should contain the same names. I know that adding lines in Perl means a new file will have to be created to do this.
Here is my approach to it. I start by comparing the first and second, adding differences from second into first. Then comparing first and second, adding differences from first into second. Then I compare the second file (either works) with third file, print differences from second into third file. Then I compare second and third, and print differences that are in third into both first and second. I put compare statements in as well to ensure the files have the same entries.
The files with grades are named original1.txt, original2.txt, and original3.txt.
In the end I will take the files containing the new names, and combine them with the files that have the grades. If there is no grade for a new name in the file, it will simply have no grade entry.
Is there a cleaner way of doing this? It looks like a huge mess.
Upvotes: 1
Views: 123
Reputation: 52354
Unless this is for a class or something where using perl is a hard requirement, the cleaner way is to not use perl at all, but standard shell utilities.
Assuming your originalN.txt files look something like:
Alice A
Bob B
Carl C
Derrick D
Jessica A
Sarah B
Zach C
with tabs separating the columns, you can do:
sort -um <(cut -f1 original1.txt) \
<(cut -f1 original2.txt) \
<(cut -f1 original3.txt) > allnames.txt
to get a file with all the names from all three files. (If they're not already sorted by name, use sort -u ... instead of sort -um ....) This does require bash, zsh, or ksh93 for the <(command) process substitution syntax.
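If your shell lacks <(command), a rough equivalent in plain sh is to hand all three files to cut and pipe the result through sort -u. This is only a sketch: the sample data and the scratch directory are made up for illustration, and setting LC_ALL=C is my assumption to keep the sort order consistent with whatever join sees later.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical sample data, tab-separated, standing in for the real grade files.
printf 'Alice\tA\nBob\tB\nCarl\tC\n'  > original1.txt
printf 'Alice\tB\nBob\tA\nJared\tC\n' > original2.txt
printf 'Bob\tC\nJared\tA\nSlate\tB\n' > original3.txt

# cut accepts several files and emits the first column of each in turn;
# sort -u then sorts the combined list and drops duplicate names.
cut -f1 original1.txt original2.txt original3.txt | LC_ALL=C sort -u > allnames.txt

cat allnames.txt
```

Since sort -u does a full sort here rather than a merge, this version does not care whether the inputs were already sorted.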
Then you can merge those names with each individual file with a left outer join:
$ join -t$'\t' -a1 allnames.txt original1.txt
Alice A
Bob B
Carl C
Derrick D
Jared
Jessica A
Sarah B
Slate
Terry
Zach C
and so on.
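To spell out the "and so on", the join can be run in a loop that writes one merged file per input. Again just a sketch under assumptions: the mergedN.txt output names are hypothetical, the sample data is invented, and both inputs to join must be sorted the same way (hence LC_ALL=C).

```shell
#!/bin/sh
# Work in a scratch directory with hypothetical, tab-separated sample data.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'Alice\tA\nBob\tB\n'  > original1.txt
printf 'Bob\tC\nJared\tA\n'  > original2.txt
printf 'Alice\nBob\nJared\n' > allnames.txt   # combined, sorted name list

tab=$(printf '\t')   # portable stand-in for bash's $'\t'

for n in 1 2; do
    # -a1 also prints names from allnames.txt that have no match
    # in the grade file, which is what makes this a left outer join.
    LC_ALL=C join -t "$tab" -a1 allnames.txt "original$n.txt" > "merged$n.txt"
done

cat merged1.txt merged2.txt
```

Names without a grade in a given file come out as a bare name with no second column, same as Jared, Slate and Terry in the output above.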
If using perl, there's no need for all those temporary files. Just stick the names from all the original files in a hash:
#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;

# Read all names from the files given on the command line.
my %names;
for my $file (@ARGV) {
    open my $infile, "<", $file;
    while (<$infile>) {
        chomp;    # so a name-only line doesn't keep its newline as part of the key
        my $n = ( split /\t/ )[0];
        $names{$n} = 1;
    }
}

# And for each file, merge with all the names
for my $file (@ARGV) {
    say "****** $file *******";
    open my $infile, "<", $file;
    # Start every known name with no grade, then fill in what this file has.
    my %grades = map { $_ => undef } keys %names;
    while (<$infile>) {
        chomp;
        my ( $name, $grade ) = split /\t/;
        $grades{$name} = $grade;
    }
    for my $name ( sort keys %grades ) {
        if ( defined $grades{$name} ) {
            say "$name\t$grades{$name}";
        }
        else {
            say $name;
        }
    }
}
Writing the results to files instead of standard output is left as an exercise for the reader.
Upvotes: 3