Reputation: 1061
I have two .csv files of same structure
ENSTGUG00000000001,C2CD2L
ENSTGUG00000000002,DPAGT1
ENSTGUG00000000003,EGR1
ENSTGUG00000000004,REEP2
ENSTGUG00000000005,
ENSTGUG00000000006,PIAS2
ENSTGUG00000000007,KDM3B
ENSTGUG00000000008,
ENSTGUG00000000010,HMBS
ENSTGUG00000000011,
and
ENSTGUG00000000001,
ENSTGUG00000000002,DPAGT1
ENSTGUG00000000003,EGR1
ENSTGUG00000000004,REEP2
ENSTGUG00000000005,
ENSTGUG00000000006,PIAS2
ENSTGUG00000000007,KDM3B
ENSTGUG00000000008,LOC100218573
ENSTGUG00000000010,HMBS
ENSTGUG00000000011,
I want to find elements in the 1st column of the 1st file for which the 2nd column is empty, and search for the corresponding in the 2nd column of the 2nd file.
So that the output would be
ENSTGUG00000000001,C2CD2L
ENSTGUG00000000002,DPAGT1
ENSTGUG00000000003,EGR1
ENSTGUG00000000004,REEP2
ENSTGUG00000000005,
ENSTGUG00000000006,PIAS2
ENSTGUG00000000007,KDM3B
ENSTGUG00000000008,LOC100218573
ENSTGUG00000000010,HMBS
ENSTGUG00000000011,
This way I can have 1st file as complete as possible.
Upvotes: 1
Views: 75
Reputation: 2892
awk oneliner:
$ $ awk 'BEGIN{FS=OFS=","} NR==FNR{if($2!="")a[$1]=$2;next}{if($2=="")$2=a[$1]}1' input2.txt input1.txt
ENSTGUG00000000001,C2CD2L
ENSTGUG00000000002,DPAGT1
ENSTGUG00000000003,EGR1
ENSTGUG00000000004,REEP2
ENSTGUG00000000005,
ENSTGUG00000000006,PIAS2
ENSTGUG00000000007,KDM3B
ENSTGUG00000000008,LOC100218573
ENSTGUG00000000010,HMBS
ENSTGUG00000000011,
explanation:
BEGIN{FS=OFS=","} # set field-separator and output field-seperator to a ","
NR==FNR{ # for the first input file
if ($2!="") # if $2 is not an empty string
a[$1]=$2; # save $2 in array a with index $1
next # skip to next line to read
}
{
if ($2=="") # if $2 is empty string
$2=a[$1] # replace it with the save value in array a
}1
Upvotes: 0
Reputation: 212238
Assuming your files line up line by line:
awk '{getline s < "file2.csv"; split( s, a, ",")}
!$2 && a[2] { $2 = a[2] }1' FS=, file1.csv
This is not even remotely robust, and if your files don't match exactly will be completely worthless, but it gives you somewhere to start.
It's fairly self-explanatory... awk by nature reads the input file one line at a time. Each time it reads a line, it performs the commands in the script. In this case, it reads one line from file2.csv and splits it on the ,
. Then it checks to see if the second field of the line read from file1.csv
evaluates as a boolean to false (so if you have entries that are integer 0, they may get overwritten) and that the second field in the line from file 2 is non-zero. If that's true, it assigns the field from file2 to the 2nd field of the current input line. Then it outputs the line.
Upvotes: 1