Reputation: 2501
I have two files:
1.txt:
j_e_s_s_i_c_a_a_n_n [email protected] 61b8a203438ea1c56c1489ec7bea7a0e
9871951 [email protected] 671cb9239bf797a082f723a07a9c713f
holliebrian [email protected] a2e531ea7df55290c35d74082f38f020
9075407 [email protected] d20f83ee6933aa1ea047fe5cbd9c1fd5
9837056 [email protected] e4d11b1c62cfbb7bfb49e4644e70d476
2.txt:
a2e531ea7df55290c35d74082f38f020:182:@*/
671cb9239bf797a082f723a07a9c713f:1199
e4d11b1c62cfbb7bfb49e4644e70d476:abcd123
d20f83ee6933aa1ea047fe5cbd9c1fd5:33;1:11
I want two output files. The first, left.txt, should contain the lines of 1.txt whose 3rd column (FS = ' ') does not match the first column of any line in 2.txt (FS = ':'):
left.txt:
j_e_s_s_i_c_a_a_n_n [email protected] 61b8a203438ea1c56c1489ec7bea7a0e
The second, result.txt, should contain every line of 1.txt that does have a match in 2.txt, with the matched 3rd column replaced by the rest of the matching 2.txt line, i.e. everything after its first ':' (the value may itself contain colons, as in 182:@*/ and 33;1:11):
result.txt:
9871951 [email protected] 1199
holliebrian [email protected] 182:@*/
9075407 [email protected] 33;1:11
9837056 [email protected] abcd123
I wrote the following script for this:
awk -F : 'FNR==NR {s=$0;sub(/[^:]*:/, "", s); p[$1]=s; next} !($NF in p) {print > "left.txt"; next} {$NF=p[$NF]} 1' 2.txt FS=' ' OFS=' ' <(tr -d '\r' < 1.txt) > result.txt
This produces the expected output on small inputs, but on larger files (1.txt is ~3 GB, 2.txt is ~1 GB) it crashes with the following error:
awk: cmd. line:1: (FILENAME=2.txt FNR=21085923) fatal: /home/corinna/src/gawk/gawk-4.2.0/gawk-4.2.0-1.x86_64/src/gawk-4.2.0/node.c:1021:more_blocks: freep: can't allocate 9600 bytes of memory (Cannot allocate memory)
How can I make this work on the larger files? Any help would be appreciated. Using awk is not a must; the goal is simply to produce the correct output quickly and without crashing.
Reputation: 133750
The following awk may help you here:
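awk '
FNR==NR {                  # first file (2.txt, FS=":"): build hash -> value map
  val = $1                 # key: the text before the first ":"
  sub(/[^:]*/, "")         # strip the key from the record
  sub(/:/, "")             # then strip the first ":" separator
  a[val] = $0              # value: everything after the first ":"
  next
}
!($NF in a) {              # 1.txt line whose hash (last field) is not in 2.txt
  print > "left.txt"
  next
}
{                          # matched: replace the hash with the mapped value
  print $1, $2, a[$NF] > "result.txt"
}
' FS=":" 2.txt FS=" " OFS=" " 1.txt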
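Note that this still stores every line of 2.txt in an awk array, so on the ~1 GB file it may run out of memory just like the original script. Since awk is not a must, one lower-memory alternative is a sort-and-join: sort can fall back to temporary files on disk (GNU sort exposes this through its -S buffer-size and -T temp-directory options), and join then streams the two sorted files side by side. Below is a minimal sketch under stated assumptions, not a tested drop-in: the function name split_by_hash is hypothetical, and it assumes neither file contains TAB characters.

```shell
# Hypothetical helper: split_by_hash IN1 IN2 LEFT RESULT
# Sort-and-join sketch; assumes no TAB characters in either input file.
# The body runs in a subshell ( ... ) so LC_ALL and temp variables stay local.
split_by_hash() (
    in1=$1 in2=$2 left=$3 result=$4
    tab=$(printf '\t')
    tmp=$(mktemp -d) || exit 1

    # Byte-wise collation so that sort and join agree on key order.
    LC_ALL=C
    export LC_ALL

    # 1.txt -> "hash<TAB>rest of line", keyed on the last field (like $NF in
    # the awk scripts), CRs stripped, then sorted on the hash.
    tr -d '\r' < "$in1" |
        awk '{ k = $NF; sub(/[ \t]*[^ \t]+[ \t]*$/, ""); print k "\t" $0 }' |
        sort -t "$tab" -k1,1 > "$tmp/1.keyed"

    # 2.txt -> "hash<TAB>everything after the first colon", sorted on the hash.
    tr -d '\r' < "$in2" |
        awk '{ i = index($0, ":"); print substr($0, 1, i - 1) "\t" substr($0, i + 1) }' |
        sort -t "$tab" -k1,1 > "$tmp/2.keyed"

    # Matches: join prints "hash<TAB>rest<TAB>value"; keep "rest value".
    join -t "$tab" "$tmp/1.keyed" "$tmp/2.keyed" |
        awk -F "$tab" '{ print $2 " " $3 }' > "$result"

    # Non-matches from file 1: "hash<TAB>rest"; restore "rest hash".
    join -t "$tab" -v 1 "$tmp/1.keyed" "$tmp/2.keyed" |
        awk -F "$tab" '{ print $2 " " $1 }' > "$left"

    rm -r "$tmp"
)
```

Usage: split_by_hash 1.txt 2.txt left.txt result.txt. Two differences from the awk versions: both output files come out ordered by hash rather than in 1.txt order, and if a hash occurs more than once in 2.txt, join writes one result line per match, whereas the awk array keeps only the last value.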