Reputation: 75
I have written a bash script that builds a new file from two input files.
File1:
1000846364118,9,369,9901,0,2020.05.20 13:20:52,2020.07.14 16:38:11,2021.03.14 00:00:00,U,2020.07.14 16:38:11
1000683648398,9,369,9901,0,2019.05.04 19:50:39,2019.06.23 14:27:17,2019.12.31 23:59:59,U,2020.01.01 01:25:05
1000534726081,9,369,9901,0,2019.05.04 19:50:39,2019.06.23 14:27:17,2019.12.31 23:59:59,X,2020.01.01 01:25:05
File2:
1000846364118;0;;2021.04.04;9914;100084636;ISATD;U;TEST;1234567890;2;;0;0;0;0;2020.10.12.00:00:00;0;0
1000830686890;0;;2021.03.02;9807;100083068;ISATD;U;TEST;1234567891;2;;0;0;0;0;2020.10.12.00:00:01;0;0
1000835819335;0;;2021.03.21;9990;100083581;ISATD;U;TEST;1234567892;2;;0;0;0;0;2020.10.12.00:00:03;0;0
1000683648398;0;;2020.10.31;9829;100068364;ISATD;U;TEST;1234567893;2;;0;0;0;0;2020.10.12.00:00:06;0;0
The new file should contain only the rows from file1 that have the pattern 'U' in them, with an extra column holding the 10th field (123456789X) of the matching row in file2. So my final output should look like this:
1000846364118,9,369,9901,0,2020.05.20 13:20:52,2020.07.14 16:38:11,2021.03.14 00:00:00,U,2020.07.14 16:38:11,1234567890
1000683648398,9,369,9901,0,2019.05.04 19:50:39,2019.06.23 14:27:17,2019.12.31 23:59:59,U,2020.01.01 01:25:05,1234567893
My script is below and works fine, but the data I am working with is huge and generating the output file takes far too long. I printed a timestamp after every step and found that the for loop takes hours to generate a few KB of output, while the input files are a few hundred MB. I need help optimising it.
cat /dev/null > new_file
used_Serial_Number=`grep U file1 | awk -F "," '{print $1}'`
echo "Serial no extracted at `date`" # Till this portion is getting completed in 2-3mins
for i in $used_Serial_Number; do
msisdn=`grep $i file2 | awk -F ";" '{print $10}'`
grep $i file1 | awk -v msisdn=$msisdn -F "," 'BEGIN { OFS = "," } { print $0 , msisdn }' >> new_file
done
Upvotes: 1
Views: 54
Reputation: 133458
Could you please try the following, written and tested with the shown samples in GNU awk. In case the 9th field of Input_file1 could be u OR U, change $9=="U" to tolower($9)=="u" to match both cases.
awk '
BEGIN{
FS=";"
OFS=","
}
FNR==NR{
a[$1]=$10
next
}
($1 in a) && $9=="U"{
print $0,a[$1]
}
' Input_file2 FS="," Input_file1
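For the case-insensitive variant mentioned above, here is a minimal sketch that can be run as is. The rows are made up for illustration (short serial numbers, placeholder dates d1..d4), and the `workdir` scratch directory is an assumption of this sketch, not part of the original script.

```shell
# Scratch directory so the sketch does not touch real data (hypothetical name).
workdir=$(mktemp -d)

# Made-up file2-style rows: semicolon-separated, wanted value in field 10.
printf '%s\n' \
  '1001;0;;2021.04.04;9914;1001;ISATD;U;TEST;1234567890;2' \
  '1002;0;;2021.03.02;9807;1002;ISATD;U;TEST;1234567891;2' \
  '1003;0;;2021.03.21;9990;1003;ISATD;U;TEST;1234567892;2' > "$workdir/file2"

# Made-up file1-style rows: comma-separated, status in field 9 (U, u, X).
printf '%s\n' \
  '1001,9,369,9901,0,d1,d2,d3,U,d4' \
  '1002,9,369,9901,0,d1,d2,d3,u,d4' \
  '1003,9,369,9901,0,d1,d2,d3,X,d4' > "$workdir/file1"

awk '
BEGIN { FS = ";"; OFS = "," }
FNR == NR { a[$1] = $10; next }        # first file: remember field 10 per serial
($1 in a) && tolower($9) == "u" {      # second file: accept U or u in field 9
  print $0, a[$1]
}
' "$workdir/file2" FS="," "$workdir/file1" > "$workdir/new_file"

cat "$workdir/new_file"
```

With these rows, the 1001 and 1002 lines are printed with their looked-up value appended, and the 1003 line is skipped because its 9th field is X.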
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=";" ##Setting FS as ; here.
OFS="," ##Setting OFS as , here.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when Input_file2 is being read.
a[$1]=$10 ##Creating array a with index $1 and value is $10 here.
next ##next will skip all further statements from here.
}
($1 in a) && $9=="U"{ ##Checking if $1 is in a and 9th field is U then do following.
print $0,a[$1] ##Printing current line along with value of a with index of $1 here.
}
' file2 FS="," file1 ##Reading file2 (Input_file2) first, then setting FS as , before reading file1 (Input_file1).
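To check the approach against the samples from the question, the following sketch recreates file1 and file2 in a scratch directory and writes the result to new_file. The `demo` directory name is an assumption made for this example only; on real data you would point awk at your own files.

```shell
# Scratch directory; "demo" is a name invented for this sketch.
demo=$(mktemp -d)

# Sample rows copied from the question.
cat > "$demo/file1" <<'EOF'
1000846364118,9,369,9901,0,2020.05.20 13:20:52,2020.07.14 16:38:11,2021.03.14 00:00:00,U,2020.07.14 16:38:11
1000683648398,9,369,9901,0,2019.05.04 19:50:39,2019.06.23 14:27:17,2019.12.31 23:59:59,U,2020.01.01 01:25:05
1000534726081,9,369,9901,0,2019.05.04 19:50:39,2019.06.23 14:27:17,2019.12.31 23:59:59,X,2020.01.01 01:25:05
EOF

cat > "$demo/file2" <<'EOF'
1000846364118;0;;2021.04.04;9914;100084636;ISATD;U;TEST;1234567890;2;;0;0;0;0;2020.10.12.00:00:00;0;0
1000830686890;0;;2021.03.02;9807;100083068;ISATD;U;TEST;1234567891;2;;0;0;0;0;2020.10.12.00:00:01;0;0
1000835819335;0;;2021.03.21;9990;100083581;ISATD;U;TEST;1234567892;2;;0;0;0;0;2020.10.12.00:00:03;0;0
1000683648398;0;;2020.10.31;9829;100068364;ISATD;U;TEST;1234567893;2;;0;0;0;0;2020.10.12.00:00:06;0;0
EOF

# Same program as above, with output redirected to new_file.
awk '
BEGIN { FS = ";"; OFS = "," }
FNR == NR { a[$1] = $10; next }
($1 in a) && $9 == "U" { print $0, a[$1] }
' "$demo/file2" FS="," "$demo/file1" > "$demo/new_file"

cat "$demo/new_file"
```

This prints the two lines shown as the desired output in the question. Each input file is read once and the lookup is done in memory, whereas the original loop runs grep over both files for every serial number, which is where the time goes.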
Upvotes: 2