Reputation: 622
The problem I have is related to a previous post. Now what I have right now is:
One 9-million-rows CSV file containing fields arranged randomly like:
192.168.12.23,62,LOCAL,341993,23/10/2012
192.168.12.25,11,MONLOCAL$MONREMOTE,33222$56,22/10/2012$18/10/2012
192.168.12.678,14,MONLOCAL,341993,22/10/2012
192.168.12.83,18, , ,
192.168.12.21,49,LOCAL$REMOTE,19316$15253,22/10/2012$22/10/2012
192.168.12.79,52,REMOTE,1180306134,19/10/2012
192.168.12.41,44,MONLOCAL$MONREMOTE,1865871$383666,22/10/2012$22/10/2012
192.168.12.29,23,MONREMOTE,151145,18/10/2012
Where as you probably already observed, there are 2 field separators, ,
and $
:
field 1 = IP Address
field 2 = An unique ID
field 3 = connection that IP has used (e.g LOCAL, REMOTE, MONLOCAL, or MONREMOTE)
field 4 = can be also the connection IP has used (e.g LOCAL, REMOTE, MONLOCAL, or MONREMOTE) or the value related to field 3
field 5 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 5 represent the value of field 3 else it represent the timestamp related to field 3
field 6 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 6 represent the value of field 4 else it will not exist.
field 7 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 7 represent the timestamp of field 3 else it will not exist.
field 8 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 7 represent the timestamp of field 4 else it will not exist.
I need to process the file above and output the following format, having the fields always in following order (10 fields):
IP,ID,MONLOCAL_value,MONLOCAL_timestamp,LOCAL_value,LOCAL_timestamp,MONREMOTE_value,MONREMOTE_timestamp,REMOTE_value,REMOTE_timestamp like:
192.168.12.23,62, , ,341993,23/10/2012, , , ,
192.168.12.25,11,33222,22/10/2012, , , , ,56,18/10/2012
192.168.12.678,14,341993,22/10/2012, , , , , ,
192.168.12.83,18, , , , , , , ,
192.168.12.21,49, , ,19316,22/10/2012, , ,15253,22/10/2012
192.168.12.79,52, , , , , , ,1180306134,19/10/2012
192.168.12.41,44,1865871,22/10/2012, , ,383666,22/10/2012, ,
192.168.12.29,23, , , , ,151145,18/10/2012, ,
I have the following script to process the file from above:
nawk 'BEGIN {
while (getline < "'"$data"'" > 0)
{
{FS = "[,,$]"; OFS=","}
split($0,flds)
{if ($4 ~ /LOCAL|REMOTE|MONLOCAL|MONREMOTE/) {
if ($3 ~ /MONLOCAL/) {
MONREMOTE_time=flds[8];
MONREMOTE_value=flds[6];
MONLOCAL_time=flds[7];
MONLOCAL_value=flds[5]; }
if ($3 ~ /MONREMOTE/) {
MONREMOTE_time=flds[7];
MONREMOTE_value=flds[5];
REMOTE_value=flds[6];
REMOTE_time=flds[8]; }
if ($3 ~ /REMOTE/) {
REMOTE_value=flds[5];
REMOTE_time=flds[7];
LOCAL_value=flds[6];
LOCAL_time=flds[8]; }
} else {
if($3 ~ /MONLOCAL/) {
MONLOCAL_value=flds[4];
MONLOCAL_time=flds[5]; }
if ($3 ~ /MONREMOTE/) {
MONREMOTE_value=flds[4];
MONREMOTE_time=flds[5]; }
if ($3 ~ /LOCAL/) {
LOCAL_value=flds[4];
LOCAL_time=flds[5]; }
if ($3 ~ /REMOTE/) {
REMOTE_value=flds[4];
REMOTE_time=flds[5]; }
}
}
{print MONLOCAL_value",MONLOCAL_time,LOCAL_value,LOCAL_time,MONREMOTE_value,MONREMOTE_time,REMOTE_value,REMOTE_time;}
}
}'
The bad part here is that as you can observe below, the output is not as expected since I'm unable to clear the array values after each line is read, or find a solution to charge the array elements dynamically for each line:
4915779000211,62, , ,341993,23/10/2012, , , , ,
4915779000212,11,33222,22/10/2012,341993,23/10/2012,56,18/10/2012, ,
491639000591,14,341993,22/10/2012,341993,23/10/2012, , , ,
491779001768,18,319307448,18/10/2012,19316,22/10/2012,383666,22/10/2012,1180306134,19/10/2012
4915779000213,49,3849259,05/10/2012,19316,22/10/2012,56,18/10/2012,15253,22/10/2012
491779000758,52,9356828,08/10/2012,19316,22/10/2012,56,18/10/2012,1180306134,19/10/2012
4915779000214,44,1865871,22/10/2012,19316,22/10/2012,383666,22/10/2012,1180306134,19/10/2012
491639000221,23,319307448,18/10/2012,19316,22/10/2012,151145,18/10/2012,1180306134,19/10/2012
So, if you guys have an idea how to make this awk function to work and give the expected output, I would be grateful.
Upvotes: 2
Views: 1005
Reputation: 18202
With some modification you can use something similar to the following:
while read line; do
if [[ "$line" == *\$* ]]; then
echo $line | awk -F',|\$' '{print $1,$2,$5,$6,$3}' >> newfile;
else
echo $line | awk -F',' '{print $1,$2,$3,,$4,,$5}' >> newfile;
fi
done < "/path/to/your/file"
Upvotes: 1