awk clear array values after reading each line of csv

Question

The problem I have is related to a previous post. Now what I have right now is:

One 9-million-rows CSV file containing fields arranged randomly like:
```
192.168.12.23,62,LOCAL,341993,23/10/2012
192.168.12.25,11,MONLOCAL$MONREMOTE,33222$56,22/10/2012$18/10/2012
192.168.12.678,14,MONLOCAL,341993,22/10/2012

192.168.12.83,18, , ,
192.168.12.21,49,LOCAL$REMOTE,19316$15253,22/10/2012$22/10/2012
192.168.12.79,52,REMOTE,1180306134,19/10/2012
192.168.12.41,44,MONLOCAL$MONREMOTE,1865871$383666,22/10/2012$22/10/2012
192.168.12.29,23,MONREMOTE,151145,18/10/2012
```
Where as you probably already observed, there are 2 field separators, , and $:
- field 1 = IP Address
- field 2 = An unique ID
- field 3 = connection that IP has used (e.g LOCAL, REMOTE, MONLOCAL, or MONREMOTE)
- field 4 = can be also the connection IP has used (e.g LOCAL, REMOTE, MONLOCAL, or MONREMOTE) or the value related to field 3
- field 5 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 5 represent the value of field 3 else it represent the timestamp related to field 3
- field 6 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 6 represent the value of field 4 else it will not exist.
- field 7 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 7 represent the timestamp of field 3 else it will not exist.
- field 8 = if field 4 is LOCAL or REMOTE or MONLOCAL or MONREMOTE, then field 7 represent the timestamp of field 4 else it will not exist.

I need to process the file above and output the following format, having the fields always in following order (10 fields):

IP,ID,MONLOCAL_value,MONLOCAL_timestamp,LOCAL_value,LOCAL_timestamp,MONREMOTE_value,MONREMOTE_timestamp,REMOTE_value,REMOTE_timestamp like:

192.168.12.23,62, , ,341993,23/10/2012, , , , 
192.168.12.25,11,33222,22/10/2012, , , , ,56,18/10/2012
192.168.12.678,14,341993,22/10/2012, , , , , ,  
192.168.12.83,18, , , , , , , , 
192.168.12.21,49, , ,19316,22/10/2012, , ,15253,22/10/2012
192.168.12.79,52, , , , , , ,1180306134,19/10/2012
192.168.12.41,44,1865871,22/10/2012, , ,383666,22/10/2012, , 
192.168.12.29,23, , , , ,151145,18/10/2012, ,

I have the following script to process the file from above:

nawk 'BEGIN {
    while (getline < "'"$data"'" > 0)
    {
    {FS = "[,,$]"; OFS=","}
    split($0,flds)
       {if ($4 ~ /LOCAL|REMOTE|MONLOCAL|MONREMOTE/) {
            if ($3 ~ /MONLOCAL/) {
                    MONREMOTE_time=flds[8];
                    MONREMOTE_value=flds[6];
                    MONLOCAL_time=flds[7];
                    MONLOCAL_value=flds[5]; }
            if ($3 ~ /MONREMOTE/) {
                    MONREMOTE_time=flds[7];
                    MONREMOTE_value=flds[5];
                    REMOTE_value=flds[6];
                    REMOTE_time=flds[8]; }
            if ($3 ~ /REMOTE/) {
                    REMOTE_value=flds[5];
                    REMOTE_time=flds[7];
                    LOCAL_value=flds[6];
                    LOCAL_time=flds[8]; }
    } else {
            if($3 ~ /MONLOCAL/) {
                    MONLOCAL_value=flds[4];
                    MONLOCAL_time=flds[5]; }
            if ($3 ~ /MONREMOTE/) {
                    MONREMOTE_value=flds[4];
                    MONREMOTE_time=flds[5]; }
            if ($3 ~ /LOCAL/) {
                    LOCAL_value=flds[4];
                    LOCAL_time=flds[5]; }
            if ($3 ~ /REMOTE/) {
                    REMOTE_value=flds[4];
                    REMOTE_time=flds[5]; }
    }
  }
  {print MONLOCAL_value",MONLOCAL_time,LOCAL_value,LOCAL_time,MONREMOTE_value,MONREMOTE_time,REMOTE_value,REMOTE_time;}
 }
}'

The bad part here is that as you can observe below, the output is not as expected since I'm unable to clear the array values after each line is read, or find a solution to charge the array elements dynamically for each line:

4915779000211,62, , ,341993,23/10/2012, , , , ,
4915779000212,11,33222,22/10/2012,341993,23/10/2012,56,18/10/2012, , 
491639000591,14,341993,22/10/2012,341993,23/10/2012, , , ,  
491779001768,18,319307448,18/10/2012,19316,22/10/2012,383666,22/10/2012,1180306134,19/10/2012
4915779000213,49,3849259,05/10/2012,19316,22/10/2012,56,18/10/2012,15253,22/10/2012
491779000758,52,9356828,08/10/2012,19316,22/10/2012,56,18/10/2012,1180306134,19/10/2012
4915779000214,44,1865871,22/10/2012,19316,22/10/2012,383666,22/10/2012,1180306134,19/10/2012
491639000221,23,319307448,18/10/2012,19316,22/10/2012,151145,18/10/2012,1180306134,19/10/2012

So, if you guys have an idea how to make this awk function to work and give the expected output, I would be grateful.

awk clear array values after reading each line of csv

Answers (1)

Related Questions