user7264830

Reputation: 13

Why does the gsub function have to be used before match function in this instance?

I am parsing an auth.log file which looks like this:

Dec 4 00:22:36 ip-172-31-23-55 sshd[3033]: Connection from 85.93.5.70 port 50208 on 172.31.23.55 port 22

into CSV output which looks like this:

Dec, 4, 00 22 36, 85.93.5.70

The following awk code correctly produces what I want, just as it is in the example above:

BEGIN {
  OFS=", "
} 
{
  gsub(/:/, " ", $3)
  match($0, /([1-9]{1,3}[\.]){3}[0-9]{1,3}/) 
  print $1, $2, $3, substr($0, RSTART, RLENGTH) 
}

However, if I flip the gsub and match functions, as in

BEGIN {
  OFS=", "
} 
{
  match($0, /([1-9]{1,3}[\.]){3}[0-9]{1,3}/)
  gsub(/:/, " ", $3)
  print $1, $2, $3, substr($0, RSTART, RLENGTH) 
}

I get the following output:

Dec, 4, 00 22 36,  from, 85.

which makes very little sense. gsub is only looking for ":", and the match finds an IP address. The same behavior is observed when using sub instead of gsub.

I am very confused as to how these functions are stepping on each other's toes.

Upvotes: 1

Views: 91

Answers (2)

RavinderSingh13

Reputation: 133518

Could you please try the following and let me know if it helps?

echo "Your_Input_as mentioned_above" | awk --re-interval '{split($3, A,":");match($0,/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/);print $1, $2, A[1], A[2], A[3],substr($0,RSTART,RLENGTH)}'

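Hope this helps. I tested this in GNU awk. Note that split() only copies the pieces of $3 into the array A; it never assigns to $3, so $0 is not rebuilt and RSTART stays valid.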

Upvotes: 0

Ed Morton

Reputation: 203532

When you modify $3, awk rebuilds the record, replacing each separator between fields with your 2-character OFS value, so $0 is no longer the same length as it was before. The RSTART value saved by match() before the rebuild therefore no longer points to the intended character in the rebuilt record. Look:

$ awk 'BEGIN { OFS=", " } { print; gsub(/:/, " ", $3); print }' file
Dec 4 00:22:36 ip-172-31-23-55 sshd[3033]: Connection from 85.93.5.70 port 50208 on 172.31.23.55 port 22
Dec, 4, 00 22 36, ip-172-31-23-55, sshd[3033]:, Connection, from, 85.93.5.70, port, 50208, on, 172.31.23.55, port, 22
                                                           ^
                                                      NOTE |

Notice that the character position where 85.93.5.70 started when you called match() (i.e. the value saved in RSTART) now falls just before "from," in the rebuilt record, which is why substr() returns " from, 85." instead of the address.
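One way to make the script independent of statement order is to copy the matched text into a variable immediately after match(), while RSTART and RLENGTH still describe the current $0. The following is only a sketch under a few assumptions: the variable names line, out and ip are mine, and I have loosened the address pattern to [0-9]+ per octet so that it does not rely on interval expressions (which some older awks do not support) and also accepts octets containing a 0.

```shell
# Sample record taken from the question.
line='Dec 4 00:22:36 ip-172-31-23-55 sshd[3033]: Connection from 85.93.5.70 port 50208 on 172.31.23.55 port 22'

out=$(printf '%s\n' "$line" | awk '
BEGIN { OFS = ", " }
{
    # Save the address now, while RSTART/RLENGTH still refer to this $0.
    ip = ""
    if (match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
        ip = substr($0, RSTART, RLENGTH)

    # Assigning to $3 rebuilds $0 with OFS, but ip is already a copy.
    gsub(/:/, " ", $3)

    print $1, $2, $3, ip
}')

printf '%s\n' "$out"
```

With the sample line this prints Dec, 4, 00 22 36, 85.93.5.70, and it would keep doing so if the gsub() were moved above the match(), since in that case RSTART would simply be computed against the already rebuilt record.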

I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

Upvotes: 1
