Maguy IB

sed regex to match multiple fields and values, including quotes

I have a (space-separated) input file with lines such as:

field1=value1 field2="value 2" field3='value 3' field4="value '4'" ...

The number of fields varies from line to line. To process such a file properly, I would ideally like to run it through sed and obtain tab-separated output such as:

field1 (tab) value1 (tab) field2 (tab) value 2 (tab) field3 (tab) value 3 (tab) field4 (tab) value '4'

The furthest I have got so far is something like sed "s/\([a-z][a-z]*\)=\(['\"]\{0,1\}\)\(..*?\)\2/\t\1\t\3/g", but that is still far from solving my problem. My difficulty is handling values that may or may not be enclosed in quotes. For the sake of elegance (or geekiness), I am sticking to sed, but I would also consider an awk alternative.
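For reference, here is a minimal sketch of both approaches, written as shell functions. It assumes GNU sed (for -E and \t in the replacement) and a POSIX awk; kv_sed and kv_awk are hypothetical names, and keys are assumed to consist of letters, digits and underscores. The sed version relies on POSIX leftmost-longest matching, so a complete quoted value wins over a bare run of non-space characters; the awk version walks the line explicitly and reads up to the matching closing quote.

```shell
#!/bin/sh
# Sketches that turn  key=value key="v a l" key='v a l'  into tab-separated
# key/value columns. kv_sed and kv_awk are hypothetical helper names.

# 1) Assumes GNU sed (-E, \t). A value is "double-quoted", 'single-quoted'
#    or a run of non-spaces; leftmost-longest matching makes the quoted
#    alternatives win. The trailing " ?" eats the separating space and the
#    second command drops the final tab.
kv_sed() {
  sed -E "s/([A-Za-z0-9_]+)=(\"([^\"]*)\"|'([^']*)'|([^ ]*)) ?/\1\t\3\4\5\t/g; s/\t\$//"
}

# 2) Portable awk: find the next key=, then read a quoted or bare value.
kv_awk() {
  awk -v sq="'" '{
    line = $0; out = ""
    while (match(line, /[A-Za-z0-9_]+=/)) {
      key  = substr(line, RSTART, RLENGTH - 1)
      line = substr(line, RSTART + RLENGTH)
      c = substr(line, 1, 1)
      if (c == "\"" || c == sq) {
        n = index(substr(line, 2), c)   # offset of the closing quote
        if (n == 0) n = length(line)    # unterminated quote: take the rest
        val  = substr(line, 2, n - 1)
        line = substr(line, n + 2)
      } else {
        n = index(line, " ")            # bare value ends at the next space
        if (n == 0) n = length(line) + 1
        val  = substr(line, 1, n - 1)
        line = substr(line, n)
      }
      out = out (out == "" ? "" : "\t") key "\t" val
    }
    print out
  }'
}

sample="field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\""
printf '%s\n' "$sample" | kv_sed
printf '%s\n' "$sample" | kv_awk
```

Both print field1 (tab) value1 (tab) field2 (tab) value 2 (tab) field3 (tab) value 3 (tab) field4 (tab) value '4'. Neither sketch handles escaped quotes inside a value (such as \").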

Thanks in advance for any help,

Edit: I am shocked to say, but @Jotne is right.

echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"

does not work; it prints the line unchanged: field1=value1 field2="value 2" field3='value 3' field4="value '4'"

However, the following works (the idea behind all this is to parse an audit.log file):

root@XXX:~# tail -n 2 /var/log/audit/audit.log 
type=CRED_DISP msg=audit(1570385821.075:670): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:setcred acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
type=USER_END msg=audit(1570385821.075:671): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
root@XXX:~# tail -n 2 /var/log/audit/audit.log | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"
type    CRED_DISP    msg    audit(1570385821.075:670):   pid    32605    uid    0    auid   0    ses    399  msg    op=PAM:setcred acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success 
type    USER_END     msg    audit(1570385821.075:671):   pid    32605    uid    0    auid   0    ses    399  msg    op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success   

Why?

Answers (2)

catweazle

Regarding:

Edit: I am shocked to say, but @Jotne is right.

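It does not work because of the regexp you use in sed.

Look at the part that matches the key in the key=value pairs: [a-z][a-z]* allows only lowercase letters, so it can never match a key such as field1 that ends in a digit.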

It should look like this:

echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | sed "s/\([a-z0-9][a-z0-9]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"

In the real audit.log data, no key ends in a digit, which is why the same command matched there.
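A quick way to see this is to wrap whatever the key pattern captures in <...>. This is a sketch; mark_keys is a hypothetical helper whose first argument is the bracket expression used for the key characters.

```shell
#!/bin/sh
# mark_keys (hypothetical helper): $1 is the bracket expression for key
# characters; every captured key is wrapped in <...> so the match is visible.
mark_keys() {
  sed "s/\\($1$1*\\)=/<\\1>=/g"
}

# Key pattern from the question: field1 is skipped, pid matches.
printf '%s\n' 'field1=value1 pid=32605' | mark_keys '[a-z]'      # field1=value1 <pid>=32605
# Key pattern from this answer: both keys match.
printf '%s\n' 'field1=value1 pid=32605' | mark_keys '[a-z0-9]'   # <field1>=value1 <pid>=32605
```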

By the way, @potong's solution sidesteps these details elegantly.

Kind regards.

potong

This might work for you (GNU sed):

sed -E 's/ \<([^ =]+)=("[^"]*"|'\''[^'\'']*'\'')/\t\1\t\2/g;s/=/\t/' file

The first substitution replaces the leading space and the = of every quoted key=value pair after the first with tabs. The second substitution then converts the = of the first pair, which has no leading space.
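To see what the command produces, here is a small sketch that runs the same sed script on the sample line from the question. potong_tsv is a hypothetical wrapper name, the script reads standard input instead of file, and GNU sed is assumed, as in the answer.

```shell
#!/bin/sh
# potong_tsv (hypothetical wrapper): the sed script from the answer, unchanged,
# applied to standard input instead of a file.
potong_tsv() {
  sed -E 's/ \<([^ =]+)=("[^"]*"|'\''[^'\'']*'\'')/\t\1\t\2/g;s/=/\t/'
}

printf '%s\n' "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | potong_tsv
```

With the sample line, the output is field1 (tab) value1 (tab) field2 (tab) "value 2" (tab) field3 (tab) 'value 3' (tab) field4 (tab) "value '4'", so the quotes around the values are kept. As written, only quoted values after the first field are split: a line such as a=1 b=2 comes out as a (tab) 1 b=2.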
