Reputation: 175
I have a (space-separated) input file with lines such as:
field1=value1 field2="value 2" field3='value 3' field4="value '4'" ...
The number of fields varies from line to line. To process such a file properly, I would ideally like to run it through sed
and obtain tab-separated output such as:
field1 (tab) value1 (tab) field2 (tab) value 2 (tab) field3 (tab) value 3 (tab) field4 (tab) value '4'
The furthest I have got so far is something like sed "s/\([a-z][a-z]*\)=\(['\"]\{0,1\}\)\(..*?\)\2/\t\1\t\3/g"
but this is still far from solving my problem. My difficulty is handling values that may or may not be delimited by quotes. For the sake of elegance (or geekiness), I am sticking to sed, but I would also consider an awk alternative.
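Since an awk alternative is acceptable, here is a minimal sketch in portable awk. It is only one possible approach, under these assumptions: kv2tsv is a hypothetical helper name, keys contain no spaces or =, quoted values contain no escaped quotes, and unquoted values contain no spaces and do not start with a quote. It repeatedly takes the leftmost key=value token with match(), strips one pair of outer quotes, and joins everything with tabs.

```shell
# kv2tsv is a hypothetical helper name used only for this sketch.
# It reads key=value lines on stdin and prints key<TAB>value<TAB>...,
# removing one pair of surrounding single or double quotes from each value.
kv2tsv() {
  awk -v q="'" '
  BEGIN {
    # ERE for one token: key, "=", then a double-quoted, single-quoted
    # or unquoted value (unquoted: no spaces, does not start with a quote).
    tok_re = "[^ =]+=(\"[^\"]*\"|" q "[^" q "]*" q "|[^ \"" q "][^ ]*)"
  }
  {
    rest = $0
    out = ""
    # Take the leftmost token, emit it, then continue after it.
    while (match(rest, tok_re)) {
      tok  = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
      eq   = index(tok, "=")
      key  = substr(tok, 1, eq - 1)
      val  = substr(tok, eq + 1)
      first = substr(val, 1, 1)
      if (first == "\"" || first == q)
        val = substr(val, 2, length(val) - 2)   # drop the outer quotes
      out = (out == "" ? "" : (out "\t")) key "\t" val
    }
    print out
  }'
}

echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | kv2tsv
```

Because an unquoted value is not allowed to start with a quote, the three value forms never compete for the same text, so the sketch does not depend on the regex engine's longest-match behaviour.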
Thanks in advance for any help,
Edit: I am shocked to say, but @Jotne is right.
echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"
does not work; it prints the input unchanged: field1=value1 field2="value 2" field3='value 3' field4="value '4'"
However, the following works (the idea behind it is to parse an audit.log file):
root@XXX:~# tail -n 2 /var/log/audit/audit.log
type=CRED_DISP msg=audit(1570385821.075:670): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:setcred acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
type=USER_END msg=audit(1570385821.075:671): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
root@XXX:~# tail -n 2 /var/log/audit/audit.log | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"
type CRED_DISP msg audit(1570385821.075:670): pid 32605 uid 0 auid 0 ses 399 msg op=PAM:setcred acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success
type USER_END msg audit(1570385821.075:671): pid 32605 uid 0 auid 0 ses 399 msg op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success
Why?
Upvotes: 5
Views: 281
Reputation: 47
Regarding:
Edit: I am shocked to say, but @Jotne is right.
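It does not work because of the regexp you use in sed.
Look closely at the regexp for the key part of the key=value pairs: [a-z][a-z]* matches lowercase letters only, so a key such as field1, whose last character is a digit, is never directly followed by = in a match.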
It should look like this:
echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | sed "s/\([a-z0-9][a-z0-9]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"
In the real data file none of the keys ended in digits, which is why it matched there!
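A quick way to see the difference, assuming GNU sed (which understands \t in the replacement). The test line mixes a key ending in a digit with a letters-only key taken from the audit.log output; the variable names are just for illustration.

```shell
# Assumes GNU sed (for "\t" in the replacement).
line='field1=value1 type=CRED_DISP'

# Question's key pattern: letters only. "field1=" cannot match, because the
# character right before "=" is the digit 1; "type=" still matches.
old=$(printf '%s\n' "$line" | sed 's/\([a-z][a-z]*\)=/\1\t/g')

# This answer's key pattern: letters and digits, so "field1=" matches too.
new=$(printf '%s\n' "$line" | sed 's/\([a-z0-9][a-z0-9]*\)=/\1\t/g')

printf '%s\n' "$old" "$new"
```

The first result still contains field1=value1, while the second has a tab after field1; both turn type=CRED_DISP into tab-separated text.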
By the way, @potong's solution sidesteps these details elegantly.
Kind regards.
Upvotes: 0
Reputation: 58488
This might work for you (GNU sed):
sed -E 's/ \<([^ =]+)=("[^"]*"|'\''[^'\'']*'\'')/\t\1\t\2/g;s/=/\t/' file
The first substitution turns every space-preceded key="..." or key='...' pair (that is, every field except the first) into a tab-separated key and value. The second substitution replaces the = of the first field with a tab.
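To check the behaviour on the sample line from the question, assuming GNU sed: the command below is potong's, unchanged. As far as I can tell from its pattern, it keeps the quotes around the values, and only quoted values are converted after the first field. The second command is a variant that is not part of the answer: it adds inner capture groups so that \3 or \4 holds the bare value, which drops the quotes as the question asked.

```shell
# Assumes GNU sed (-E, \<, and \t in the replacement).
input="field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\""

# potong's command, unchanged: keys and values become tab-separated,
# but the quotes around the values are kept.
kept=$(printf '%s\n' "$input" |
  sed -E 's/ \<([^ =]+)=("[^"]*"|'\''[^'\'']*'\'')/\t\1\t\2/g;s/=/\t/')

# Variant (not from the answer): capture the text inside the quotes
# instead, so \3 (double quotes) or \4 (single quotes) is the bare value.
stripped=$(printf '%s\n' "$input" |
  sed -E 's/ \<([^ =]+)=("([^"]*)"|'\''([^'\'']*)'\'')/\t\1\t\3\4/g;s/=/\t/')

printf '%s\n' "$kept" "$stripped"
```

Only one of \3 and \4 participates in each match; GNU sed substitutes an empty string for the other.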
Upvotes: 1