Reputation: 19938
I've a text file with information in this format.
%%%
key1 = value1
key2 = value2
key3 = subkey1:subvalue1;subkey2:subvalue2
%%%
key1 = value1
key2 = value2
key3 = subkey1:subvalue1;subkey2:subvalue2
%%%
I want to convert this to this CSV like format:
key1,key2,key3_subkey1,key3_subkey2
value1,value2,subvalue1,subvalue2
value1,value2,subvalue1,subvalue2
What is the best way to do this. I'm hoping there are unix utilities like Awk/Sed/Grep that can be used instead of write a python/perl program that reads each line, maintains state and converts to csv format
Upvotes: 0
Views: 60
Reputation: 204015
$ cat tst.awk
BEGIN {
FS = "[[:space:]]*=[[:space:]]*"
OFS = ","
}
!/%%%/ {
hdrs = hdrs sep $1
n = split($2,subFlds,/[:;]/)
if ( n == 1 ) {
vals = vals sep $2
sep = OFS
}
else {
for ( i=1; i<=n; i+=2) {
hdrs = hdrs sep subFlds[i]
vals = vals sep subFlds[i+1]
}
if ( !doneHdr++) {
print hdrs
}
print vals
hdrs = vals = sep = ""
}
}
$ awk -f tst.awk file
key1,key2,key3,subkey1,subkey2
value1,value2,subvalue1,subvalue2
value1,value2,subvalue1,subvalue2
Upvotes: 1
Reputation: 19810
I'm not sure how much easier you could get than doing a quick manual parse on this. The code below will even handle any set of keys and arbitrary keys with subkeys thanks to Pandas' building the table.
data = []
for line in open('input.txt'):
if line.startswith('%%%'):
o = {}
data.append(o)
continue
key, value = line.strip().split(' = ')
if ':' in value:
for pairstring in value.split(';'):
subkey, subvalue = pairstring.split(':')
o[f'{key}_{subkey}'] = subvalue
else:
o[key] = value
import pandas
pandas.DataFrame.from_records(data).to_csv('output.csv')
Upvotes: 0