Reputation: 3862
A file like this:
fieldname_A: eggs
fieldname_B: bacon
~EOR~
..
..
~EOR~
fieldname_A: eggs
fieldname_C: spam
fieldname_Z: baked beans
~EOR~
In total there are 40 different field names and about 10,000 records (~EOR~ marks End Of Record), each with a variable set of fields. Most field content is short, but some is very long (over 1000 characters).
I'm looking for a nice, clean, pythonic way to transform this to CSV, or rather, to a Google spreadsheet. (The data needs to be checked and corrected. Several people will do this. A Google doc is the only fast way I know for them to work on the same data without causing version conflicts or syncing problems.)
DictWriter seems like a cumbersome way, because many records (i.e. rows in the spreadsheet) have fields that are empty.
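(For reference, this is the DictWriter route that seems cumbersome to me — though I gather `csv.DictWriter` can fill in the missing fields itself via `restval`. A sketch with made-up records, assuming the file were already parsed into a list of dicts:)

```python
import csv
import io

# Hypothetical pre-parsed records; the real data has ~40 field names
records = [
    {"fieldname_A": "eggs", "fieldname_B": "bacon"},
    {"fieldname_A": "eggs", "fieldname_C": "spam", "fieldname_Z": "baked beans"},
]
columns = sorted({key for record in records for key in record})

buf = io.StringIO()
# restval="" fills any field a record doesn't have
writer = csv.DictWriter(buf, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```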
The shape of the input file is more XML-ish, so I'm considering transforming it to XML, importing that into Excel, and then uploading it as a Google spreadsheet.
Any ideas?
Upvotes: 0
Views: 66
Reputation: 35950
Try this — parse each record into a dict, then write rows with the union of all columns:
import csv

infile = "d.txt"
records = []     # one dict per record
columns = set()  # union of all field names seen so far
row = {}

# Build a list of dictionaries, one per ~EOR~-terminated record
with open(infile) as fp:
    for line in fp:
        line = line.strip()
        if not line:
            continue
        if line == "~EOR~":
            if row:
                records.append(row)
                columns.update(row)
            row = {}
        else:
            key, _, value = line.partition(":")
            row[key.strip().strip('"').strip("'")] = value.strip().strip('"').strip("'")

# Write the dictionary list to file; the csv module takes care of
# quoting long values that contain commas or newlines
outfile = "d.csv"
fieldnames = sorted(columns)
with open(outfile, "w", newline="") as fp:
    writer = csv.writer(fp)
    writer.writerow(fieldnames)
    for record in records:
        writer.writerow([record.get(key, "") for key in fieldnames])
Input:
fieldname_A: eggs
fieldname_B: bacon
~EOR~
fieldname_A: eggs
fieldname_C: spam
fieldname_Z: baked beans
~EOR~
Output:
fieldname_A,fieldname_B,fieldname_C,fieldname_Z
eggs,bacon,,
eggs,,spam,baked beans
Upvotes: 1