Reputation: 15
I have many text files in a very specific format that need to be read into a csv. I can not seem to figure out how to get all the data in the format that I want in my csv. I can get the file name and the header for the sheet, but none of the data is active in the sheet. The text file is a s follows:
"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"
My code:
import os
import csv
dirpath = 'C:\Usersnput\\'
output = 'C:\Users\gputew Microsoft Excel Worksheet.csv'
with open(output, 'w') as outfile:
csvout = csv.writer(outfile)
csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)
for filename in files:
with open(dirpath + '/' + filename) as afile:
csvout.writerow([filename, afile.read()])
afile.close()
outfile.close()
Need to have spreadsheet with headers;market, mileStoneUpdates, woName, buildstage, designType, woOverwrite with the cells filled with billings...ect from each text file.
Upvotes: 1
Views: 3007
Reputation: 1529
There are few changes needed:
Easiest solution will be:
import os
import csv
from collections import OrderedDict
HEADERS = ['market', 'mileStoneUpdates', 'woName', 'buildStage', 'designType', 'woOverwrite']
dirpath = '/tmp/input'
output = '/tmp/output'
with open(output, 'w') as outfile:
csvout = csv.writer(outfile)
csvout.writerow(HEADERS)
files = os.listdir(dirpath)
for filename in files:
with open(dirpath + '/' + filename) as afile:
data = OrderedDict.fromkeys(HEADERS, "")
for line in afile:
for header in HEADERS:
if line.startswith('"{}"'.format(header)):
value = line.split('"{}":"'.format(header)).pop()
value = value[:-2]
data[header] = value
csvout.writerow(data.values())
afile.close()
outfile.close()
For given input files:
"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"
"market":"billing, MI2"
"mileStoneUpdates":"N2"
"woName":"Dsca_55354_55as02"
"buildStage":"CPD2"
"designType":"Cor2e"
"woOverwrite":"Y2"
Will produce:
market,mileStoneUpdates,woName,buildStage,designType,woOverwrite
"billing, MI",N,Dsca_55354_55as0,CPD,Core,Y
"billing, MI2",N2,Dsca_55354_55as02,CPD2,Cor2e,Y2
Note: if data in files is more complicated, use regexp instead of simple string split.
Upvotes: 0
Reputation: 273
First, a remark about the "with ... as" syntax: This is designed to do all the work about opening and closing files for you, so when you leave the "with ... as" block, your file will automatically be closed. So your line "afile.close" is unnecessary. Also, you won't be able to edit your output file later on because it has already been closed. So bear that in mind.
If you are looking for a solution that doesn't require any additional libraries (depends on how often you do things like this), this should work, if all your files are in exactly the same format:
import os
import csv
dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft
Excel Worksheet.csv'
outfile = open(output, 'w')
csvout = csv.writer(outfile)
csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)
for filename in files:
with open(dirpath + '/' + filename) as afile:
row=[] # list of values we will be constructing
for line in afile: # loops through the lines in the file one by one
value = line.split(':')[1].strip('" \n') # I will be explaining this later
row.append(value) # adds the retrieved value to our row
csvout.writerow(row)
outfile.close()
Now let's see what happens in the value = ...
row: line.split(':')
makes a list of strings that are separated by ':'
. So '"market":"billing, MI"\n'
becomes ['"market"','"billing, MI"\n']
[1]
takes the second item of the list (remember, Python is zero-indexed), because we already know the first item (It's the name of the field). .strip(' "\n')
removes the specified characters (double-quote, space or newline) from the beginning and end of the string. In a way, it "cleans up" the string so that only the actual value is left over.
Upvotes: 0
Reputation: 148880
You can use the csv module to parse your input file into a dict and a DictWriter to write it back:
import os
import csv
dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
with open(output, 'w', newline='') as outfile:
csvout = csv.DictWriter(outfile, fieldnames =
['market','mileStoneUpdates','woName',
'buildStage','designType','woOverwrite'])
csvout.writeheader()
files = os.listdir(dirpath)
for filename in files:
with open(dirpath + '/' + filename) as afile:
csvin = csv.reader(afile, delimiter=':')
csvout.writerow({ row[0]: row[1] for row in csvin})
Upvotes: 0
Reputation: 362
As a general advice: The pandas library is pretty useful for things like this. If I understood your problem correctly, this should basically do it:
import os
import pandas as pd
dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
csvout = pd.DataFrame()
for filename in files:
data = pd.read_csv(filename, sep=':', index_col=0, header=None).T
csvout = csvout.append(data)
csvout.to_csv(output)
For explanation of the code, see this question/answer which explains how to read a transposed text file with pandas.
Upvotes: 2