Gregory Putman
Gregory Putman

Reputation: 15

Multiple specific text files into CSV in python

I have many text files in a very specific format that need to be read into a csv. I can not seem to figure out how to get all the data in the format that I want in my csv. I can get the file name and the header for the sheet, but none of the data is active in the sheet. The text file is a s follows:

"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"

My code:

import os
import csv

dirpath = 'C:\Usersnput\\'
output = 'C:\Users\gputew Microsoft Excel Worksheet.csv'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)

for filename in files:
    with open(dirpath + '/' + filename) as afile:
        csvout.writerow([filename, afile.read()])
        afile.close()

outfile.close()

Need to have spreadsheet with headers;market, mileStoneUpdates, woName, buildstage, designType, woOverwrite with the cells filled with billings...ect from each text file.

Upvotes: 1

Views: 3007

Answers (4)

Adam Puza
Adam Puza

Reputation: 1529

There are few changes needed:

  • All operations on file need to be inside with clause, and closing is not necessary there.
  • Then you need collect data from file.

Easiest solution will be:

import os
import csv
from collections import OrderedDict

HEADERS = ['market', 'mileStoneUpdates', 'woName', 'buildStage', 'designType', 'woOverwrite']

dirpath = '/tmp/input'
output = '/tmp/output'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(HEADERS)
    files = os.listdir(dirpath)

    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            data = OrderedDict.fromkeys(HEADERS, "")
            for line in afile:
                for header in HEADERS:
                    if line.startswith('"{}"'.format(header)):
                        value = line.split('"{}":"'.format(header)).pop()
                        value = value[:-2]
                        data[header] = value
            csvout.writerow(data.values())
            afile.close()

    outfile.close()

For given input files:

"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"

"market":"billing, MI2"
"mileStoneUpdates":"N2"
"woName":"Dsca_55354_55as02"
"buildStage":"CPD2"
"designType":"Cor2e"
"woOverwrite":"Y2"

Will produce:

market,mileStoneUpdates,woName,buildStage,designType,woOverwrite
"billing, MI",N,Dsca_55354_55as0,CPD,Core,Y
"billing, MI2",N2,Dsca_55354_55as02,CPD2,Cor2e,Y2

Note: if data in files is more complicated, use regexp instead of simple string split.

Upvotes: 0

quacodas
quacodas

Reputation: 273

First, a remark about the "with ... as" syntax: This is designed to do all the work about opening and closing files for you, so when you leave the "with ... as" block, your file will automatically be closed. So your line "afile.close" is unnecessary. Also, you won't be able to edit your output file later on because it has already been closed. So bear that in mind.

If you are looking for a solution that doesn't require any additional libraries (depends on how often you do things like this), this should work, if all your files are in exactly the same format:

import os
import csv

dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft 
Excel Worksheet.csv'
outfile = open(output, 'w')
csvout = csv.writer(outfile)
csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)

for filename in files:
    with open(dirpath + '/' + filename) as afile:
        row=[] # list of values we will be constructing
        for line in afile: # loops through the lines in the file one by one
            value = line.split(':')[1].strip('" \n') # I will be explaining this later
            row.append(value) # adds the retrieved value to our row
        csvout.writerow(row)

outfile.close()

Now let's see what happens in the value = ... row: line.split(':') makes a list of strings that are separated by ':'. So '"market":"billing, MI"\n' becomes ['"market"','"billing, MI"\n'] [1] takes the second item of the list (remember, Python is zero-indexed), because we already know the first item (It's the name of the field). .strip(' "\n') removes the specified characters (double-quote, space or newline) from the beginning and end of the string. In a way, it "cleans up" the string so that only the actual value is left over.

Upvotes: 0

Serge Ballesta
Serge Ballesta

Reputation: 148880

You can use the csv module to parse your input file into a dict and a DictWriter to write it back:

import os
import csv

dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
with open(output, 'w', newline='') as outfile:
    csvout = csv.DictWriter(outfile, fieldnames =
                ['market','mileStoneUpdates','woName',
                 'buildStage','designType','woOverwrite'])
    csvout.writeheader()
    files = os.listdir(dirpath)

    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            csvin = csv.reader(afile, delimiter=':')
            csvout.writerow({ row[0]: row[1] for row in csvin})

Upvotes: 0

kushy
kushy

Reputation: 362

As a general advice: The pandas library is pretty useful for things like this. If I understood your problem correctly, this should basically do it:

import os
import pandas as pd

dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
csvout = pd.DataFrame()

for filename in files:
    data = pd.read_csv(filename, sep=':', index_col=0, header=None).T
        csvout = csvout.append(data)

csvout.to_csv(output)

For explanation of the code, see this question/answer which explains how to read a transposed text file with pandas.

Upvotes: 2

Related Questions