Patrick McGranaghan

Reputation: 121

Parsing a text file with line breaks in python

I have a text file with about 20 entries. They look like this:

~

England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...

etc.

I would like to take these entries and turn them into a CSV. There is a '~' separating each entry. I'm scratching my head trying to figure out how to go through the file line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?

Upvotes: 2

Views: 1908

Answers (2)

Badri

Reputation: 2262

Use the libraries, Luke :) I'm assuming your data is well formatted; most real-world data isn't, but here goes a solution.

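>>> content = open('countries.txt').read()  # 'countries.txt' is a placeholder name
>>> entries = [e for e in content.split('~') if e.strip()]  # drop empty chunks
>>> entries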
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']

For writing the CSV, Python's standard library has the csv module.

>>> import csv
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> with open('foo.csv', 'w', newline='') as csvfile:  # Python 3: text mode, newline=''
...     writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
...     writer.writeheader()  # first row: Country,Link,Capital
...     for entry in entries:
...         cols = entry.strip().splitlines()
...         writer.writerow({'Country': cols[0],
...                          'Link': cols[1].split(': ', 1)[1],
...                          'Capital': cols[2].split(': ', 1)[1]})
...
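Putting the two steps together, here is a self-contained sketch that works on an in-memory string and writes to any file-like object, so it can be tried without touching the disk. It assumes every entry has exactly three lines (country, Link, Capital) in that order; the function name text_to_csv is just for illustration.

```python
import csv
import io


def text_to_csv(content, csvfile):
    """Convert '~'-separated entries into CSV rows written to csvfile."""
    writer = csv.DictWriter(csvfile, fieldnames=['Country', 'Link', 'Capital'])
    writer.writeheader()
    for entry in content.split('~'):
        cols = entry.strip().splitlines()
        if not cols:
            continue  # skip the empty chunk produced by a leading '~'
        writer.writerow({
            'Country': cols[0],
            # maxsplit=1 keeps the ':' inside the URL intact
            'Link': cols[1].split(': ', 1)[1],
            'Capital': cols[2].split(': ', 1)[1],
        })


sample = """~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
"""

out = io.StringIO()
text_to_csv(sample, out)
print(out.getvalue())
```

When writing to a real file instead of io.StringIO, open it with open('foo.csv', 'w', newline='') so the csv module controls the line endings.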

If your data is semi-structured or badly formatted, consider using a parsing library such as PyParsing.

Edit: the second column contains URLs, which have a colon of their own, so the split has to be handled carefully:

>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
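Another way to sidestep the colon problem is str.partition, which splits only at the first occurrence of the separator. The sketch below also keys each value by its label, so it tolerates fields appearing in a different order. parse_entry is a hypothetical helper, and it assumes the first non-blank line of each entry is the country name.

```python
def parse_entry(entry):
    """Turn one '~'-separated chunk into a dict keyed by field label."""
    lines = [line.strip() for line in entry.strip().splitlines() if line.strip()]
    record = {'Country': lines[0]}
    for line in lines[1:]:
        # partition splits at the first ': ' only, so 'http://...' survives
        key, sep, value = line.partition(': ')
        if sep:  # ignore lines that are not in 'Key: value' form
            record[key] = value
    return record


print(parse_entry('\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n'))
```

The resulting dicts can be passed straight to csv.DictWriter.writerow.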

Upvotes: 3

Dynn__

Reputation: 3

I would start by opening the file with the open() function:

f = open('NameOfFile.extensionType', 'r')

Here "r" is read mode, which is all you need for the input file. Avoid "a+" here: append mode puts the file position at the end of the file, so looping over f yields nothing unless you call f.seek(0) first. The "+" adds the missing direction (reading to "a" and "w", writing to "r"); it does not decide whether the file gets created. "w" and "a" create a missing file, while "r" and "r+" raise FileNotFoundError.

After that I would use a for loop like this:

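data = []
tmp = []
for line in f:
  line = line.strip()  # remove the trailing newline and surrounding spaces
  if line == '~':
    if tmp:  # skip the empty group created by a leading '~'
      data.append(tmp)
    tmp = []
  elif line:  # ignore blank lines
    tmp.append(line)
if tmp:  # the last entry is not followed by a '~'
  data.append(tmp)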

Now all of the data is stored in data, one list of lines per country. With a slightly different loop you could instead store each entry as an instance of a class.
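If you prefer the class-based approach, one option is a small dataclass built from each list in data. This is a sketch assuming each entry holds exactly the three stripped lines shown in the question; the Country class and its from_lines method are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    link: str
    capital: str

    @classmethod
    def from_lines(cls, lines):
        """Build a Country from ['England', 'Link: ...', 'Capital: ...']."""
        # split(': ', 1) keeps the colon inside the URL intact
        link = lines[1].split(': ', 1)[1]
        capital = lines[2].split(': ', 1)[1]
        return cls(name=lines[0], link=link, capital=capital)


data = [['England', 'Link: http://imgur.com/foobar.jpg', 'Capital: London']]
countries = [Country.from_lines(entry) for entry in data]
print(countries[0])
```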

I have never edited CSV files using Python, but I believe a loop like this will write the data:

f2 = open('CSVfileName.csv', 'w')  # "w" overwrites the file; use "a" to append instead
for entry in data:
  for subentry in entry:
    f2.write(str(subentry) + '\n') #Use '\n' to create a new line

This loop writes every line on its own row, so the result is a single column holding all of the data rather than one row per country. At the end, remember to close the files so the changes are saved:

f.close()
f2.close()

You could combine the two loops into one in order to save space, but for the sake of explanation I have not.
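As a sketch of the combined version, the following reads the input and writes one CSV row per country in a single pass, using the csv module and with blocks so both files are closed automatically. It assumes every entry is the country name followed by "Key: value" lines in the order Link, Capital; convert and write_row are illustrative names, and the file names in the comment are placeholders.

```python
import csv


def write_row(writer, entry):
    """Write one entry (a list of stripped lines) as a CSV row, if non-empty."""
    if entry:
        # entry[0] is the country; later lines look like 'Link: http://...'
        values = [line.split(': ', 1)[1] for line in entry[1:]]
        writer.writerow([entry[0]] + values)


def convert(in_path, out_path):
    """Read '~'-separated entries from in_path and write them to out_path as CSV."""
    with open(in_path) as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(['Country', 'Link', 'Capital'])
        entry = []
        for line in src:
            line = line.strip()
            if line == '~':
                write_row(writer, entry)  # a '~' ends the previous entry
                entry = []
            elif line:
                entry.append(line)
        write_row(writer, entry)  # the last entry has no trailing '~'


# Example: convert('NameOfFile.extensionType', 'CSVfileName.csv')
```

Compared with writing str(subentry) + '\n', csv.writer puts each entry on one row and quotes any field that happens to contain a comma.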

Upvotes: 0
