UpstairsDownstairs
UpstairsDownstairs

Reputation: 25

Parsing and reformatting CSV/text data using Python

sorry if this a bit of a beginner's question, but I haven't had much experience with python, and could really use some help in figuring this out. If there is a better programming language for tackling this, I'd be more than open to hearing it

I'm working on a small project, and I have two blocks of data, formatted differently from each other. They're all spreadsheets saved as CSV files, and I'd really like to make one group match the other without having to manually edit all the data.

What I need to do is go through a CSV, and format any data saved like this:

10W

20E

15-16N

17-18S

To a format like this (respective line to respective format):

10,W

20,E

,,15,16,N

,,17,18,S

So that they can just be copied over when opened as spreadsheets

I'm able to get the files into a string in python, but I'm unsure of how to properly write something to search for a number-hyphen-number-letter format.

I'd be immensely grateful for any help I can get. Thanks

Upvotes: 1

Views: 490

Answers (3)

daedalus
daedalus

Reputation: 10923

This is a whole solution that uses regexs. @senderle has beat me to the answer, so feel free to tick his response. This is just added here as I know how difficult it was to wrap my head around re in my code at first.

import re

dash = re.compile('(\d{2})-(\d{2})([WENS])')
no_dash = re.compile( '(\d{2})([WENS])' )

raw = '''10W
20E
15-16N
17-18S'''
lines = raw.split('\n')

data = []

for l in lines:
    if '-' in l:
        match = re.search(dash, l).groups()
        data.append( ',,%s,%s,%s' % (match[0], match[1], match[2] ) )
    else:
        match = re.search(no_dash, l).groups()
        data.append( '%s,%s' % (match[0], match[1] ) )

print '\n'.join(data)

Upvotes: 1

senderle
senderle

Reputation: 150987

This sounds like a good use-case for regular expressions. Once you've split the lines up into individual strings and stripped the whitespace (using s.strip()) these should work (I'm assuming those are cardinal directions; you'll need to change [NESW] to something else if that assumption is incorrect.):

>>> import re
>>> re.findall('\A(\d+)([NESW])', '16N')
[('16', 'N')]
>>> re.findall('\A(\d+)([NESW])', '15-16N')
[]
>>> re.findall('\A(\d+)-(\d+)([NESW])', '15-16N')
[('15', '16', 'N')]
>>> re.findall('\A(\d+)-(\d+)([NESW])', '16N')
[]

The first regex '\A(\d+)([NESW])' matches only a string that begins with a sequence of digits followed by a capital letter N, E, S, or W. The second matches only a string that begins with a sequence of digits followed by a hyphen, followed by another sequence of digits, followed by a capital letter N, E, S, or W. Forcing it to match at the beginning ensures that these regexes don't match a suffix of a longer string.

Then you can do something like this:

>>> vals = re.findall('\A(\d+)([NESW])', '16N')[0]
>>> ','.join(vals)
'16,N'
>>> vals = re.findall('(\d+)-(\d+)([NESW])', '15-16N')[0]
>>> ',,' + ','.join(vals)
',,15,16,N'

Upvotes: 1

Scharron
Scharron

Reputation: 17757

In your case, I think the quick solution would involve regexps

You can either use the match method to extract your different tokens when they match a given regular expression, or the split method to split your string into tokens given a separator.

However, in your case, the separator would be a single character, so you can use the split method from the str class.

Upvotes: 0

Related Questions