Reputation: 20925
I have several strings with phrases or words separated by multiple spaces.
c1 = "St. Louis       12             Cardinals"
c2 = "Boston          16             Red Sox"
c3 = "New York        13             Yankees"
How do I write a function, perhaps using Python's split(" ") method, to separate each line into a list of strings? For instance, c1 would become ['St. Louis', '12', 'Cardinals'].
Calling split(" ") and then trimming the components won't work, because some entities, such as St. Louis or Red Sox, have spaces in them.
However, I do know that all entities are at least 2 spaces apart and that no entity has 2 spaces within it. By the way, I actually have around 100 cities to deal with, not 3. Thanks!
Upvotes: 0
Views: 145
Reputation: 3107
It looks like that content is fixed-width. If that is always the case and assuming those are spaces and not tabs, then you can always reverse it using slices:
split_fields = lambda s: [s[:16].strip(), s[16:31].strip(), s[31:].strip()]
or:
def split_fields(s):
    return [s[:16].strip(), s[16:31].strip(), s[31:].strip()]
Example usage:
>>> split_fields(c1)
['St. Louis', '12', 'Cardinals']
>>> split_fields(c2)
['Boston', '16', 'Red Sox']
>>> split_fields(c3)
['New York', '13', 'Yankees']
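If the column positions ever change, the same idea can be written with the offsets pulled out into a table. This is only a sketch: the name COLUMN_BOUNDS and the helper split_fixed_width are made up for illustration, and the offsets 16 and 31 are simply the ones used in the slices above.

```python
# Hypothetical column boundaries, copied from the slices used above.
COLUMN_BOUNDS = [(0, 16), (16, 31), (31, None)]

def split_fixed_width(line, bounds=COLUMN_BOUNDS):
    """Slice a fixed-width line at the given (start, end) offsets and trim each field."""
    return [line[start:end].strip() for start, end in bounds]

# Sample line built with explicit padding so the column widths are visible.
sample = "St. Louis".ljust(16) + "12".ljust(15) + "Cardinals"
print(split_fixed_width(sample))  # ['St. Louis', '12', 'Cardinals']
```

This only works while every line really is padded to the same widths; a single misaligned row will be cut in the wrong place.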
Upvotes: 2
Reputation: 213075
Without regular expressions:
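c1 = "St. Louis       12             Cardinals"
# Split on two spaces; longer runs leave empty strings and stray spaces behind,
# so drop the empty pieces and strip the rest.
words = [w.strip() for w in c1.split('  ') if w]
# words == ['St. Louis', '12', 'Cardinals']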
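To see why the filter and the strip() matter when splitting on a literal two-space delimiter, here is a small sketch with an odd number of spaces between fields. The function name split_on_double_space is made up, and it filters on piece.strip() rather than on the raw piece, which is a slight variation that also drops pieces consisting of a single leftover space.

```python
def split_on_double_space(line):
    """Split on two-space runs, drop blank pieces, and trim leftover spaces."""
    return [piece.strip() for piece in line.split("  ") if piece.strip()]

# Five spaces, then three: odd runs leave a stray single space behind.
line = "St. Louis" + " " * 5 + "12" + " " * 3 + "Cardinals"

print(line.split("  "))             # ['St. Louis', '', ' 12', ' Cardinals']
print(split_on_double_space(line))  # ['St. Louis', '12', 'Cardinals']
```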
Upvotes: 4
Reputation: 28056
You could do this with regular expressions:
import re
# The trailing \s*$ forces the lazy last group to capture the whole team name.
blahRegex = re.compile(r'(.*?)\s+(\d+)\s+(.*?)\s*$')
with open('filename') as f:
    for line in f:
        m = blahRegex.match(line)
        if m is not None:
            city = m.group(1)
            rank = m.group(2)
            team = m.group(3)
There are a lot of ways to skin that cat: you could use named groups or make your regular expression tighter. But this should do it.
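For what it's worth, a named-group version might look like the sketch below. The group names (city, rank, team) and the helper parse_line are my own choices, and the pattern assumes the middle field is always made of digits and that fields are separated by at least two whitespace characters.

```python
import re

# Hypothetical group names; assumes an all-digit middle field and
# separators of two or more whitespace characters.
LINE_PATTERN = re.compile(
    r'(?P<city>.*?)\s{2,}(?P<rank>\d+)\s{2,}(?P<team>.*?)\s*$'
)

def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    m = LINE_PATTERN.match(line)
    return m.groupdict() if m else None

print(parse_line("New York        13             Yankees\n"))
# {'city': 'New York', 'rank': '13', 'team': 'Yankees'}
```

Requiring \s{2,} between fields is one way of making the expression tighter, since a single space inside "New York" can no longer be mistaken for a separator.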
Upvotes: 2
Reputation: 283313
You can use re.split:
>>> import re
>>> re.split(r'\s{2,}', 'St. Louis       12             Cardinals')
['St. Louis', '12', 'Cardinals']
Upvotes: 2
Reputation: 2362
import re
re.split(r' {2,}', c1)
re.split(r' {2,}', c2)
re.split(r' {2,}', c3)
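With around 100 lines rather than three, it may be easier to wrap the call in a function and loop. A minimal sketch, assuming the lines come from a file or list and may carry a trailing newline; the names split_columns, lines, and rows are made up here.

```python
import re

def split_columns(line):
    """Split a line on runs of two or more spaces, ignoring surrounding whitespace."""
    return re.split(r' {2,}', line.strip())

# Stand-in for lines read from a file; each ends with a newline.
lines = [
    "St. Louis       12             Cardinals\n",
    "Boston          16             Red Sox\n",
    "New York        13             Yankees\n",
]

# Skip blank lines so they do not produce [''] rows.
rows = [split_columns(line) for line in lines if line.strip()]
print(rows)
```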
Upvotes: 3