dangerChihuahua007

Reputation: 20925

How do I split these strings into arrays of strings?

I have several strings with phrases or words separated by multiple spaces.

c1 = "St. Louis       12             Cardinals"
c2 = "Boston          16             Red Sox"
c3 = "New York        13             Yankees"

How do I write a function, perhaps using Python's split(" ") method, to separate each line into an array of strings? For instance, c1 would become ['St. Louis', '12', 'Cardinals'].

Calling split(" ") and then trimming the component entities won't work because some entities such as St. Louis or Red Sox have spaces in them.

However, I do know that all entities are at least 2 spaces apart and that no entity has 2 spaces within it. By the way, I actually have around 100 cities to deal with, not 3. Thanks!

Upvotes: 0

Views: 145

Answers (5)

Austin Marshall

Reputation: 3107

It looks like that content is fixed-width. If that is always the case, and assuming the padding is spaces and not tabs, you can recover the fields using slices:

split_fields = lambda s: [s[:16].strip(), s[16:31].strip(), s[31:].strip()]

or:

def split_fields(s):
    return [s[:16].strip(), s[16:31].strip(), s[31:].strip()]

Example usage:

>>> split_fields(c1)
['St. Louis', '12', 'Cardinals']
>>> split_fields(c2)
['Boston', '16', 'Red Sox']
>>> split_fields(c3)
['New York', '13', 'Yankees']

Upvotes: 2

eumiro

Reputation: 213075

Without regular expressions:

c1 = "St. Louis       12             Cardinals"
words = [w.strip() for w in c1.split('  ') if w]
# words == ['St. Louis', '12', 'Cardinals']
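To see why both the filter and the strip are needed, here is a small sketch of what split('  ') returns when a gap has an odd number of spaces. The helper name split_on_double_space is hypothetical, and it filters on the stripped piece so that a lone leftover space (for example from trailing padding) is dropped too.

```python
def split_on_double_space(line):
    # Splitting on two spaces yields an empty string for each extra pair
    # and a piece with one leading space when the gap length is odd.
    pieces = line.split('  ')
    # Strip each piece and keep only the ones that are non-empty afterwards.
    return [p.strip() for p in pieces if p.strip()]

# Raw result for a gap of 7 spaces, before any filtering or stripping.
raw = ("St. Louis" + " " * 7 + "12").split('  ')

fields = split_on_double_space("Boston          16             Red Sox")
trailing = split_on_double_space("Boston   16   ")
```

Here raw still contains the empty strings and the ' 12' piece, while fields and trailing come back clean.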

Upvotes: 4

synthesizerpatel

Reputation: 28056

You could do this with regular expressions:

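import re

# city (lazy), then the numeric rank, then the team up to the end of the line;
# the trailing \s*$ forces the last lazy group to capture the whole team name
blahRegex = re.compile(r'(.*?)\s+(\d+)\s+(.*?)\s*$')

with open('filename') as f:
    for line in f:
        m = blahRegex.match(line)
        if m is not None:
            city = m.group(1)
            rank = m.group(2)
            team = m.group(3)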

There are a lot of ways to skin that cat: you could use named groups, or make your regular expression tighter. But this should do it.
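As a rough sketch of the named-group variant mentioned above: the names LINE_RE and parse_line and the group names city, rank and team are my own choices, and the pattern assumes the middle column is always a run of digits with at least two whitespace characters on either side.

```python
import re

# Named groups: city (lazy), numeric rank, team up to the end of the line.
# \s{2,} requires a gap of two or more, so the single space in "St. Louis"
# is never treated as a separator.
LINE_RE = re.compile(r'(?P<city>.*?)\s{2,}(?P<rank>\d+)\s{2,}(?P<team>.*?)\s*$')

def parse_line(line):
    """Return a dict of the three fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

record = parse_line("St. Louis       12             Cardinals\n")
```

Lines that do not fit the layout return None, so they can be skipped or logged instead of raising.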

Upvotes: 2

mpen

Reputation: 283313

You can use re.split with a raw-string pattern:

>>> import re
>>> re.split(r'\s{2,}', 'St. Louis       12             Cardinals')
['St. Louis', '12', 'Cardinals']

Upvotes: 2

Ade YU

Reputation: 2362

import re
re.split(r' {2,}', c1)
re.split(r' {2,}', c2)
re.split(r' {2,}', c3)
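Wrapped in a function, as the question asks, this might look like the sketch below. The name split_line is hypothetical, and the strip() call is an added assumption to cope with leading or trailing whitespace such as a newline read from a file.

```python
import re

def split_line(line):
    # Split on runs of two or more spaces; strip() first so that padding
    # or a newline at the ends does not produce empty fields.
    return re.split(r' {2,}', line.strip())

lines = [
    "St. Louis       12             Cardinals",
    "Boston          16             Red Sox",
    "New York        13             Yankees",
]
rows = [split_line(line) for line in lines]
```

The same list comprehension works unchanged for 100 lines, since each line is split independently.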

Upvotes: 3
