James L

Reputation: 348

Splitting a string in Python after a number

I am very new to Python and mostly new to programming. I have been trying to parse certain .txt files into Excel, and have had success with several of them that were easy to split into lines I could code around.

However, I now have a bunch of files that have my information, but with no reasonable line breaks. My data looks like this:

company1 name _______ 123   company2 name 456 company3 name 
789

with no good indicators between names and numbers: sometimes there are underscores between them, sometimes only whitespace, and sometimes a line break. If I could split all of this into lines that each end after a full number, the code I've already written would do the rest. Ideally, I'd end up with a string that looks like:

company1 name _______ 123
company2 name 456
company3 name 789

with the line breaks from the original string removed.

I hope someone can help!

Upvotes: 2

Views: 146

Answers (3)

vks

Reputation: 67968

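import re
# A standalone number (\b keeps the "1" in "company1" from matching) followed by whitespace
p = re.compile(r'(\b\d+)\s+')
test_str = "company1 name _______ 123   company2 name 456 company3 name 789"
# Raw string: in a normal string "\1" becomes the control character \x01, not a back-reference;
# re.sub itself turns the "\n" in the template into a real newline
subst = r"\1\n"

result = re.sub(p, subst, test_str)
print(result)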

You can do it using re.sub, replacing the whitespace after each number with a newline.
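The test string above is on one line, but the question's data can also have a line break between a name and its number. A minimal sketch that handles that too, assuming it is acceptable to collapse every run of whitespace (including any double spaces inside names) into a single space before splitting:

```python
import re

text = "company1 name _______ 123   company2 name 456 company3 name \n789"

# Collapse every run of whitespace (spaces, tabs, line breaks) into one space,
# so a line break between a name and its number no longer matters.
flat = " ".join(text.split())

# A standalone number followed by whitespace gets a newline after it.
# The last number has nothing after it, so it is left unchanged.
result = re.sub(r"(\b\d+)\s+", r"\1\n", flat)
print(result)
```

This prints the three lines from the question, one company per line, with no trailing newline.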

Upvotes: 0

kponz

Reputation: 528

Try splitting the string, then checking whether each element is a number:

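data_string = "company1 name _______ 123   company2 name 456 company3 name \n789"
new_string = ''
# split() with no argument splits on any run of whitespace, line breaks included,
# so there is no need to remove '\n' first or to worry about repeated spaces
data_array = data_string.split()
for portion in data_array:
    # split() always returns strings, so test the text itself rather than type(portion)
    if portion.isdigit():
        new_string = new_string + portion + '\n'
    else:
        new_string = new_string + portion + ' '
print(new_string)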
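A variant of the same token-by-token idea, sketched as a small function (the name `split_after_numbers` is hypothetical, not from the answer). It collects the tokens of each line in a list and joins them once, so there is no trailing newline, and any words left after the last number are kept as a final line instead of being dropped into a dangling line with a trailing space. Like the loop above, it assumes every number is a whole, whitespace-separated token such as `123`:

```python
def split_after_numbers(text):
    """Put a line break after every whitespace-separated token made only of digits."""
    lines = []    # finished lines
    current = []  # tokens of the line being built
    for token in text.split():
        current.append(token)
        if token.isdigit():
            # a full number ends the current line
            lines.append(" ".join(current))
            current = []
    if current:
        # trailing words with no number after them
        lines.append(" ".join(current))
    return "\n".join(lines)


result = split_after_numbers("company1 name _______ 123   company2 name 456 company3 name \n789")
print(result)
```

Building a list and calling `"\n".join` once is also the usual Python idiom in place of repeated string concatenation.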

Upvotes: 0

Martin Konecny

Reputation: 59611

You should probably use a regular expression for this: it finds a pattern in the text and lets you replace each match with itself plus a newline.

For example:

import re
line = 'company1 name _______ 123   company2 name 456 company3 name 789'
# Capture whitespace, a number, and any whitespace after it, then append a newline
output = re.sub(r'(\s\d+\s*)', r'\1\n', line)
print(output)

which prints (the whitespace after each number stays at the end of its line):

company1 name _______ 123   
company2 name 456 
company3 name 789
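If the end goal is name and number columns (for example, for Excel), one option is to pull out (name, number) pairs with re.findall instead of inserting newlines. A minimal sketch, assuming a number is always a separate token preceded by whitespace; a name that itself contains a standalone number would be split at that number too:

```python
import re

text = "company1 name _______ 123   company2 name 456 company3 name \n789"

# Flatten all whitespace (including line breaks) to single spaces first.
flat = " ".join(text.split())

# (\S.*?)  the name: starts at a non-space character, as short as possible
# \s+      at least one space before the number, so the "1" in "company1" never counts
# (\d+)    the number itself
# (?!\S)   the number must be followed by a space or the end of the string
pairs = re.findall(r"(\S.*?)\s+(\d+)(?!\S)", flat)

for name, number in pairs:
    print(name, number)
```

Each tuple is one row with two columns, so `pairs` could be handed to `csv.writer(...).writerows(pairs)` to produce a CSV file that Excel can open.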

Upvotes: 2
