Reputation: 348
I am very new to Python and mostly new to programming. I have been attempting to parse certain .txt files into Excel, and have had success with a number of them that were easy to split into lines that I could code around.
However, I now have a bunch of files that have my information, but with no reasonable line breaks. My data looks like this:
company1 name _______ 123 company2 name 456 company3 name
789
with no good indicators between names and numbers--sometimes there are underscores between, sometimes only whitespace, sometimes there's a line break in between. If I could separate all of this into lines that ended after each full number, then the code I've already written would do the rest. Ideally, I'd have a string that looks like:
company1 name ______ 123
company2 name 456
company3 name 789
with the line breaks in the original string parsed out.
I hope someone can help!
Upvotes: 2
Views: 146
Reputation: 67968
import re
p = re.compile(r'(\b\d+)\s+')
test_str = "company1 name _______ 123 company2 name 456 company3 name 789"
subst = r"\1\n"  # raw string, so \1 stays a backreference to the captured number
result = re.sub(p, subst, test_str)
You can do it using re.sub.
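One detail worth spelling out: in a plain (non-raw) string literal, Python reads \1 as the control character chr(1) rather than as a backreference, so the replacement is safest written as a raw string. The test string here also has no line break inside a record, unlike the sample in the question. Below is a minimal sketch that handles both points, assuming that every standalone run of digits ends a record; the helper name split_records is made up for illustration.

```python
import re

def split_records(text):
    # Collapse every run of whitespace (including line breaks) to one space.
    flat = " ".join(text.split())
    # Put a newline after each standalone number; r"\1" re-inserts the number.
    return re.sub(r"(\b\d+)\s+", r"\1\n", flat)

sample = "company1 name _______ 123 company2 name 456 company3 name\n789"
print(split_records(sample))
```

A company name that itself contains a standalone number (for example "company 3 name") would be split too early under this assumption.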
Upvotes: 0
Reputation: 528
Try splitting the string and checking whether each piece is a number. Because split() returns strings, test each piece with str.isdigit() rather than with type():
new_string = ''
# split() with no argument splits on any whitespace, including line breaks
for portion in data_string.split():
    if portion.isdigit():
        # a number ends the record, so finish the line here
        new_string += portion + '\n'
    else:
        new_string += portion + ' '
Upvotes: 0
Reputation: 59611
You should probably use a regular expression for this. It finds a pattern in the text and lets you replace each match, here by appending a newline to it.
For example:
import re
line = 'company1 name _______ 123 company2 name 456 company3 name 789'
output = re.sub(r'(\s\d+)\s*', r'\1\n', line)  # trailing spaces are matched but not kept
print(output)
which prints
company1 name _______ 123
company2 name 456
company3 name 789
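If the lines are only an intermediate step on the way to a spreadsheet, the same idea can pull out the name and the number directly. This is a rough sketch, not a tested solution for the real files: it assumes the underscores are only filler, that each record ends in a standalone integer, and the function name parse_records is hypothetical.

```python
import re

def parse_records(text):
    # Treat underscore filler and line breaks as plain separators.
    flat = " ".join(text.replace("_", " ").split())
    # Lazily capture a name, then the standalone number that ends the record.
    pairs = re.findall(r"(.+?)\s+(\d+)(?=\s|$)", flat)
    return [(name.strip(), int(number)) for name, number in pairs]

sample = "company1 name _______ 123 company2 name 456 company3 name\n789"
for name, number in parse_records(sample):
    print(name, number)
```

Each tuple can then be written out as one row with whatever Excel or CSV code already exists.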
Upvotes: 2