Reputation: 29
I have a list of strings in a file. I am trying to extract a substring from each string and print it. The strings look like this:
Box1 is lifted\nInform the manufacturer
Box2 is lifted\nInform the manufacturer
Box3, Box4 is lifted\nInform the manufacturer
Box5, Box6 is lifted\nInform the manufacturer
Box7 is lifted\nInform the manufacturer
From each line I need to extract the part before the \n and print it. I used the following Python regex to do that:
term = r'.*-\s([\w\s]+)\\n'
This regex works for the 1st, 2nd and last lines, but not for the 3rd and 4th lines, because they contain a comma (,). How should I modify my regex to handle that?
Expected results:
Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted
Current results:
Box1 is lifted
Box2 is lifted
Box2 is lifted
Box2 is lifted
Box7 is lifted
Upvotes: 1
Views: 679
Reputation: 12669
You can use a regex with a lookahead, which matches everything up to the literal \n without including it in the match:
One-line solution:
import re
pattern = r'\w.+(?=\\n)'  # from the first word character up to a literal \n
print([re.search(pattern, line).group() for line in open('file', 'r')])
output:
['Box1 is lifted', 'Box2 is lifted', 'Box3, Box4 is lifted', 'Box5, Box6 is lifted', 'Box7 is lifted']
Detailed solution:
import re
pattern = r'\w.+(?=\\n)'
with open('newt', 'r') as f:
    for line in f:
        print(re.search(pattern, line).group())
output:
Box1 is lifted
Box2 is lifted
Box3, Box4 is lifted
Box5, Box6 is lifted
Box7 is lifted
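The output above still contains the commas, while the question's expected output drops them. A minimal sketch that chains str.replace onto each match, assuming each line holds the two characters backslash and n literally (which is what \\n in the pattern implies). io.StringIO stands in for the real file, and the function name extract_before_literal_newline is hypothetical:

```python
import io
import re

# Matches from the first word character up to (not including) a literal backslash-n.
PATTERN = re.compile(r'\w.+(?=\\n)')


def extract_before_literal_newline(lines):
    """Return the text before the literal '\\n' in each line, with commas removed."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # skip lines that contain no literal '\n'
            results.append(match.group().replace(',', ''))
    return results


# io.StringIO stands in for open('file'); the r'' parts keep '\n' as two literal
# characters, while the plain "\n" parts are real line breaks between entries.
sample = io.StringIO(
    r"Box1 is lifted\nInform the manufacturer" "\n"
    r"Box3, Box4 is lifted\nInform the manufacturer" "\n"
)
for text in extract_before_literal_newline(sample):
    print(text)
```

Lines without a literal \n are skipped instead of raising AttributeError on None.group().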
Upvotes: 0
Reputation: 288
Regex is overkill for basic string operations like this. Use the built-in string methods, like partition and replace:
for line in lines:  # lines: the strings read from the file
    first, sep, last = line.partition('\n')  # split at the first newline
    newline = first.replace(',', '')         # drop the commas
    print(newline)
Edit: if \n appears as a literal two-character sequence (a backslash followed by n) in a line read from a file, use r'\n' instead of '\n'.
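A sketch of the partition approach with the separator made a parameter, so the same helper covers both cases from the edit. It assumes the file's lines contain the literal sequence by default; the function name before_separator is hypothetical:

```python
def before_separator(line, sep=r'\n'):
    """Return the part of line before sep, with commas removed.

    sep defaults to the literal backslash-n sequence; pass '\n' when the
    strings contain real newline characters instead.
    """
    first, _sep, _rest = line.partition(sep)  # whole line if sep is absent
    return first.replace(',', '')


lines = [
    r'Box3, Box4 is lifted\nInform the manufacturer',  # literal backslash-n
    r'Box7 is lifted\nInform the manufacturer',
]
for line in lines:
    print(before_separator(line))

# Real newline in the string, so the separator must be '\n' here.
print(before_separator('Box1 is lifted\nInform the manufacturer', sep='\n'))
```

Note that partition never raises: when the separator is missing, the first element is simply the whole line.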
Upvotes: 2
Reputation: 385
Let me know if the below works for you.
text = "Box3, Box4 is lifted\nInform the manufacturer"  # 'text' avoids shadowing the built-in input()
text = text.replace(",", "", 1)  # remove the first comma
print(text)
print(text[0:text.index("\n")])  # everything before the newline
text = "Box1 is lifted\nInform the manufacturer"
print(text[0:text.index("\n")])
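Two caveats with the snippet above: str.index raises ValueError when there is no newline, and replace(",", "", 1) removes only the first comma. A hedged variant that avoids both, assuming real newline characters as in the answer's examples; the helper name first_line_without_commas is hypothetical, and the three-box line is a made-up input:

```python
def first_line_without_commas(text):
    """Return the text before the first newline, with every comma removed."""
    end = text.find("\n")  # -1 when there is no newline
    head = text if end == -1 else text[:end]
    return head.replace(",", "")  # no count argument: remove all commas


for text in (
    "Box3, Box4 is lifted\nInform the manufacturer",
    "Box1 is lifted\nInform the manufacturer",
    "Box8, Box9, Box10 is lifted",  # made-up line: three boxes, no newline
):
    print(first_line_without_commas(text))
```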
Upvotes: 0
Reputation: 584
Why not something as simple as term = r"[*]*(is lifted)"? Or don't use regex at all if it isn't required.
EDIT: I think this might be better: term = r"(Box[0-9])?(, Box[0-9])* (is lifted)"
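A sketch of how a pattern along these lines could be applied to each line, assuming box names are always Box followed by digits. This variant allows multi-digit box numbers and requires the space before "is lifted"; the name lifted_boxes is hypothetical:

```python
import re

# Assumes every box name is "Box" followed by one or more digits.
BOX_PATTERN = re.compile(r"Box[0-9]+(?:, Box[0-9]+)* is lifted")


def lifted_boxes(line):
    """Return the 'BoxN[, BoxM...] is lifted' part of line without commas, or None."""
    match = BOX_PATTERN.search(line)
    return match.group().replace(",", "") if match else None


print(lifted_boxes(r"Box5, Box6 is lifted\nInform the manufacturer"))
print(lifted_boxes(r"Box7 is lifted\nInform the manufacturer"))
```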
Upvotes: 1
Reputation: 1971
The comma isn't part of either the \w or the \s character set. Adding it to the class,
term = r'.*-\s([\w\s,]+)\\n'
should do what you want.
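To see why the comma matters, here is a small check on one sample line. It assumes the file contains a literal backslash followed by n, and it leaves out the .*-\s prefix because the sample lines as shown contain no hyphen:

```python
import re

sample = r"Box3, Box4 is lifted\nInform the manufacturer"

# [\w\s]+ stops at the comma: \w covers letters, digits and '_', \s covers whitespace.
without_comma_class = re.match(r"[\w\s]+", sample).group()
# With ',' in the class, the group runs on until the literal backslash-n.
captured = re.search(r"([\w\s,]+)\\n", sample).group(1)

print(without_comma_class)         # Box3
print(captured)                    # Box3, Box4 is lifted
print(captured.replace(",", ""))   # Box3 Box4 is lifted
```

The last line shows one way to get the question's comma-free output from the captured group.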
Upvotes: 2
Reputation: 188
If this is a consistent format, you could just split on the newline:
''.join(YOURSTRING.split('\n')[0].split(','))
Edited because I missed the part about removing the comma.
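If the file stores the two characters backslash and n rather than a real newline, split('\n') finds nothing to split on. A short comparison, assuming such a line (the sample string is taken from the question):

```python
# Assumption: the line contains a literal backslash followed by 'n', not a real newline.
line = r"Box3, Box4 is lifted\nInform the manufacturer"

# No real newline occurs in the line, so split('\n') returns it whole.
unchanged = ''.join(line.split('\n')[0].split(','))
# Splitting on the literal sequence r'\n' isolates the first part as intended.
extracted = ''.join(line.split(r'\n')[0].split(','))

print(unchanged)
print(extracted)
```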
Upvotes: 2
Reputation: 54
What about something like this?
ok = '''Box1 is lifted\\nInform the manufacturer
Box2 is lifted\\nInform the manufacturer
Box3, Box4 is lifted\\nInform the manufacturer
Box5, Box6 is lifted\\nInform the manufacturer
Box7 is lifted\\nInform the manufacturer
'''
# Split the plain string (a StringIO object has no split method) on the
# fixed trailing text, and skip the empty chunk after the last entry.
strings = [' '.join(x.split()).replace('\\n', '').replace(',', '')
           for x in ok.split('Inform the manufacturer') if x.strip()]
for x in strings:
    print(x)
Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted
Upvotes: 1