Gargi

Reputation: 29

Extract a substring from a string including a comma

I have a list of strings in a file. I am trying to extract a substring from each string and print it. The strings look like the following -

Box1 is lifted\nInform the manufacturer
Box2 is lifted\nInform the manufacturer
Box3, Box4 is lifted\nInform the manufacturer
Box5, Box6 is lifted\nInform the manufacturer
Box7 is lifted\nInform the manufacturer

From each line I have to extract the string before \n and print it. I used the following Python regex to do that -

term = r'.*-\s([\w\s]+)\\n'

This regex works fine for the 1st, 2nd and last lines, but it doesn't work for the 3rd and 4th lines because there is a , in the string. How should I modify my regex to handle that?

Expected results -

Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

Results obtained currently -

Box1 is lifted
Box2 is lifted
Box2 is lifted
Box2 is lifted
Box7 is lifted

Upvotes: 1

Views: 679

Answers (7)

Aaditya Ura

Reputation: 12669

You can use a regex with a lookahead for the literal \n and take the matched text:

One line solution:

import re
pattern=r'\w.+(?=\\n)'

print([re.search(pattern,line).group() for line in open('file','r')])

output:

['Box1 is lifted', 'Box2 is lifted', 'Box3, Box4 is lifted', 'Box5, Box6 is lifted', 'Box7 is lifted']

Detailed solution:

import re
pattern=r'\w.+(?=\\n)'
with open('newt','r') as f:
    for line in f:
        print(re.search(pattern,line).group())

output:

Box1 is lifted
Box2 is lifted
Box3, Box4 is lifted
Box5, Box6 is lifted
Box7 is lifted

Upvotes: 0

Pulsar

Reputation: 288

Regex is overkill for basic string operations like this. Use the built-in string methods, such as partition and replace:

for line in lines:
    first, sep, last = line.partition('\n')
    newline = first.replace(',','')
    print(newline)

Edit: if \n is a literal two-character sequence (a backslash followed by n) in a line read from a file, use r'\n' instead of '\n'.
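
A self-contained sketch of the partition approach for the literal-marker case, assuming the lines have already been read into a list; the function name first_part and the sample data are hypothetical:

```python
def first_part(line, marker=r'\n'):
    """Return the text before the marker with commas removed."""
    # partition splits at the first occurrence of the marker only.
    first, _sep, _rest = line.partition(marker)
    return first.replace(',', '')

# Raw strings keep \n as a literal backslash followed by n.
lines = [
    r'Box2 is lifted\nInform the manufacturer',
    r'Box5, Box6 is lifted\nInform the manufacturer',
]

for line in lines:
    print(first_part(line))
```

If the marker is missing, partition returns the whole line as the first element, so the function then returns the full line with its commas removed. Passing marker='\n' handles the case of a real newline character.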

Upvotes: 2

kemparaj565

Reputation: 385

Let me know if the below works for you.

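# 'text' is used instead of 'input' to avoid shadowing the built-in input()
text = "Box3, Box4 is lifted\nInform the manufacturer"
text = text.replace(",", "", 1)  # removes the first comma only
print(text)
print(text[0:text.index("\n")])  # slice up to the newline
text = "Box1 is lifted\nInform the manufacturer"
print(text[0:text.index("\n")])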

Upvotes: 0

theBrainyGeek

Reputation: 584

Why not something as simple as term = r"[*]*(is lifted)"? Or don't use regex at all if it isn't required.

EDIT: I think this might be better: term = r"(Box[0-9])?(, Box[0-9])*(is lifted)"

Upvotes: 1

Mateo

Reputation: 1971

The comma isn't part of either the \w or the \s character class, so [\w\s]+ stops matching at the comma. Adding it to the class, term = r'.*-\s([\w\s,]+)\\n', should do what you want.
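
A quick sketch of the widened character class in action. It assumes the `.*-\s` prefix in the question matches text that the sample lines do not show, so the first pattern drops it; the "Alert - " prefix in the second example is purely hypothetical:

```python
import re

# Character class extended with a comma. The `.*-\s` prefix is omitted
# here because the sample lines in the question contain no "- ".
TERM = re.compile(r'([\w\s,]+)\\n')

# The full pattern from the answer, for lines that do carry a "- " prefix.
TERM_WITH_PREFIX = re.compile(r'.*-\s([\w\s,]+)\\n')

def capture(pattern, line):
    """Return the first captured group, or None if the line does not match."""
    match = pattern.search(line)
    return match.group(1) if match else None

print(capture(TERM, r'Box3, Box4 is lifted\nInform the manufacturer'))
# "Alert - " is a made-up prefix used only to exercise the full pattern.
print(capture(TERM_WITH_PREFIX, r'Alert - Box5, Box6 is lifted\nInform the manufacturer'))
```

The captured text keeps its comma, so matching the expected results in the question would still need a .replace(',', '') on the result.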

Upvotes: 2

Aaron Lael

Reputation: 188

If this is a consistent format, you could just split on the newline:

''.join(YOURSTRING.split('\n')[0].split(','))

Edited because I missed the part about removing the comma.

Upvotes: 2

allen-munsch

Reputation: 54

What about something like this?

ok = '''Box1 is lifted\\nInform the manufacturer
Box2 is lifted\\nInform the manufacturer
Box3, Box4 is lifted\\nInform the manufacturer
Box5, Box6 is lifted\\nInform the manufacturer
Box7 is lifted\\nInform the manufacturer
'''
# Split on the trailing text, then strip the literal \n, the commas and any extra whitespace.
# The last piece of the split is empty, so blank pieces are skipped.
strings = [' '.join(x.split()).replace('\\n', '').replace(',', '') for x in ok.split('Inform the manufacturer') if x.strip()]
>>> for x in strings: print(x)
... 
Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

Upvotes: 1
