Reputation: 285
I have lines in a file like this:
keyword = NORTH FACE
keyword = GUESS
keyword = DRESSES
keyword = RALPH LAUREN
My code is:
keyword=re.findall(r'ke\w+ = \S+',s)
This prints only
NORTH GUESS DRESSES RALPH
But I need regex to handle and print
NORTH FACE GUESS DRESSES RALPH LAUREN
Upvotes: 1
Views: 741
Reputation: 1
import re
D = "keyword = RALPH LAUREN"
m = re.search(r'(?<== )(\w+\s*)*', D) # search for anything after '= '
m.group(0)
'RALPH LAUREN'
C = "keyword = GUESS"
m = re.search(r'(?<== )(\w+\s*)*', C) # run the same search on the second string
m.group(0)
'GUESS'
Upvotes: 0
Reputation: 93
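import re
msg = fh.read()  # fh is the already-opened input file
# include the space after '=' in the pattern so it is not captured with the value
output = re.findall(r"keyword = (.*)", msg)
print(output)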
Upvotes: -1
Reputation: 177620
No need for regex. Try partition or split:
lines = '''\
keyword = NORTH FACE
keyword = GUESS
keyword = DRESSES
keyword = RALPH LAUREN
'''.splitlines()
for line in lines:
    print(line.partition(' = ')[2])
print()
for line in lines:
    print(line.split(' = ')[1])
NORTH FACE
GUESS
DRESSES
RALPH LAUREN
NORTH FACE
GUESS
DRESSES
RALPH LAUREN
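The two approaches differ on lines that do not contain the separator, which matters if the file has other content. A minimal sketch, assuming a hypothetical helper name `value_after_equals` and a made-up third line that has no ` = ` in it:

```python
def value_after_equals(line, sep=' = '):
    """Return the text after the first separator, or '' if it is absent."""
    # str.partition always returns a 3-tuple, so index 2 is always safe.
    return line.partition(sep)[2]

# The last entry is an invented example of a line without the separator.
lines = ['keyword = NORTH FACE', 'keyword = GUESS', 'not a keyword line']
values = [value_after_equals(line) for line in lines]
print(values)

# str.split returns a one-element list here, so [1] raises IndexError.
try:
    lines[2].split(' = ')[1]
    split_failed = False
except IndexError:
    split_failed = True
print(split_failed)
```

With `partition`, a line without the separator simply yields an empty string, so it can be filtered out afterwards instead of being guarded with a `try`.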
Given the new information in the comment and making a guess as to the datafile format (Update the question with a REAL example!):
import re
data = '''\
keyword = NORTH FACE
score = 88466
normalizedKeyword = NORTH FACE
keyword = DRESSES
score = 79379
normalizedKeyword = DRESSES
'''
L = re.findall(r'keyword = (.*)\nscore = (.*)\n',data)
for i in L:
    print(','.join(i))
NORTH FACE,88466
DRESSES,79379
Upvotes: 1
Reputation:
Not sure if this is what you seek ...
From one of your comments: if you want to pair up the values from adjacent lines, and those lines may be surrounded by non-paired lines, you have to do a few things.
Expanded regex:
(?:^|\n) [^\S\n]*
(?:keyword) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]* \n
\s*
(?:score) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]*
(?=\n|$)
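A sketch of how the expanded regex might be used from Python, assuming it is meant to be read with layout whitespace ignored (hence `re.VERBOSE`). The name `PAIR_RE` and the sample data are assumptions; the data follows the file format guessed in another answer.

```python
import re

# The expanded pattern, compiled in verbose mode so the layout whitespace
# is ignored. [^\S\n] means "any whitespace except a newline".
PAIR_RE = re.compile(r'''
    (?:^|\n) [^\S\n]*
    (?:keyword) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]* \n
    \s*
    (?:score) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]*
    (?=\n|$)
''', re.VERBOSE)

# Assumed sample data: keyword/score pairs mixed with other lines.
data = '''\
keyword = NORTH FACE
score = 88466
normalizedKeyword = NORTH FACE
keyword = DRESSES
score = 79379
normalizedKeyword = DRESSES
'''

# findall returns one (keyword, score) tuple per adjacent pair.
pairs = PAIR_RE.findall(data)
for keyword, score in pairs:
    print(keyword + ',' + score)
```

The trailing lookahead `(?=\n|$)` leaves the newline unconsumed, so the next match can start from it via the leading `(?:^|\n)`.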
Upvotes: 0
Reputation: 35532
Try:
>>> s="""
... keyword = NORTH FACE
... keyword = GUESS
... keyword = DRESSES
... keyword = RALPH LAUREN
... """
>>> re.findall(r'ke\w+ = .*',s)
['keyword = NORTH FACE', 'keyword = GUESS', 'keyword = DRESSES', 'keyword = RALPH LAUREN']
Upvotes: 0
Reputation: 490233
Your regex is consuming non-whitespace characters only (\S). That is why it stops matching when it encounters a space character.
Change that to .*. This will greedily match all characters except newlines (\n).
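A small side-by-side sketch of the difference, using the sample lines from the question. The capture groups and the variable names are additions for illustration, so that only the value is returned.

```python
import re

s = """\
keyword = NORTH FACE
keyword = GUESS
keyword = DRESSES
keyword = RALPH LAUREN
"""

# \S+ stops at the first whitespace character inside the value.
first_word_only = re.findall(r'ke\w+ = (\S+)', s)

# .* runs to the end of the line, because '.' does not match '\n' by default.
whole_value = re.findall(r'ke\w+ = (.*)', s)

print(first_word_only)  # ['NORTH', 'GUESS', 'DRESSES', 'RALPH']
print(whole_value)      # ['NORTH FACE', 'GUESS', 'DRESSES', 'RALPH LAUREN']
```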
Upvotes: 3
Reputation: 7703
You need to do keyword=re.findall(r'ke\w+ = \S.*',s) instead of keyword=re.findall(r'ke\w+ = \S+',s).
Also, I'm not sure whether it does what you want, but following your example you could also use re.split as follows:
>>> s = 'keyword = NORTH FACE'
>>> re.split(' = ', s)
['keyword', 'NORTH FACE']
>>>
Upvotes: 1
Reputation: 236004
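Try this (if s holds several lines, re.MULTILINE is needed so that $ matches at the end of every line, not only at the end of the string):
re.findall(r'ke\w+ = .+$', s, re.MULTILINE)
Or this, to capture only what's after the equals sign:
re.findall(r'ke\w+ = (.+)$', s, re.MULTILINE)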
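One caveat worth checking, assuming `s` holds the whole file as a single multi-line string: without a flag, `$` only matches at the end of the string (or just before a final newline), so only the last line is found. A short sketch of the difference, with illustrative variable names:

```python
import re

s = "keyword = NORTH FACE\nkeyword = GUESS\nkeyword = RALPH LAUREN\n"

# Default mode: $ matches only at the very end of the string.
without_flag = re.findall(r'ke\w+ = (.+)$', s)

# re.MULTILINE: $ also matches just before each newline.
with_flag = re.findall(r'ke\w+ = (.+)$', s, re.MULTILINE)

print(without_flag)  # ['RALPH LAUREN']
print(with_flag)     # ['NORTH FACE', 'GUESS', 'RALPH LAUREN']
```

If the pattern is applied to one line at a time instead, the flag makes no difference.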
Upvotes: 1