newcane

Reputation: 285

Regular expression tricks in python

I have lines in a file like this:

keyword = NORTH FACE
keyword = GUESS
keyword = DRESSES
keyword = RALPH LAUREN

My code is:

keyword = re.findall(r'ke\w+ = \S+', s)

This prints only

NORTH
GUESS
DRESSES
RALPH

But I need the regex to match and print:

NORTH FACE
GUESS
DRESSES
RALPH LAUREN

Upvotes: 1

Views: 741

Answers (8)

user2472780

Reputation: 1

import re

D = "keyword = RALPH LAUREN"
m = re.search(r'(?<== )(\w+\s*)*', D)  # match everything after '= '
print(m.group(0))  # RALPH LAUREN
C = "keyword = GUESS"
m = re.search(r'(?<== )(\w+\s*)*', C)  # search again on the new string
print(m.group(0))  # GUESS

Upvotes: 0

Arohi Gupta

Reputation: 93

import re

msg = fh.read()  # fh is assumed to be an already-open file object
output = re.findall(r"keyword = (.*)", msg)  # capture the text after '= '
print(output)

Upvotes: -1

Mark Tolonen

Reputation: 177620

No need for regex. Try partition or split:

lines = '''\
keyword = NORTH FACE
keyword = GUESS
keyword = DRESSES
keyword = RALPH LAUREN
'''.splitlines()

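for line in lines:
    print(line.partition(' = ')[2])  # text after the first ' = '
print('')  # blank line between the two outputs
for line in lines:
    print(line.split(' = ')[1])  # same result using split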

Output

NORTH FACE
GUESS
DRESSES
RALPH LAUREN

NORTH FACE
GUESS
DRESSES
RALPH LAUREN

Update

Given the new information in the comment and making a guess as to the datafile format (Update the question with a REAL example!):

import re

data = '''\
keyword = NORTH FACE
score = 88466
normalizedKeyword = NORTH FACE

keyword = DRESSES
score = 79379
normalizedKeyword = DRESSES
'''

L = re.findall(r'keyword = (.*)\nscore = (.*)\n',data)
for i in L:
    print(','.join(i))

Output

NORTH FACE,88466
DRESSES,79379

Upvotes: 1

user557597

Reputation:

Not sure if this is what you seek ...

From one of your comments: if you want to pair up the values from adjacent lines, and those lines may be surrounded by unrelated lines, you have to do a few things.

  1. Read the entire file into a buffer. This is because the paired lines can be anywhere in the file.
  2. Treat the buffer as a single string rather than processing it line by line.
  3. Capture the values globally. In the example below, capture group 1 is the 'keyword' value and capture group 2 is the 'score' value. 'keyword' and 'score' are placeholders for the real constants whose values you want to pair.

Expanded regex:
(?:^|\n) [^\S\n]*
(?:keyword) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]* \n
\s*
(?:score) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]*
(?=\n|$)
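
A rough sketch of how the expanded regex above might be used from Python. It assumes the pattern is compiled with re.VERBOSE so the layout whitespace is ignored, and it borrows the sample data from another answer in this thread as a guess at the file format. The names PAIR_RE, data, and pairs are illustrative only.

```python
import re

# The expanded regex, compiled in verbose mode so the spaces and
# line breaks used for layout are not part of the pattern.
PAIR_RE = re.compile(r"""
    (?:^|\n) [^\S\n]*
    (?:keyword) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]* \n
    \s*
    (?:score) [^\S\n]* = [^\S\n]* (\w(?:[^\S\n]*\w+)*) [^\S\n]*
    (?=\n|$)
""", re.VERBOSE)

# Assumed sample input: the whole file read into one string.
data = """\
keyword = NORTH FACE
score = 88466
normalizedKeyword = NORTH FACE

keyword = DRESSES
score = 79379
normalizedKeyword = DRESSES
"""

# findall returns one (keyword, score) tuple per matched pair.
pairs = PAIR_RE.findall(data)
for keyword, score in pairs:
    print(keyword + ',' + score)
```

Because [^\S\n] means "whitespace other than a newline", each value stays on its own line while still allowing spaces inside it.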

Upvotes: 0

the wolf

Reputation: 35532

Try:

>>> s="""
... keyword = NORTH FACE
... keyword = GUESS
... keyword = DRESSES
... keyword = RALPH LAUREN
... """
>>> re.findall(r'ke\w+ = .*',s)
['keyword = NORTH FACE', 'keyword = GUESS', 'keyword = DRESSES', 'keyword = RALPH LAUREN']

Upvotes: 0

alex

Reputation: 490233

Your regex consumes only non-whitespace characters (\S). That is why it stops matching when it encounters a space character.

Change \S+ to .*. This greedily matches all characters except newlines (\n), so it runs to the end of the line.
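
A minimal comparison of the two patterns, assuming s holds the lines from the question as one string. The capture-group variant at the end is an extra illustration, not part of the original suggestion.

```python
import re

# Assumed input: the four lines from the question in one string.
s = """\
keyword = NORTH FACE
keyword = GUESS
keyword = DRESSES
keyword = RALPH LAUREN
"""

# \S+ stops at the first whitespace character after the value starts.
stops_at_space = re.findall(r'ke\w+ = \S+', s)

# .* keeps going until the end of the line ('.' does not match '\n').
whole_line = re.findall(r'ke\w+ = .*', s)

# A capture group returns only the part after '= '.
values_only = re.findall(r'ke\w+ = (.*)', s)

print(stops_at_space)
print(whole_line)
print(values_only)
```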

Upvotes: 3

gsbabil

Reputation: 7703

You need to do keyword=re.findall(r'ke\w+ = \S.*',s) instead of keyword=re.findall(r'ke\w+ = \S+',s).

Also, I'm not sure if it does what you want, but following your example you could also use re.split as follows:

>>> s = 'keyword = NORTH FACE'
>>> re.split(' = ', s)
['keyword', 'NORTH FACE']

Upvotes: 1

Óscar López

Reputation: 236004

Try this:

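re.findall(r'ke\w+ = .+$', s, flags=re.MULTILINE)

Or this, to capture only what's after the equals sign (re.MULTILINE makes $ match at the end of every line, not just the end of the string):

re.findall(r'ke\w+ = (.+)$', s, flags=re.MULTILINE)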
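
A short sketch of how $ interacts with multi-line input: by default it matches only at the end of the string (or just before a final newline), while re.MULTILINE makes it match at the end of each line. The sample string is an assumption about what s holds.

```python
import re

# Assumed input: several lines in a single string.
s = "keyword = NORTH FACE\nkeyword = GUESS\nkeyword = RALPH LAUREN\n"

# Without the flag, '$' anchors only at the end of the whole string,
# so just the last line can match.
last_only = re.findall(r'ke\w+ = (.+)$', s)

# With re.MULTILINE, '$' also matches before every newline.
every_line = re.findall(r'ke\w+ = (.+)$', s, flags=re.MULTILINE)

print(last_only)
print(every_line)
```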

Upvotes: 1
