Reputation: 117

How to get matched word from regex match object after using finditer

I wrote this pattern to extract the words from a blog post's URL slug (the words may be separated by hyphens, underscores, etc. in my website's URL), so that I can match the slug against the database and display the corresponding post. When I append the matches to a list, every element is an re match object. How do I get the matched word itself?

I have tried re.search and re.match, but those return only a single match (or None), not the separate words.

import re
pattern = r"[a-zA-Z0-9]+[^-]+"
matches = re.finditer(pattern, "this-is-a-sample-post")
matches_lst = [i for i in matches]  # every element is an re.Match object, not a string

For example, given the string "this-is-a-sample-post", I want to get "this is a sample post".

I want a list of the matched words so that I can join them with " ".join() and compare the resulting string against my database.

Upvotes: 1

Answers (5)

NPC

Reputation: 90

With the current pattern (r"[a-zA-Z0-9]+[^-]+"), you get only "this", "is", "sample" and "post"; the "a" is missing. The pattern needs at least two characters per match (one or more for [a-zA-Z0-9]+ plus one or more for [^-]+), so a one-letter word can never match.

To get every word, change the pattern to:

r'[a-zA-Z0-9]*[^-]'
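To see the difference, run both patterns through finditer and pull the text out of each match object with group(0):

```python
import re

text = "this-is-a-sample-post"

# Original pattern: needs at least two characters per match,
# so the one-letter word "a" can never match.
old = [m.group(0) for m in re.finditer(r"[a-zA-Z0-9]+[^-]+", text)]

# Revised pattern: zero or more alphanumerics followed by exactly one
# non-hyphen character, so single-character words match too.
new = [m.group(0) for m in re.finditer(r"[a-zA-Z0-9]*[^-]", text)]

print(old)  # ['this', 'is', 'sample', 'post']
print(new)  # ['this', 'is', 'a', 'sample', 'post']
```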

You can do it in three ways:

Use re.sub to replace each "-" with " " (a space):

>>> import re
>>> re.sub("-", " ", "this-is-a-sample-post")
'this is a sample post'

Collect the matched strings from finditer() into a list and join them:

>>> text = "this-is-a-sample-post"
>>> a = [m.group(0) for m in re.finditer(r'[a-zA-Z0-9]*[^-]', text)]
>>> " ".join(a)
'this is a sample post'

Use str.replace to replace each '-' with a space:

>>> s = "this-is-a-sample-post"
>>> s.replace('-', ' ')
'this is a sample post'
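Putting the finditer approach together with the database lookup the question describes gives something like the sketch below. The posts dict and find_post helper are hypothetical stand-ins for the real database query, and the simpler pattern [a-zA-Z0-9]+ is used so that hyphens, underscores and other punctuation all act as separators:

```python
import re

# Hypothetical stand-in for the database: title -> post body.
posts = {"this is a sample post": "Post body goes here."}

def find_post(slug):
    # Take the text of each match object; anything that is not a
    # letter or digit (hyphen, underscore, ...) acts as a separator.
    words = [m.group(0) for m in re.finditer(r"[a-zA-Z0-9]+", slug)]
    title = " ".join(words)
    # dict.get returns None when no post has this title.
    return posts.get(title)

print(find_post("this-is-a-sample-post"))  # Post body goes here.
print(find_post("this_is_a_sample_post"))  # Post body goes here.
```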

Upvotes: 0

PythonProgrammi

Reputation: 23443

import re
pattern = r"[a-zA-Z0-9]+[^-]+"
string = "this-is-a-sample-post"
matches = re.finditer(pattern, string)
matches_lst = [i.group(0) for i in matches]
print("Made with finditer:")
print(matches_lst)
print("Made with findall")
matches_lst = re.findall(pattern, string)
print(matches_lst)
print("Made with split")
print(string.split("-"))
print("Made with replace and split")
print(string.replace("-"," ").split())

Output:

Made with finditer:
['this', 'is', 'sample', 'post']
Made with findall
['this', 'is', 'sample', 'post']
Made with split
['this', 'is', 'a', 'sample', 'post']
Made with replace and split
['this', 'is', 'a', 'sample', 'post']
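The question mentions slugs that may use underscores as well as hyphens. str.split accepts only one separator string, but re.split takes a pattern, so a character class covers both; the slug below is a made-up example:

```python
import re

slug = "this_is-a-sample_post"  # made-up slug mixing both separators

# Split on runs of one or more hyphens or underscores.
words = re.split(r"[-_]+", slug)
print(words)            # ['this', 'is', 'a', 'sample', 'post']
print(" ".join(words))  # this is a sample post
```

If a slug can start or end with a separator, re.split returns an empty string at that end, so filter it out with [w for w in words if w].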

Upvotes: 2

knh190

Reputation: 2882

As suggested in the comments, re.sub is also a solution:

import re

s = 'this-is-example'
s = re.sub('-', ' ', s)

A plain str.replace works too:

s = 'this-is-example'
s = s.replace('-', ' ')
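If the slug can contain both hyphens and underscores, one option without a regex is str.translate with a table built by str.maketrans, which maps each separator character to a space; the slug here is a made-up example:

```python
# Map '-' and '_' to ' ' (both argument strings must have equal length).
table = str.maketrans("-_", "  ")

s = "this_is-a-sample_post"  # made-up slug mixing both separators
s = s.translate(table)
print(s)  # this is a sample post
```

Unlike re.sub(r"[-_]+", " ", s), translate works character by character, so a doubled separator such as "a--b" becomes "a  b" with two spaces.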

Upvotes: 0

Emma

Reputation: 27723

We might also want to modify the expression from the question slightly if we wish to capture just the words and not the dashes:


# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([a-zA-Z0-9]+)"

test_str = "this-is-a-sample-post"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
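Since the question only needs the word strings, the same expression also works with re.findall. When the pattern contains exactly one capturing group, findall returns a list of that group's text, so no match objects are involved:

```python
import re

regex = r"([a-zA-Z0-9]+)"
test_str = "this-is-a-sample-post"

# One capturing group -> findall returns a list of the group's strings.
words = re.findall(regex, test_str)
print(words)            # ['this', 'is', 'a', 'sample', 'post']
print(" ".join(words))  # this is a sample post
```

With two or more groups, findall returns a list of tuples instead.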

Upvotes: 0

Chi

Reputation: 322

Replace:

matches_lst = [i for i in matches]

With:

matches_lst = [i.group(0) for i in matches]

Or just use findall, which returns a list of the matched strings directly:

matches = re.findall(pattern, "this-is-a-sample-post")
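Besides group(0), a match object can be indexed directly (Python 3.6+), and span() gives the start and end positions if you need them as well. This sketch uses the simpler pattern [a-zA-Z0-9]+ so the one-letter word "a" also matches:

```python
import re

pattern = r"[a-zA-Z0-9]+"
spans = []
for m in re.finditer(pattern, "this-is-a-sample-post"):
    # m[0] is shorthand for m.group(0); m.span() is (m.start(), m.end()).
    spans.append((m[0], m.span()))

print(spans)
# [('this', (0, 4)), ('is', (5, 7)), ('a', (8, 9)), ('sample', (10, 16)), ('post', (17, 21))]
```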

Upvotes: 2
