Reputation: 117
I wrote this pattern to extract the words from a blog post's URL slug (the words may be separated by hyphens, underscores, etc.) so that I can match the slug against my database and display the corresponding post. Whenever I append the matches to a list, every element is an re match object. How do I obtain the matched words?
I have tried using search and match, but those do not return the separate words.
import re
pattern = r"[a-zA-Z0-9]+[^-]+"
matches = re.finditer(pattern, "this-is-a-sample-post")
matches_lst = [i for i in matches]
So suppose I have the string "this-is-a-sample-post", I want to get "this is a sample post".
I want a list of the matched words so that I can use the " ".join() method and match the string with my database.
Upvotes: 1
Views: 5921
Reputation: 90
With the current pattern (r"[a-zA-Z0-9]+[^-]+"), you only get "this is sample post"; the "a" is missing. Each of the two parts requires one or more characters, so a match needs at least two characters and a single-letter word can never match.
To get the complete sentence, change the pattern to
r'[a-zA-Z0-9]*[^-]'
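To see the difference between the two patterns, here is a minimal sketch that runs both against the sample slug from the question. The variable names (slug, original, adjusted) are only illustrative.

```python
import re

slug = "this-is-a-sample-post"

original = r"[a-zA-Z0-9]+[^-]+"  # both parts need 1+ characters, so 2+ per match
adjusted = r"[a-zA-Z0-9]*[^-]"   # first part may be empty, so 1+ per match

original_words = re.findall(original, slug)
adjusted_words = re.findall(adjusted, slug)

print(original_words)  # ['this', 'is', 'sample', 'post']
print(adjusted_words)  # ['this', 'is', 'a', 'sample', 'post']
```

Note that [^-] accepts any character other than a hyphen, so this pattern only separates words on hyphens.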
You can do it 3 ways:
>>> re.sub("-", " ", "this-is-a-sample-post")
O/P: 'this is a sample post'
>>> text = "this-is-a-sample-post"
>>> a = [m.group(0) for m in re.finditer(r'[a-zA-Z0-9]*[^-]', text)]
>>> " ".join(a)
o/p: 'this is a sample post'
>>> text = "this-is-a-sample-post"
>>> text.replace('-', ' ')
o/p: 'this is a sample post'
Upvotes: 0
Reputation: 23443
import re
pattern = r"[a-zA-Z0-9]+[^-]+"
string = "this-is-a-sample-post"
matches = re.finditer(pattern, string)
matches_lst = [i.group(0) for i in matches]
print("Made with finditer:")
print(matches_lst)
print("Made with findall")
matches_lst = re.findall(pattern, string)
print(matches_lst)
print("Made with split")
print(string.split("-"))
print("Made with replace and split")
print(string.replace("-"," ").split())
Output:
Made with finditer:
['this', 'is', 'sample', 'post']
Made with findall
['this', 'is', 'sample', 'post']
Made with split
['this', 'is', 'a', 'sample', 'post']
Made with replace and split
['this', 'is', 'a', 'sample', 'post']
Upvotes: 2
Reputation: 2882
As suggested in a comment, re.sub is also a solution:
import re
s = 'this-is-example'
s = re.sub('-', ' ', s)
A plain str.replace works too:
s = 'this-is-example'
s = s.replace('-', ' ')
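The question mentions that slugs may use hyphens or underscores. A rough sketch of how the re.sub approach could cover both, assuming those are the only separators; slug_to_title is a hypothetical helper name, not something from the question:

```python
import re

def slug_to_title(slug):
    # Hypothetical helper: replace each run of hyphens or underscores
    # with a single space, then trim stray spaces at the ends.
    return re.sub(r"[-_]+", " ", slug).strip()

print(slug_to_title("this-is-a-sample-post"))   # this is a sample post
print(slug_to_title("this_is-a__sample-post"))  # this is a sample post
```

str.replace handles one fixed separator at a time, so the character class is what makes the regex version worthwhile here.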
Upvotes: 0
Reputation: 27723
My guess is that we might also want to slightly modify our expression in the question, if we wish to capture the words and not the dashes:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"([a-zA-Z0-9]+)"
test_str = "this-is-a-sample-post"
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Upvotes: 0
Reputation: 322
Replace:
matches_lst = [i for i in matches]
With:
matches_lst = [i.group(0) for i in matches]
Or you could just use findall, which will give you a list:
matches = re.findall(pattern, "this-is-a-sample-post")
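As a quick check that both routes give the same list of strings, here is a small sketch. It assumes a simplified pattern, [a-zA-Z0-9]+, rather than the one in the question, so that the single-letter word "a" is kept; the variable names are illustrative.

```python
import re

slug = "this-is-a-sample-post"
word_pattern = r"[a-zA-Z0-9]+"  # assumption: a word is a run of ASCII letters or digits

# finditer yields Match objects; group(0) is the matched text.
from_finditer = [m.group(0) for m in re.finditer(word_pattern, slug)]

# findall returns the matched strings directly (when the pattern has no groups).
from_findall = re.findall(word_pattern, slug)

print(from_finditer == from_findall)  # True
print(" ".join(from_findall))         # this is a sample post
```

If the pattern contains capture groups, findall returns the group contents instead of the whole match, so finditer with group(0) is the safer choice in that case.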
Upvotes: 2