Reputation:
I have a string like this: [[English language|English]]
I tried to extract the text from the string, but had no luck. I just want to ignore the [[English language| part, so the output should be English.
Another example:
[[Stack Exchange|Question]]
The output should be only Question.
If there is no |, as in
[[Stack Exchange]]
then the output should be only Stack Exchange.
I'm new to regex. Could you please help me? Thank you so much.
Upvotes: 2
Views: 120
Reputation: 63707
This can be done without regex
>>> text="[[English language|English]]"
>>> text.strip("[]").split("|")[-1]
'English'
>>> text="[[Stack Exchange|Question]]"
>>> text.strip("[]").split("|")[-1]
'Question'
>>> text="[[Stack Exchange]]"
>>> text.strip("[]").split("|")[-1]
'Stack Exchange'
Note: first strip all "[" and "]" characters from both ends of the string, then split it using "|" as the separator and take the last item of the resulting list.
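The same steps can be wrapped in a small function. This is only a minimal sketch: the name link_label is made up for illustration, and it assumes the input is a single [[...]] link whose visible text does not itself begin or end with a bracket (strip would remove those as well).

```python
def link_label(text):
    """Return the display text of a single [[target|label]] style link."""
    # Remove every leading and trailing "[" or "]" character.
    inner = text.strip("[]")
    # Split on "|" and keep the last piece; without a "|" the whole
    # inner text is the only piece, so it is returned unchanged.
    return inner.split("|")[-1]


print(link_label("[[English language|English]]"))  # English
print(link_label("[[Stack Exchange|Question]]"))   # Question
print(link_label("[[Stack Exchange]]"))            # Stack Exchange
```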
Using Regex
>>> import re
>>> text="[[English language|English]]"
>>> re.findall(r"([^\[\]\|]+)",text)[-1]
'English'
>>> text="[[Stack Exchange|Question]]"
>>> re.findall(r"([^\[\]\|]+)",text)[-1]
'Question'
>>> text="[[Stack Exchange]]"
>>> re.findall(r"([^\[\]\|]+)",text)[-1]
'Stack Exchange'
>>>
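If the links sit inside a longer piece of text, taking the last findall result no longer works, because it returns the last bracket-free run of the whole string. One possible sketch for that case uses re.sub with an optional non-capturing group for the "target|" part. It assumes links are never nested and that the visible text contains no "]" or "|"; the name replace_links and the sample sentence are illustrative only.

```python
import re

# \[\[            opening brackets
# (?:[^\]|]*\|)?  optional "target|" prefix, not captured
# ([^\]|]+)       the label (or the target when there is no "|")
# \]\]            closing brackets
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]+)\]\]")


def replace_links(text):
    """Replace every [[target|label]] or [[target]] with its visible text."""
    return LINK.sub(r"\1", text)


print(replace_links("I speak [[English language|English]] on [[Stack Exchange]]."))
# I speak English on Stack Exchange.
```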
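If nothing is matched, indexing with [-1] raises an IndexError, so we can make the following modification. (str.split always returns at least one item, so in the first version the guard is only a safety net; for an input such as "[[]]" it simply yields an empty string.)
try:
    result=text.strip("[]").split("|")[-1]
except IndexError:
    result=None #or whatever you intend to have here
or
try:
    result=re.findall(r"([^\[\]\|]+)",text)[-1]
except IndexError:
    result=None #or whatever you intend to have here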
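To see when the guard actually fires, the regex variant can be wrapped in a helper that returns a fallback value. This is a sketch under assumptions: the name last_segment and its default parameter are hypothetical, not part of any library.

```python
import re


def last_segment(text, default=None):
    """Return the last run of characters other than "[", "]" and "|"."""
    try:
        return re.findall(r"[^\[\]|]+", text)[-1]
    except IndexError:
        # findall returned an empty list: the input held nothing
        # but delimiters, or was empty.
        return default


print(last_segment("[[Stack Exchange|Question]]"))  # Question
print(last_segment("[[]]"))                         # None
print(last_segment("", default="n/a"))              # n/a
```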
Performance Comparison
>>> stmt1=r"""
import re
text="[[English language|English]]"
try:
    result=re.findall(r"([^\[\]\|]+)",text)[-1]
except IndexError:
    result=None
"""
>>> stmt2="""
text="[[English language|English]]"
try:
    result=text.strip("[]").split("|")[-1]
except IndexError:
    result=None
"""
>>> import timeit
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> print("%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000))
4.89 usec/pass
>>> print("%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000))
1.43 usec/pass
>>>
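For anyone repeating the measurement as a script, a variant of the comparison might look like the sketch below. It moves the import and the sample string into timeit's setup argument so that only the extraction itself is timed. The helper name usec_per_pass is made up for illustration, and absolute numbers depend on the machine and interpreter, so none are shown.

```python
import timeit

# Executed once before timing starts; not included in the measurement.
SETUP = 'import re; text = "[[English language|English]]"'

# The two statements being compared.
REGEX_STMT = r'result = re.findall(r"[^\[\]|]+", text)[-1]'
SPLIT_STMT = 'result = text.strip("[]").split("|")[-1]'

NUMBER = 100000


def usec_per_pass(stmt):
    """Average time of one execution of stmt, in microseconds."""
    total = timeit.timeit(stmt, setup=SETUP, number=NUMBER)
    return 1000000 * total / NUMBER


print("regex: %.2f usec/pass" % usec_per_pass(REGEX_STMT))
print("split: %.2f usec/pass" % usec_per_pass(SPLIT_STMT))
```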
Upvotes: 1