
Want to extract a string using regex

I have a string like this: [[English language|English]]. I tried to extract the text from it, but had no luck. I just want to ignore the text [[English language|. The output should be English.

Another example: for [[Stack Exchange|Question]], the output should be only Question.

If there is no |, as in [[Stack Exchange]], then the output should be only Stack Exchange.

I'm new to regex. Could you please help me? Thank you so much.

Upvotes: 2

Answers (2)

Abhijit

Reputation: 63707

This can be done without regex:

>>> text="[[English language|English]]"
>>> text.strip("[]").split("|")[-1]
'English'
>>> text="[[Stack Exchange|Question]]"
>>> text.strip("[]").split("|")[-1]
'Question'
>>> text="[[Stack Exchange]]"
>>> text.strip("[]").split("|")[-1]
'Stack Exchange'

First strip all "[" and "]" characters from both ends, then split the string with "|" as the separator, and return the last item of the resulting list.
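The same steps can be wrapped in a small reusable function. This is a minimal sketch: the name `wiki_link_label` is hypothetical, and it assumes the input is a single [[...]] link as in the question.

```python
def wiki_link_label(text):
    """Return the label of a [[target|label]] link, or the target if there is no label.

    Hypothetical helper; assumes `text` is one link such as "[[Stack Exchange|Question]]".
    """
    # Remove any run of "[" and "]" characters from both ends.
    inner = text.strip("[]")
    # str.split always returns at least one element, so [-1] cannot raise IndexError.
    return inner.split("|")[-1]


for link in ["[[English language|English]]",
             "[[Stack Exchange|Question]]",
             "[[Stack Exchange]]"]:
    print(wiki_link_label(link))
```

Note that strip removes a set of characters rather than a fixed prefix or suffix, so a label that itself ends in "]" would lose that character too.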

Using Regex

>>> import re
>>> text="[[English language|English]]"
>>> re.findall(r"([^\[\]\|]+)",text)[-1]
'English'
>>> text="[[Stack Exchange|Question]]"
>>> re.findall(r"([^\[\]\|]+)",text)[-1]
'Question'
>>> text="[[Stack Exchange]]"
>>> re.findall(r"([^\[\]\|]+)",text)[-1]
'Stack Exchange'
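To see why [-1] picks the right piece, it helps to look at the whole list that re.findall returns. The character class [^\[\]\|] matches any character except "[", "]" and "|", so each match is one maximal run of text between those delimiters.

```python
import re

# One or more characters that are not "[", "]" or "|"
pattern = re.compile(r"([^\[\]\|]+)")

print(pattern.findall("[[English language|English]]"))  # ['English language', 'English']
print(pattern.findall("[[Stack Exchange]]"))             # ['Stack Exchange']
```

The last run is the label when one is present, and the link target otherwise.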

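If the regex finds no match (for example, for "[[]]"), re.findall returns an empty list and indexing it with [-1] raises an IndexError. The split version cannot raise here, because str.split always returns at least one element, but both can be wrapped the same way: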

try:
    result = text.strip("[]").split("|")[-1]
except IndexError:
    result = None  # or whatever default you intend to use here

try:
    result = re.findall(r"([^\[\]\|]+)", text)[-1]
except IndexError:
    result = None  # or whatever default you intend to use here
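As an alternative to catching the exception, one can check whether the list is empty before indexing. This is a minimal sketch: the function name `label_or_default` and its `default` parameter are hypothetical, and the pattern is the answer's character class without the capturing group, which findall does not need here.

```python
import re

# Runs of characters other than "[", "]" and "|"
PIECE = re.compile(r"[^\[\]\|]+")


def label_or_default(text, default=None):
    """Return the last bracket-free, pipe-free run in `text`, or `default` if there is none."""
    pieces = PIECE.findall(text)
    # An empty list means no match; test for it instead of catching IndexError.
    return pieces[-1] if pieces else default


print(label_or_default("[[Stack Exchange|Question]]"))  # Question
print(label_or_default("[[]]"))                         # None
```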

Performance Comparison

>>> stmt1=r"""
import re
text="[[English language|English]]"
try:
    result=re.findall(r"([^\[\]\|]+)",text)[-1]
except IndexError:
    result=None
"""
>>> stmt2="""
text="[[English language|English]]"
try:
    result=text.strip("[]").split("|")[-1]
except IndexError:
    result=None
"""
>>> import timeit
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> print("%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000))
4.89 usec/pass
>>> print("%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000))
1.43 usec/pass
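A Python 3 variant of the benchmark, sketched below, moves the import and the sample text into timeit's setup argument and precompiles the regex, so only the extraction itself is timed. This is an illustrative sketch; the names `SETUP`, `STMT_REGEX` and `STMT_SPLIT` are arbitrary, and no timings are given because they depend on the machine and Python version.

```python
import timeit

# Code run once before timing; not included in the measurement.
SETUP = r'''
import re
pattern = re.compile(r"[^\[\]\|]+")
text = "[[English language|English]]"
'''

# The statements actually being timed.
STMT_REGEX = "pattern.findall(text)[-1]"
STMT_SPLIT = 'text.strip("[]").split("|")[-1]'

N = 100000

for name, stmt in [("regex", STMT_REGEX), ("split", STMT_SPLIT)]:
    seconds = timeit.timeit(stmt, setup=SETUP, number=N)
    print("%s: %.2f usec/pass" % (name, 1000000 * seconds / N))
```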

Upvotes: 1

alex

Reputation: 490143

This regex will do it.

^\[\[(?:.*?\|)?(.*?)?\]\]$


The first capturing group will contain the text you want.
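In Python, the pattern can be applied with re.match and the text read from group(1). This is a minimal sketch, assuming each input string is exactly one link (the pattern is anchored with ^ and $); the name `LINK` is illustrative.

```python
import re

# alex's pattern: optional "target|" prefix, then the captured text, then "]]"
LINK = re.compile(r"^\[\[(?:.*?\|)?(.*?)?\]\]$")

for text in ["[[English language|English]]",
             "[[Stack Exchange|Question]]",
             "[[Stack Exchange]]"]:
    m = LINK.match(text)
    # m is None when the string is not a single [[...]] link
    print(m.group(1) if m else None)
```

With more than one |, such as [[a|b|c]], the lazy prefix stops at the first |, so group 1 is "b|c", whereas the split approach in the other answer returns "c".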

Upvotes: 1
