Robottinosino

Reputation: 10902

Python RegEx: Extract link

I have this link:

javascript:ColdFusion.Window.show('theformats');ColdFusion.navigate('exportformats.cfm?id=1900067&expformat=bibtex','theformats');

Let's split this into 2 parts:

1) 'exportformats.cfm?id=1900067&expformat=bibtex' 2) all the rest, left and right of it

What is the BEST way in Python to get 1), given that 2) never changes?

So far, I have tried "finding" [ColdFusion.navigate('] in the string and slicing from there until [','] but I would really like to learn how to concoct the very best RegEx for it and do so in Python, please.

Upvotes: 0

Views: 124

Answers (4)

pat34515

Reputation: 1979

I agree with arxanas's answer, but if your 1) might contain single quotes or other special characters, split on the fixed text before and after it instead:

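# Named "link" rather than "str" so the built-in str type is not shadowed.
link = "javascript:ColdFusion.Window.show('theformats');ColdFusion.navigate('exportformats.cfm?id=1900067'&expformat=bibtex','theformats');"
# Drop the fixed prefix, then the fixed suffix; what remains is part 1).
url = link.split("javascript:ColdFusion.Window.show('theformats');ColdFusion.navigate('")[1].split("','theformats');")[0]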

http://codepad.org/lAk5d6ZV
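
Since the question asks for a regex, the same fixed-prefix/fixed-suffix idea can be sketched with `re.escape`, which escapes the literal parts so that only the middle is captured. This is a minimal sketch, not a definitive solution: the names `PREFIX`, `SUFFIX`, `pattern`, and `middle` are made up for illustration, and `fullmatch` assumes Python 3.4 or later.

```python
import re

# Fixed text to the left and right of the part we want (taken from the question).
PREFIX = "javascript:ColdFusion.Window.show('theformats');ColdFusion.navigate('"
SUFFIX = "','theformats');"

# re.escape turns the literal text into a safe pattern; (.*) captures the middle.
pattern = re.compile(re.escape(PREFIX) + r"(.*)" + re.escape(SUFFIX))

# Sample with a stray single quote inside the part to extract.
link = PREFIX + "exportformats.cfm?id=1900067'&expformat=bibtex" + SUFFIX

m = pattern.fullmatch(link)
middle = m.group(1) if m else None  # None when the string has a different shape
print(middle)
```

Because the whole string has to match, a stray quote inside the middle part does not cut the capture short.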

Upvotes: 1

Jon Clements

Reputation: 142256

I believe you're after:

re.search(r"ColdFusion\.navigate\('(.*?)'", string).group(1)

Or for before and after:

m = re.match(r"(.*?)ColdFusion\.navigate\('(.*?)'(.*)", string)
# m.group(1) == before, m.group(2) == url, m.group(3) == after
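
For completeness, here is a self-contained version of the first pattern that you can paste and run. It is only a sketch: `NAVIGATE_RE` and `extract_url` are hypothetical names, and returning `None` when nothing matches is my assumption about what you would want.

```python
import re

link = ("javascript:ColdFusion.Window.show('theformats');"
        "ColdFusion.navigate('exportformats.cfm?id=1900067&expformat=bibtex','theformats');")

# Compile once; the escaped dot and parenthesis match the literal call name.
NAVIGATE_RE = re.compile(r"ColdFusion\.navigate\('(.*?)'")

def extract_url(text):
    """Return the first argument of ColdFusion.navigate('...'), or None if absent."""
    m = NAVIGATE_RE.search(text)
    return m.group(1) if m else None

url = extract_url(link)
print(url)
```

Note that the non-greedy `(.*?)` stops at the first single quote, so this form assumes the URL itself contains no quotes.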

Upvotes: 0

Paulo Scardine

Reputation: 77399

>>> import re
>>> sample = "javascript:ColdFusion.Window.show('theformats');ColdFusion.navigate('exportformats.cfm?id=1900067&expformat=bibtex','theformats');"
>>> regex = r"javascript:ColdFusion\.Window\.show\('theformats'\);ColdFusion\.navigate\('([^']+)','theformats'\);"
>>> print(re.match(regex, sample).group(1))
exportformats.cfm?id=1900067&expformat=bibtex

Upvotes: 1

Waleed Khan

Reputation: 11477

You don't need a regex. Oftentimes, when faced with paired symbols, you can do something like this:

mystr = "javascript:ColdFusion.Window.show('theformats');ColdFusion.navigate('exportformats.cfm?id=1900067&expformat=bibtex','theformats');"
mystr.split("'")[3] # Returns exportformats.cfm?id=1900067&expformat=bibtex

Upvotes: 1
