AP257

Reputation: 93783

Regular expression for string between two strings?

Sorry, I know this is probably a duplicate but having searched for 'python regular expression match between' I haven't found anything that answers my question!

The document I'm searching (which, to be clear, is a long HTML page) has a whole bunch of strings in it (inside a JavaScript function) that look like this:

link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};

I want to extract the links (i.e. everything between quotes within these strings) - e.g. /Hidden/SidebySideYellow/dei1=1204970159862

To get the links, I know I need to start with:

re.findall(regexp, doc_string)

But what should regexp be?

Upvotes: 0

Views: 2854

Answers (3)

ghostdog74

Reputation: 342303

Use a few simple splits

>>> s="link: '/Hidden/SidebySideGreen/dei1=1204970159862'};"
>>> s.split("'")
['link: ', '/Hidden/SidebySideGreen/dei1=1204970159862', '};']
>>> for i in s.split("'"):
...     if "/" in i:
...         print(i)
...
/Hidden/SidebySideGreen/dei1=1204970159862
>>>
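
The same idea can be stretched over the whole two-line sample from the question. This is only a sketch: it assumes the single quotes in the text are balanced and wrap nothing but the links, in which case every odd-indexed piece of the split is the content of one quoted string. The names doc, pieces and links are made up for illustration.

```python
# Assumption: quotes are balanced and only surround the links.
doc = """link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""

# Splitting on the quote character alternates between text outside
# the quotes (even indexes) and text inside them (odd indexes).
pieces = doc.split("'")
links = pieces[1::2]

print(links)
```

On a full HTML page with other quoted strings or stray apostrophes this assumption breaks down, and a regular expression is probably the safer tool.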

Upvotes: 0

poke

Reputation: 387557

The answer to your question depends on what the rest of the string looks like. If all of the lines have the form link: '<URL>'}; then you can do it very simply with plain string slicing:

myString = "link: '/Hidden/SidebySideGreen/dei1=1204970159862'};"
print( myString[7:-3] )

(If you have one string containing multiple such lines, you can split it into lines first and slice each one.)
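
A minimal sketch of that line-by-line variant, assuming every non-empty line has exactly the form link: '<URL>'}; (the names my_doc and links are only for illustration):

```python
# Assumption: each non-empty line looks exactly like  link: '<URL>'};
my_doc = """link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""

# "link: '" is 7 characters long and "'};" is 3, hence the [7:-3] slice.
links = [line.strip()[7:-3] for line in my_doc.splitlines() if line.strip()]

print(links)
```

The slice is fragile by design: any extra whitespace inside the line or a different prefix will give wrong results, which is where the regular expression comes in.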

If it is a bit more complex, though, using a regular expression is fine. One example that just looks for the URL inside the quotes would be:

import re

myDoc = """link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""

print( re.findall( "'([^']+)'", myDoc ) )

Depending on how the whole string looks, you might have to include the link: prefix in the pattern as well:

print( re.findall( "link: '([^']+)'", myDoc ) )
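
To illustrate why the prefix can matter, here is a small sketch. The first line of doc is a made-up stand-in for other quoted JavaScript that might appear on the real page; it is not taken from the question.

```python
import re

# Hypothetical page fragment: the first quoted string is not a link.
doc = """var title = 'Side by side';
link: '/Hidden/SidebySideGreen/dei1=1204970159862'};
link: '/Hidden/SidebySideYellow/dei1=1204970159862'};"""

# Matches every single-quoted string, wanted or not.
any_quoted = re.findall(r"'([^']+)'", doc)

# Matches only quoted strings that directly follow "link: ".
links_only = re.findall(r"link: '([^']+)'", doc)

print(any_quoted)
print(links_only)
```

Here any_quoted picks up 'Side by side' in addition to the two links, while links_only contains just the two links.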

Upvotes: 3

Krzysztof Bujniewicz

Reputation: 2417

I'd start with:

regexp = "'([^']+)'"

Then check whether it works on your document. If the only condition is that the string sits on one line between single quotes, it should be good as it is.
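
One thing worth checking: [^'] also matches newlines, so the pattern by itself does not enforce the "on one line" part. If that matters, a possible variation is to exclude the newline from the character class too. The sketch below uses a made-up doc with a stray apostrophe to show the difference; the names loose and strict are just for illustration.

```python
import re

# Hypothetical input: the apostrophe in "it's" is not part of any link.
doc = "it's a test\nlink: '/a/b'};"

# [^'] may run across the line break, so the stray apostrophe pairs up
# with the opening quote of the link on the next line.
loose = re.findall(r"'([^']+)'", doc)

# Excluding \n as well keeps every match on a single line.
strict = re.findall(r"'([^'\n]+)'", doc)

print(loose)   # ["s a test\nlink: "]
print(strict)  # ['/a/b']
```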

Upvotes: 1
