Reputation: 2958

Extract a matching substring in a Python string

I'm trying to extract a substring from a large string that matches my pattern.

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'

pattern = 'dumbweb.com'

Here I'm trying to find the string that matches the pattern:

import re

theLink = re.findall(pattern, text)
print(theLink)  # output: ['dumbweb.com']

But I only get back the exact text I'm searching for. I want the full whitespace-delimited word that contains the match.

desired output:

theLink  # www.dumbweb.com/Dumbo

I tried searching for a similar question but couldn't phrase it right. I also looked through the Python regex documentation and still couldn't achieve what I'm looking for.

Upvotes: 1

Answers (5)

Saravanan

Reputation: 911

Your pattern should be

pattern = r"www\.dumbweb\.com[^\s]*"

This matches the link starting at www.dumbweb.com and continuing up to the next whitespace character.
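
As a quick check, here is a minimal sketch of that pattern applied to the text from the question. The names `match` and `link` are only illustrative, and the `None` guard is an addition for the case where nothing matches:

```python
import re

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'

# Raw string so the backslashes reach the regex engine unchanged.
pattern = r"www\.dumbweb\.com[^\s]*"

match = re.search(pattern, text)
# re.search returns None when nothing matches, so guard before calling .group().
link = match.group() if match else None
print(link)  # www.dumbweb.com/Dumbo
```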

Upvotes: 1

kelyen

Reputation: 242

Probably not the cleanest solution:

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'

pattern = 'dumbweb.com'

for word in text.split():
    # "in" also catches a match at index 0, which find(...) > 0 would miss
    if pattern in word:
        print(word)
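
If you want the matches collected rather than printed, the same idea can be sketched as a list comprehension (the name `links` is just illustrative). Note that this is a plain substring test, so the `.` in the pattern is a literal period here:

```python
text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
pattern = 'dumbweb.com'

# Keep every whitespace-separated word that contains the pattern as a plain substring.
links = [word for word in text.split() if pattern in word]
print(links)  # ['www.dumbweb.com/Dumbo']
```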

Upvotes: 1

Jacek Błocki

Reputation: 563

Try this:

re.search(r'dumbweb\.com\S*', text).group()
# matches your string followed by any run of non-whitespace characters;
# prefix the pattern with \S* to also capture the leading "www."

Upvotes: 1

mousetail

Reputation: 8010

You could try this:

[^ ]*dumbweb\.com[^ ]*

Note that in a regex, . matches any character (except a newline, by default). Use \. to match only a literal period.
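
A small sketch of the difference, using a made-up string (`www.dumbwebXcom/oops`) that should not count as a match; the variable names are illustrative only:

```python
import re

sample = 'see www.dumbwebXcom/oops'

# Unescaped dot: "." also matches the "X", so the wrong word is returned.
loose = re.findall(r'[^ ]*dumbweb.com[^ ]*', sample)
print(loose)   # ['www.dumbwebXcom/oops']

# Escaped dot: only a literal period is accepted, so nothing matches.
strict = re.findall(r'[^ ]*dumbweb\.com[^ ]*', sample)
print(strict)  # []

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
on_text = re.findall(r'[^ ]*dumbweb\.com[^ ]*', text)
print(on_text)  # ['www.dumbweb.com/Dumbo']
```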

Upvotes: 1

anubhava

Reputation: 786291

You may consider this approach:

import re
text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
pattern = 'dumbweb.com'

rex = re.compile(r'\b' + r'\S*' + re.escape(pattern) + r'\S*')
print(rex.findall(text))

Output:

['www.dumbweb.com/Dumbo']

Explanation:

re.compile(...): compiles a given string regex pattern
r'\b': Word boundary
r'\S*': Match 0 or more non-whitespace characters
re.escape(pattern): Perform regex escape of the given string
r'\S*': Match 0 or more non-whitespace characters
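
If you need this for more than one search string, the pieces above could be wrapped in a small helper. This is only a sketch: the function name `find_words_containing` and its parameter names are hypothetical, not part of the `re` module.

```python
import re

def find_words_containing(needle, text):
    """Return every run of non-whitespace characters in text that contains needle literally."""
    # re.escape makes characters such as "." in needle match literally.
    rex = re.compile(r'\b' + r'\S*' + re.escape(needle) + r'\S*')
    return rex.findall(text)

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'

print(find_words_containing('dumbweb.com', text))  # ['www.dumbweb.com/Dumbo']
print(find_words_containing('otherLinks', text))   # ['www.otherLinks.com...']
print(find_words_containing('missing', text))      # []
```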

Upvotes: 4
