Reputation: 2948
I'm trying to extract a substring from a large string that matches my pattern.
text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
pattern = 'dumbweb.com'
Here I'm trying to find the string that matches the pattern:
theLink = re.findall(pattern, text)
print(theLink)  # output: ['dumbweb.com']
But I'm only able to find the exact text that I'm searching with. I'm trying to get the full whitespace-delimited string that contains the match.
desired output:
theLink  # www.dumbweb.com/Dumbo
I tried searching for a similar question but wasn't able to phrase it right. I even looked through the Python regex documentation and still can't achieve what I'm looking for.
Upvotes: 1
Views: 2228
Reputation: 911
Your pattern should be
pattern = r"www\.dumbweb\.com\S*"
This will match the link starting at www.dumbweb.com and running up to the next whitespace character.
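As a rough sketch of how that pattern could be applied to the text from the question (the names match and the_link are only for illustration, and the None guard is an extra precaution for the no-match case):

```python
import re

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'

# Raw string so the backslashes reach the regex engine unchanged;
# \S* consumes everything up to the next whitespace character.
pattern = r'www\.dumbweb\.com\S*'

match = re.search(pattern, text)
# re.search returns None when nothing matches, so guard before calling .group()
the_link = match.group() if match else None
print(the_link)  # www.dumbweb.com/Dumbo
```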
Upvotes: 1
Reputation: 242
Probably not the cleanest solution:
text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
pattern = 'dumbweb.com'
for word in text.split():
    # 'in' also catches a match at index 0, which find(...) > 0 would miss
    if pattern in word:
        print(word)
Upvotes: 1
Reputation: 563
Try this:
re.search(r'dumbweb\.com\S*', text).group()
# matches your string followed by any run of non-whitespace characters
# (prefix the pattern with \S* to also capture a leading 'www.')
Upvotes: 1
Reputation: 8011
You could try this:
[^ ]*dumbweb\.com[^ ]*
Note that in regex a . matches any character. You need to use \. to match only a literal period.
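To illustrate the difference, here is a small sketch; the sample string is made up for the demonstration and is not from the question:

```python
import re

# Hypothetical sample: one token with 'X' where the period would be, one real match
sample = 'dumbwebXcom dumbweb.com'

# Unescaped: the dot matches any character, so both tokens are found.
loose = re.findall(r'dumbweb.com', sample)
# Escaped: only a literal period matches.
strict = re.findall(r'dumbweb\.com', sample)
# re.escape builds the escaped form from a plain string.
escaped = re.escape('dumbweb.com')

print(loose)   # ['dumbwebXcom', 'dumbweb.com']
print(strict)  # ['dumbweb.com']
```

If the search text comes from a variable rather than a hand-written pattern, re.escape is probably the safer way to get the escaped form.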
Upvotes: 1
Reputation: 784998
You may consider this approach:
import re
text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
pattern = 'dumbweb.com'
rex = re.compile(r'\b' + r'\S*' + re.escape(pattern) + r'\S*')
print(rex.findall(text))
Output:
['www.dumbweb.com/Dumbo']
Explanation:
re.compile(...): compiles a given string regex pattern
r'\b': Word boundary
r'\S*': Match 0 or more non-whitespace characters
re.escape(pattern): Perform regex escape of the given string
r'\S*': Match 0 or more non-whitespace characters
Upvotes: 4