Reputation: 4747
Let's say I have a few subdomains that all start with test and end with example.com.
I need to grab the text that comes right after test:
test.apple.example.com --> apple
test.banana.example.com --> banana
test.orange.pear.example.com --> orange
It's possible that after the required text there will just be .example.com, or there could be other parts of the URL, as in the last example, where .pear.example.com is the remaining URL.
I need to only grab the first choice.
This is what I have come up with:
I need the capturing group to only be the first occurrence. In the last example, it should only grab orange.
Upvotes: 1
Views: 276
Reputation: 785266
You may use this regex with a capture group to match the part after "test.":
^test\.([^.]+).*\.example.com$
Use this regex in re.findall.
RegEx Details:
^ : Start
test\. : Match test.
([^.]+) : Match 1+ of any character that is not a dot (.)
.*\.example.com : Match 0 or more of any characters followed by .example.com
$ : End
Code:
>>> import re
>>> arr = ['test.apple.example.com', 'test.banana.example.com', 'test.orange.pear.example.com']
>>> rx = re.compile(r"^test\.([^.]+).*\.example.com$")
>>> for i in arr: print (rx.findall(i))
...
['apple']
['banana']
['orange']
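As a rough sketch of how the pattern above could be wrapped for reuse: the helper name first_label is hypothetical, and this variant escapes the dot in example\.com on the assumption that only a literal dot should match there. It returns None for input that does not match, rather than an empty list.

```python
import re

# Variant of the pattern above with the dot in "example.com" escaped,
# so it matches a literal "." rather than any character (assumed intent).
PATTERN = re.compile(r"^test\.([^.]+).*\.example\.com$")


def first_label(host):
    """Return the label right after 'test.', or None if host does not match.

    `first_label` is a hypothetical helper name used for illustration.
    """
    match = PATTERN.match(host)
    return match.group(1) if match else None


hosts = [
    "test.apple.example.com",
    "test.banana.example.com",
    "test.orange.pear.example.com",
    "www.example.com",  # does not start with "test."
]
for host in hosts:
    print(host, "-->", first_label(host))
```

Using match.group(1) gives the captured text directly, which may be more convenient than indexing into the list that findall returns.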
Upvotes: 2
Reputation: 3565
You could do this:
import re
strings = ['test.apple.example.com', 'test.banana.example.com', 'test.orange.pear.example.com']
for x in strings:
    print(re.search(r'(?<=test\.)(.*?)(?=\.)', x).group(0))
apple
banana
orange
As suggested by @Rivers, we can change the above answer to account for example.com:
for x in strings:
    print(re.search(r'(?<=test\.)(.*?)(?=\..*?example\.com)', x).group(0))
apple
banana
orange
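One caveat with the snippets above: re.search returns None when nothing matches, so calling .group(0) directly raises AttributeError on such input. A minimal guarded sketch follows; the function name grab_after_test and the non-matching sample input are assumptions added for illustration.

```python
import re

# Same lookaround pattern as in the answer above, compiled once.
LOOKAROUND = re.compile(r'(?<=test\.)(.*?)(?=\..*?example\.com)')


def grab_after_test(host):
    """Return the text right after 'test.', or None when there is no match.

    `grab_after_test` is a hypothetical helper name.
    """
    match = LOOKAROUND.search(host)
    if match is None:
        return None
    # group(0) is the whole match; the lookarounds themselves consume nothing.
    return match.group(0)


for host in ['test.orange.pear.example.com', 'mail.example.com']:
    print(host, '-->', grab_after_test(host))
```

Note that this pattern is not anchored with ^, so a host such as mytest.apple.example.com would also yield apple; add an anchor if that is not wanted.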
Upvotes: 2