Reputation: 4747

Regex get subdomain but only first part

Let's say I have a few subdomains that all start with test and end with example.com.

I need to grab the text that comes right after test:

test.apple.example.com --> apple

test.banana.example.com --> banana

test.orange.pear.example.com --> orange

After the required text there may be just .example.com, or there may be other parts of the URL, as in the last example, where .pear.example.com is the rest of the URL.

I only need to grab the first part.

This is what I have come up with:

I need the capturing group to match only the first occurrence. In the last example it should grab only orange.
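The asker's own attempt is not shown, so the following is only a sketch with a hypothetical naive pattern. It illustrates the usual cause of this problem: a greedy .* inside the group keeps consuming labels until the last possible .example.com, while a character class that excludes dots stops at the first label.

```python
import re

hosts = ['test.apple.example.com', 'test.banana.example.com', 'test.orange.pear.example.com']

# Hypothetical naive pattern: the greedy .* swallows every label up to .example.com
greedy = re.compile(r"^test\.(.*)\.example\.com$")
# Excluding dots from the group makes it stop at the first label boundary
first_label = re.compile(r"^test\.([^.]+)")

for host in hosts:
    print(greedy.match(host).group(1), first_label.match(host).group(1))
```

This prints apple apple, banana banana, and then orange.pear orange, so only the dot-excluding group gives the desired result for the last host.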

Upvotes: 1

Answers (2)

anubhava

Reputation: 785266

You may use this regex with a capture group to match the part after test.:

^test\.([^.]+).*\.example\.com$

Use this regex in re.findall.


RegEx Details:

^: Start of string
test\.: Match test.
([^.]+): Match 1 or more characters that are not a dot, and capture them
.*\.example\.com: Match 0 or more of any characters followed by .example.com
$: End of string

Code:

>>> import re
>>> arr = ['test.apple.example.com', 'test.banana.example.com', 'test.orange.pear.example.com']
>>> rx = re.compile(r"^test\.([^.]+).*\.example\.com$")
>>> for i in arr: print(rx.findall(i))
...
['apple']
['banana']
['orange']
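If you want a single string (or None) per host rather than a list, re.match with group(1) is an alternative to re.findall. The sketch below uses the answer's pattern with the dot in example.com escaped, since an unescaped . would also accept any character there. The helper name first_part is hypothetical.

```python
import re

# The answer's pattern, with the dot in example.com escaped
rx = re.compile(r"^test\.([^.]+).*\.example\.com$")

def first_part(host):
    """Hypothetical helper: return the label right after 'test.', or None if host doesn't fit."""
    m = rx.match(host)
    return m.group(1) if m else None

print(first_part('test.orange.pear.example.com'))  # orange
print(first_part('test.apple.exampleXcom'))        # None (would match if the dot were unescaped)
print(first_part('prod.apple.example.com'))        # None
```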

Upvotes: 2

igorkf

Reputation: 3565

You could do this:

import re

strings = ['test.apple.example.com', 'test.banana.example.com', 'test.orange.pear.example.com']
for x in strings:
    # Look behind for "test.", match lazily, and stop right before the next dot
    print(re.search(r'(?<=test\.)(.*?)(?=\.)', x).group(0))

apple
banana
orange
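Two things worth keeping in mind with this pattern, shown in a small sketch (the host list is made up for illustration): re.search is not anchored, so the lookbehind also fires after a "test." that sits inside a longer label, and when nothing matches re.search returns None, so calling .group() directly would raise AttributeError.

```python
import re

pattern = re.compile(r'(?<=test\.)(.*?)(?=\.)')

for host in ['test.apple.example.com', 'contest.apple.example.com', 'apple.example.com']:
    m = pattern.search(host)
    # search() returns None when nothing matches, so guard before calling .group()
    print(host, '->', m.group(0) if m else None)
```

This prints apple for the first two hosts (including contest.apple.example.com) and None for the last one.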

Edit

As suggested by @Rivers, we can change the above answer to also require example.com:

for x in strings:
    print(re.search(r'(?<=test\.)(.*?)(?=\..*?example\.com)', x).group(0))

apple
banana
orange
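For comparison, the same fixed prefix and suffix can also be handled without a regex, using plain string methods. This is a swapped-in technique, not part of either answer, and the helper name label_after_test is hypothetical.

```python
def label_after_test(host, prefix='test.', suffix='.example.com'):
    """Hypothetical helper: return the first label after prefix, or None if host doesn't fit."""
    if not (host.startswith(prefix) and host.endswith(suffix)):
        return None
    middle = host[len(prefix):-len(suffix)]  # e.g. 'orange.pear'
    # Keep only the part before the first dot; an empty middle yields None
    return middle.split('.', 1)[0] or None

for host in ['test.apple.example.com', 'test.banana.example.com', 'test.orange.pear.example.com']:
    print(label_after_test(host))
```

This prints apple, banana and orange, and returns None for hosts that do not start with test. or end with .example.com.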

Upvotes: 2
