ben shalev

Reputation: 103

Match a word that doesn't contain a dot and isn't an IP regex

I want to parse a list and filter it (in this case, each line holds a record, a domain name, and an IP). The lines look something like this:

10.0.0.10 ansible0 ben1.com  
ansible1 ben1.com  10.0.0.10

That is, the IP, the zone, and the record can appear in any order and should still be matched.

So far I have two regexes, one that matches the domain (with the dot) and one that matches the IP:

Domain: [a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}

Simple IP: (?:[0-9]{1,3}\.){3}[0-9]{1,3}

With these I can collect all the domain names and all the IPs into lists in Python.

Now I only need to match the "subdomain" (in this case ansible1 and ansible0).

It should be able to contain digits and characters such as - _ * and so on, anything but a dot (.).

How can I do this with a regex?

Upvotes: 1

Views: 67

Answers (1)

anubhava

Reputation: 785276

You can use this regex with three alternatives and three named groups:

(?P<domain>[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})|
(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})|
(?P<sub>[^\s.]+)


The named groups domain and ip use the regexes you provided. The third group, (?P<sub>[^\s.]+), matches one or more characters that are neither a dot nor whitespace.
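The order of the alternatives matters here: sub has to come last so that it is only tried where the domain and ip branches fail. As a rough sketch of that point (the names `sub_last`, `sub_first` and the helper `subs` are made up for illustration), compare the pattern above with a variant that puts sub first:

```python
import re

# The three building blocks, taken from the question and the answer above
DOMAIN = r"[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}"
IP = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
SUB = r"[^\s.]+"

line = "ansible1 ben1.com  10.0.0.10"

# Same order as in the answer: domain, ip, then sub as the fallback
sub_last = re.compile(f"(?P<domain>{DOMAIN})|(?P<ip>{IP})|(?P<sub>{SUB})")
# Hypothetical variant with sub first, only to show what goes wrong
sub_first = re.compile(f"(?P<sub>{SUB})|(?P<domain>{DOMAIN})|(?P<ip>{IP})")


def subs(rx, text):
    """Return every match that was captured by the 'sub' group."""
    return [m.group("sub") for m in rx.finditer(text) if m.group("sub")]


print(subs(sub_last, line))   # ['ansible1']
print(subs(sub_first, line))  # ['ansible1', 'ben1', 'com', '10', '0', '0', '10']
```

With sub first, it grabs the dot-free pieces of the domain and the IP before the other two branches get a chance to match.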


Code:

import re

arr = ['10.0.0.10 ansible0 ben1.com', 'ansible1 ben1.com  10.0.0.10']

rx = re.compile(r'(?P<domain>[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})|(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})|(?P<sub>[^\s.]+)')

subs = []
for line in arr:
    for m in rx.finditer(line):
        # Keep only the matches captured by the "sub" group
        if m.group('sub'):
            subs.append(m.group('sub'))

print(subs)

Output:

['ansible0', 'ansible1']
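If you also want the domains and IPs from the same pass, one option is to bucket each match by `m.lastgroup`, which holds the name of the group that matched. A minimal sketch, assuming the same pattern and input as above; the function name `collect` is made up for illustration:

```python
import re
from collections import defaultdict

rx = re.compile(
    r'(?P<domain>[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})'
    r'|(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})'
    r'|(?P<sub>[^\s.]+)'
)


def collect(lines):
    """Group every match under the name of the alternative that produced it."""
    found = defaultdict(list)
    for line in lines:
        for m in rx.finditer(line):
            # lastgroup is 'domain', 'ip' or 'sub' for this pattern
            found[m.lastgroup].append(m.group())
    return dict(found)


arr = ['10.0.0.10 ansible0 ben1.com', 'ansible1 ben1.com  10.0.0.10']
result = collect(arr)
print(result['sub'])     # ['ansible0', 'ansible1']
print(result['domain'])  # ['ben1.com', 'ben1.com']
print(result['ip'])      # ['10.0.0.10', '10.0.0.10']
```

This relies on each alternative containing exactly one named capturing group, so `lastgroup` is always one of the three names.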

Upvotes: 1
