Reputation: 977
I have the following input list of strings:
string = ['sql ddlsql144.internal.ecnahcdroffilc.net',
'fra-sql-03.internal.ecnahcdroffilc.net bro',
'esc-cca3cdr-12.internal.ecnahcdroffilc.com',
'au-per-06a-stwp-01.per.asia.ecnahcdroffilc.com',
'http://go.fotrscomi.com',
'http //go.fotrscomi.com',
'fotrscomi.windows.computer',
'printers-03.internal.clif 10.51.59.10 roalswinds.oionr']
I want the result to be
['ddlsql144',
'fra-sql-03',
'esc-cca3cdr-12',
'au-per-06a-stwp-01',
'10.51.59.10']
The condition for a match is: .com or .net should match, but the string should not start with https://, http://, or http //. The URL must be returned.
I tried:
expression = r"(\w[-.a-z0-9]*)..?(?=org|net|com)"
# to extract the whole url
urls = re.findall(expression, str(string))
To get the initial part, I used:
re.findall(r'(^\w.+?)\.', str(urls))
But this didn't give me the expected results.
Upvotes: 1
Views: 88
Reputation: 92854
Extended solution with the re.search function and a specific regex pattern:
import re

items = ['sql ddlsql144.internal.ecnahcdroffilc.net', 'fra-sql-03.internal.ecnahcdroffilc.net bro',
         'esc-cca3cdr-12.internal.ecnahcdroffilc.com', 'au-per-06a-stwp-01.per.asia.ecnahcdroffilc.com',
         'http://go.fotrscomi.com', 'http //go.fotrscomi.com',
         'fotrscomi.windows.computer', 'printers-03.internal.clif 10.51.59.10 roalswinds.oionr'
         ]
result = []
# group 1: optional "http " prefix; group 2: first label of a .org/.net/.com name;
# group 3: a dotted-quad IP address
pat = re.compile(r'(http )?([^\s.]+)[^\s]+\.(?:org|net|com)\b|\b((?:[0-9]{1,3}\.){3}[0-9]{1,3})\b')
for i in items:
    m = pat.search(i)
    if m:
        # keep the first label unless the match belongs to an http URL
        if not m.group(1) and m.group(2) and not m.group(2).startswith('http'):
            result.append(m.group(2))
        elif m.group(3):
            result.append(m.group(3))
print(result)
The output:
['ddlsql144', 'fra-sql-03', 'esc-cca3cdr-12', 'au-per-06a-stwp-01', '10.51.59.10']
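For readability, the same pattern can also be written with re.VERBOSE and named groups and wrapped in a function. This is only a sketch of the approach above: the names HOST_OR_IP and extract_names are made up for illustration, and the matching rules are assumed to be exactly those of the pattern in this answer.

```python
import re

# Hypothetical names; the pattern mirrors the one above, spelled out with
# re.VERBOSE so each part can carry a comment.
HOST_OR_IP = re.compile(r"""
    (?P<http>http\ )?               # optional 'http ' prefix (the 'http //' case)
    (?P<host>[^\s.]+)               # first label, up to the first dot
    [^\s]+\.(?:org|net|com)\b       # rest of the name, ending in .org/.net/.com
    |
    \b(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})\b   # dotted-quad IP address
""", re.VERBOSE)


def extract_names(items):
    """Return the first label of each .org/.net/.com name, or an IP address."""
    result = []
    for item in items:
        m = HOST_OR_IP.search(item)
        if m is None:
            continue
        host = m.group("host")
        # Skip anything that is (or looks like) an http/https URL.
        if host and not m.group("http") and not host.startswith("http"):
            result.append(host)
        elif m.group("ip"):
            result.append(m.group("ip"))
    return result


items = ['sql ddlsql144.internal.ecnahcdroffilc.net',
         'fra-sql-03.internal.ecnahcdroffilc.net bro',
         'esc-cca3cdr-12.internal.ecnahcdroffilc.com',
         'au-per-06a-stwp-01.per.asia.ecnahcdroffilc.com',
         'http://go.fotrscomi.com', 'http //go.fotrscomi.com',
         'fotrscomi.windows.computer',
         'printers-03.internal.clif 10.51.59.10 roalswinds.oionr']
print(extract_names(items))
```

Like the loop above, this only looks at the first match in each string, so an entry containing both a hostname and an IP address yields whichever comes first.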
Upvotes: 2