Reputation: 31
I have a list as below.
sample_text = ['199.72.81.55 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
'unicomp6.unicomp.net -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
'199.120.110.21 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
'burger.letters.com -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
'205.172.11.25 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245']
I need to get all host names in a list. Expected result is as below.
['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']
My code is:
import re

host = []
for i in range(0, len(sample_text)):
    s = sample_text[i]
    host.append(re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s))
print(host)
My output:
[['199.72.81.55'], ['unicomp6.unicomp.net'], ['199.120.110.21'], ['burger.letters.com'], ['205.172.11.25']]
How do I get a flat list of strings instead of a list of lists?
Upvotes: 0
Views: 1692
Reputation: 31
Using .extend instead of .append resolved the issue:
host.extend(re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s))
Upvotes: 0
Reputation: 634
re.findall() returns a list of strings.
Documentation: https://docs.python.org/3/library/re.html#re.findall
.append adds that whole list as a single item to the new list, which is why the result is nested.
Try host.extend(...) instead, which adds each item of the list separately.
Documentation: https://docs.python.org/3/tutorial/datastructures.html
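To make the difference concrete, here is a minimal sketch that runs both methods on one line from the question. The simplified IPv4-only pattern is an assumption made for illustration and is not the pattern from the question.

```python
import re

line = '199.72.81.55 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245'
pattern = r'\d+\.\d+\.\d+\.\d+'  # assumed, simplified IPv4-only pattern

appended = []
appended.append(re.findall(pattern, line))  # the whole result list becomes one element

extended = []
extended.extend(re.findall(pattern, line))  # each match is added on its own

print(appended)  # [['199.72.81.55']]
print(extended)  # ['199.72.81.55']
```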
Upvotes: 0
Reputation: 18136
You can easily flatten host:
import re

host = []
for i in range(0, len(sample_text)):
    s = sample_text[i]
    # += on a list extends it with each match instead of nesting the result
    host += re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s)
print(host)
Output:
['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']
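If the nested list has already been built, it can also be flattened after the fact. A small sketch using itertools.chain.from_iterable from the standard library; the variable names nested and flat are only illustrative.

```python
from itertools import chain

# A nested result like the one produced by host.append(re.findall(...))
nested = [['199.72.81.55'], ['unicomp6.unicomp.net'], ['199.120.110.21']]

# chain.from_iterable yields the items of each inner list in order
flat = list(chain.from_iterable(nested))

print(flat)  # ['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21']
```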
Upvotes: 2
Reputation: 118021
Without using regex, you can just str.split on '--' and take the first part:
>>> [i.split('--')[0].strip() for i in sample_text]
['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']
Similar idea, but using regex:
>>> import re
>>> [re.match(r'(.*) -- .*', i).group(1) for i in sample_text]
['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']
In both cases you can use a list comprehension to replace your for loop.
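One caveat with the regex version: re.match returns None for a line that does not fit the pattern, so calling .group(1) on it raises AttributeError. A hedged sketch that skips such lines instead; the helper name extract_hosts and the pattern HOST_RE are hypothetical, and it assumes the host is always the first field followed by ' -- '.

```python
import re

# Hypothetical pattern: first run of non-space characters, followed by ' -- '
HOST_RE = re.compile(r'(\S+) -- ')

def extract_hosts(lines):
    """Return the host field of every line that matches; skip the rest."""
    hosts = []
    for line in lines:
        m = HOST_RE.match(line)
        if m:  # m is None when the line does not match
            hosts.append(m.group(1))
    return hosts

sample = [
    '199.72.81.55 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'not a log line',
    'burger.letters.com -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
]
print(extract_hosts(sample))  # ['199.72.81.55', 'burger.letters.com']
```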
Upvotes: 4