Praveen

Reputation: 31

re.findall within a list in python

I have the following list:

sample_text = ['199.72.81.55 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985', 
    '199.120.110.21 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'burger.letters.com -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985', 
    '205.172.11.25 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245']

I need to collect all host names into a single flat list. The expected result is:

['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']

My code is:

import re

host = []
for s in sample_text:
    host.append(re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s))
print(host)

My output:

[['199.72.81.55'], ['unicomp6.unicomp.net'], ['199.120.110.21'], ['burger.letters.com'], ['205.172.11.25']]

How do I get a flat list of strings instead of a list of lists?

Upvotes: 0

Views: 1692

Answers (5)

Praveen

Reputation: 31

I used .extend instead of .append, which resolved the issue.

host.extend(re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s))
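A minimal, self-contained sketch of the .extend fix, assuming the sample_text list from the question and reusing the question's regex unchanged (only written as a raw string so that \d is not treated as a string escape):

```python
import re

sample_text = [
    '199.72.81.55 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
    '199.120.110.21 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'burger.letters.com -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
    '205.172.11.25 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
]

# Same pattern as in the question: dotted IPv4-like addresses OR three-part host names.
pattern = r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*'

host = []
for s in sample_text:
    # findall returns a list; extend adds its elements one by one,
    # so host stays flat instead of becoming a list of lists.
    host.extend(re.findall(pattern, s))

print(host)
```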

Upvotes: 0

vinci mojamdar

Reputation: 634

re.findall() returns a list of strings.

Documentation: https://docs.python.org/3/library/re.html#re.findall

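.append adds the whole list returned by findall as a single item, which is why host ends up as a list of lists. .extend adds each element of that list instead.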

Try:

host.extend(re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s))

Documentation: https://docs.python.org/3/tutorial/datastructures.html
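A small sketch of the difference, using one line from the question. Only the host-name half of the question's regex is used here, to keep the example short; the variable names found, appended and extended are illustrative.

```python
import re

s = 'unicomp6.unicomp.net -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985'
found = re.findall(r'[a-z0-9]*[.][a-z]*[.][a-z]*', s)  # findall always returns a list

appended = []
appended.append(found)  # the whole list becomes ONE element
extended = []
extended.extend(found)  # each string in the list becomes its own element

print(appended)  # [['unicomp6.unicomp.net']]
print(extended)  # ['unicomp6.unicomp.net']
```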

Upvotes: 0

Maurice Meyer

Reputation: 18136

You can keep host flat by concatenating each findall result onto it with +=:

import re

host = []
for s in sample_text:
    host += re.findall(r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*', s)
print(host)

Output:

['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']

Upvotes: 2

Bryce Ramgovind

Reputation: 3267

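Alternatively, keep your loop as is and flatten the resulting list of lists afterwards (sum starts from [] and concatenates each sublist onto it):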

sum(host, [])
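sum(host, []) builds a new list at every step, so its cost grows quadratically with the number of items; the Python documentation for sum suggests itertools.chain for concatenating iterables. A sketch comparing the two on the nested output shown in the question:

```python
from itertools import chain

# Nested result as produced by the question's append-based loop.
host = [['199.72.81.55'], ['unicomp6.unicomp.net'], ['199.120.110.21'],
        ['burger.letters.com'], ['205.172.11.25']]

flat_sum = sum(host, [])                       # works, but copies the growing list each step
flat_chain = list(chain.from_iterable(host))  # single pass over all sublists

print(flat_chain)
```

For five items the difference is irrelevant; it only matters for large logs.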

Upvotes: -1

Cory Kramer

Reputation: 118021

Without regex, you can str.split on '--', take the first part, and strip the trailing space:

>>> [i.split('--')[0].strip() for i in sample_text]
['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']

The same idea using a regex:

>>> import re
>>> [re.match(r'(.*) -- .*', i).group(1) for i in sample_text]
['199.72.81.55', 'unicomp6.unicomp.net', '199.120.110.21', 'burger.letters.com', '205.172.11.25']

In both cases, a list comprehension replaces your for loop.
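The same list-comprehension idea also works with the question's original findall pattern. A sketch, assuming the question's sample_text; the nested comprehension reads like two loops (for each line, for each match), so the result comes out flat:

```python
import re

sample_text = [
    '199.72.81.55 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'unicomp6.unicomp.net -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
    '199.120.110.21 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
    'burger.letters.com -- [01/Jul/1995:00:00:06 -0400] "Get /shuttle/countdown/HTTP/1.0" 200 3985',
    '205.172.11.25 -- [01/Jul/1995:00:00:01 -0400] "Get /histpry/appollo/HTTP/1.0" 200 6245',
]

pattern = r'[\d]*[.][\d]*[.][\d]*[.][\d]*|[a-z0-9]*[.][a-z]*[.][a-z]*'

# Outer loop over lines, inner loop over that line's matches.
host = [h for s in sample_text for h in re.findall(pattern, s)]
print(host)
```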

Upvotes: 4
