Ali Rajwani

Reputation: 1

Python regex on data pulled from an API gives a TypeError when converting to text

I have some simple Python code that pulls a malware feed from an open-source API and extracts only the IP addresses from the list.

The URL already contains the IPs, but when you capture the response and save it to a local file, you can see the string \r\n after each IP, probably because of the line breaks. Could someone please point out what I am doing wrong here? I am new to Python.

import urllib.request
import urllib.parse
import re


url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', resp)
malwareIPList = ip.read()
print (malwareIPlist)

Error:

line 223, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

Upvotes: 0

Views: 27

Answers (1)

dawg

Reputation: 104082

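The issue is that you need to call .read() on the resp returned by urllib.request.urlopen. urlopen returns a response object rather than a string or bytes, which is why re.findall raises the TypeError.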

Consider:

import urllib.request
import urllib.parse
import re


url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
print(resp)

Prints:

<http.client.HTTPResponse object at 0x103a4ccf8>

What I think you are looking for is:

url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', str(resp.read(), 'utf-8'))

print (ip)

Prints a bunch of IP addresses...
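To try the decode-then-findall step without hitting the network, here is a minimal sketch. The sample_body bytes and the extract_ips helper are my own stand-ins (not part of the feed or of urllib); sample_body is only assumed to mimic what resp.read() returns, i.e. bytes with \r\n after each address.

```python
import re

# Hypothetical sample standing in for the bytes returned by resp.read().
sample_body = b"103.14.120.121\r\n103.19.89.55\r\n103.224.212.222\r\n"

# Same pattern as in the question: four dot-separated groups of 1-3 digits.
IP_PATTERN = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'


def extract_ips(body):
    """Decode the raw bytes to str, then return every IPv4-looking token."""
    text = body.decode('utf-8')  # equivalent to str(body, 'utf-8')
    return re.findall(IP_PATTERN, text)


ips = extract_ips(sample_body)
print(ips)
```

Because the pattern only matches digits and dots, the \r\n separators never end up in the results. Note that the pattern is loose: it would also accept something like 999.999.999.999, which is fine for a feed that is already known to contain only addresses.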


BTW, since the data are IP addresses delimited by \r\n, you do not actually need a regex. On a freshly opened response you can do:

>>> str(resp.read(), 'utf-8').splitlines()
['103.14.120.121', '103.19.89.55', '103.224.212.222', '103.24.13.91', ...]
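One thing to watch with the splitlines approach: a response body can only be read once, so a second .read() on the same object returns empty bytes. The sketch below illustrates this offline; io.BytesIO is used purely as a stand-in for resp (an assumption for the sake of a runnable example), and the blank-line filter is my own addition in case the feed contains empty lines.

```python
import io

# io.BytesIO is a stand-in for the object returned by urlopen();
# like a real response, it is exhausted after the first read().
fake_resp = io.BytesIO(b"103.14.120.121\r\n103.19.89.55\r\n\r\n103.24.13.91\r\n")

body = fake_resp.read()    # first read returns the whole payload
second = fake_resp.read()  # second read returns b'' because the stream is consumed

# splitlines() handles \r\n, \n and \r alike; skip any blank lines.
ips = [line.strip() for line in body.decode('utf-8').splitlines() if line.strip()]

print(ips)
print(second)
```

So if you need both the regex result and the splitlines result, read the body into a variable once and reuse it.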

Upvotes: 1
