Reputation: 1
I have a simple Python script that pulls a malware feed from an open-source API and should extract only the IP addresses from it.
The URL already returns IP addresses, but when I capture the response and save it to a local file, there is an extra \r\n string after each IP, probably because of the line breaks. Could someone please point out what I am doing wrong here? I am new to Python.
import urllib.request
import urllib.parse
import re
url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', resp)
malwareIPList = ip.read()
print (malwareIPlist)
Error:
line 223, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Upvotes: 0
Views: 27
Reputation: 104082
The issue is that you need to call .read() on the resp returned by urllib.request.urlopen; re.findall cannot search the response object itself.
Consider:
import urllib.request
import urllib.parse
import re
url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
print(resp)
Prints:
<http.client.HTTPResponse object at 0x103a4ccf8>
What I think you are looking for is:
url = 'http://www.malwaredomainlist.com/hostslist/ip.txt'
resp = urllib.request.urlopen(url)
ip = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', str(resp.read(), 'utf-8'))
print (ip)
Prints a bunch of IP addresses...
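To see the decode step in isolation, here is a minimal offline sketch. The `sample` bytes are a made-up stand-in for what `resp.read()` returns (two addresses separated by \r\n), and the names `sample` and `IP_PATTERN` are only for illustration:

```python
import re

# Hypothetical stand-in for resp.read(): raw bytes with \r\n line endings.
sample = b"103.14.120.121\r\n103.19.89.55\r\n"

# Same pattern as in the question.
IP_PATTERN = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Decode the bytes to str first, then search the text with the str pattern.
text = sample.decode("utf-8")
ips = re.findall(IP_PATTERN, text)
print(ips)  # ['103.14.120.121', '103.19.89.55']
```

The regex only matches the digits and dots, so the \r\n separators never end up in the result.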
BTW, since the data are IP addresses delimited by \r\n, you do not actually need a regex. On a freshly opened resp (the body can only be read once), you can do:
>>> str(resp.read(), 'utf-8').splitlines()
['103.14.120.121', '103.19.89.55', '103.224.212.222', '103.24.13.91', ...]
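If you want to be a bit more defensive than a plain splitlines(), one possible sketch is to validate each line with the standard-library ipaddress module. The function names `parse_ip_list` and `fetch_ip_list` are hypothetical, and the assumption that the feed may contain blank or non-IP lines is mine, not something the feed documents:

```python
import ipaddress
import urllib.request

# URL taken from the question.
FEED_URL = "http://www.malwaredomainlist.com/hostslist/ip.txt"


def parse_ip_list(raw):
    """Decode the feed body (bytes) and keep only lines that parse as IP addresses."""
    ips = []
    for line in raw.decode("utf-8", errors="replace").splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # skip blank lines
        try:
            # Accepts IPv4 and IPv6; raises ValueError for anything else.
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip lines that are not valid addresses
        ips.append(candidate)
    return ips


def fetch_ip_list(url=FEED_URL):
    """Download the feed and return the validated addresses (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return parse_ip_list(resp.read())
```

Calling `fetch_ip_list()` should then return a list of strings like the one shown above, with anything that is not a valid address silently dropped. Unlike the regex, this rejects out-of-range values such as 999.1.1.1.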
Upvotes: 1