Reputation: 11
I'm trying to parse network traffic and compare the domain names in it against a list of the most common websites. The intent is to print every site name that is not on the list of common websites.
with open('/Users/downloads/scripting_for_security/resources/top_100.txt') as f:
    safeAdd = f.readlines(),
with open('/Users/downloads/scripting_for_security/resources/traffic_log.txt') as n:
    netTraffic = n.readlines(),
domainTraffic = re.findall(r'\s(?:www.)?(\w+.com)', netTraffic)
for i in safeAdd:
    for e in domainTraffic:
        if i != e:
            print(e)
I'm getting a TypeError:
TypeError Traceback (most recent call last) in
      8     netTraffic = n.readlines(),
      9
---> 10 domainTraffic = re.findall(r'\s(?:www.)?(\w+.com)', netTraffic)
     11
     12

~/anaconda3/lib/python3.7/re.py in findall(pattern, string, flags)
    221
    222     Empty matches are included in the result."""
--> 223     return _compile(pattern, flags).findall(string)
    224
    225 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object
Upvotes: 1
Views: 72
Reputation: 14721
The problem here is that you are passing a list of lines, not a string, to re.findall. Use read() instead of readlines():
with open('data.txt') as f:
    print(type(f.readlines()))  # <class 'list'>
    print(type(f.read()))  # <class 'str'>, the type that re.findall accepts
In your code, change the two assignments to:
safeAdd = f.read()
netTraffic = n.read()
and remove the trailing comma. With the comma, netTraffic is a tuple containing one list of lines. Check this out:
x = 1,  # equivalent to x = (1,), so the result is a tuple
x = 1  # equivalent to x = (1); without the comma it is an integer
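To make both points concrete, here is a minimal sketch that reproduces the error and the fix. The sample log text is made up, since the real traffic_log.txt format isn't shown in the question, and io.StringIO stands in for the opened file. The dots in the pattern are escaped here so that they only match a literal dot, which is a small change from the pattern in the question.

```python
import io
import re

# Hypothetical sample data; the real log format is an assumption.
sample_log = "GET www.example.com 200\nGET tracker.com 200\n"

lines_with_comma = io.StringIO(sample_log).readlines(),  # trailing comma: tuple holding one list
lines = io.StringIO(sample_log).readlines()              # list of str
text = io.StringIO(sample_log).read()                    # a single str

print(type(lines_with_comma).__name__, type(lines).__name__, type(text).__name__)

pattern = r'\s(?:www\.)?(\w+\.com)'

# re.findall rejects anything that is not str or bytes-like
try:
    re.findall(pattern, lines)
    raised = False
except TypeError:
    raised = True

# With a plain string the same call works
domains = re.findall(pattern, text)
print(raised, domains)
```

Note that if safeAdd is also read with read(), it becomes one long string, so looping over it yields single characters; for the comparison step it is probably easier to keep that file as a list of stripped lines.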
Upvotes: 0
Reputation: 1440
As mentioned previously, re.findall expects a string and you are passing a list. One way to tackle this is to iterate over the list of strings (netTraffic) and build a list of all the matches found (domainTraffic). I've shown this below:
import re

with open('/Users/downloads/scripting_for_security/resources/top_100.txt') as f:
    # strip newlines so the entries compare equal to the regex matches
    safeAdd = [line.strip() for line in f]
with open('/Users/downloads/scripting_for_security/resources/traffic_log.txt') as n:
    netTraffic = n.readlines()  # no trailing comma, so this stays a list
# initialize empty list
domainTraffic = []
# iterate over each line and add its matches to the list
for net in netTraffic:
    # dots are escaped so they only match a literal "."
    domainTraffic.extend(re.findall(r'\s(?:www\.)?(\w+\.com)', net))
# use a list comprehension to filter out the safeAdds
filtered_list = [add for add in domainTraffic if add not in safeAdd]
print(filtered_list)
You could also join the list into one long string and then run re.findall on the combined string. It really depends on what your strings are.
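The join variant could look roughly like the sketch below. The two lists are hypothetical stand-ins for what readlines() would return from top_100.txt and traffic_log.txt, and the log layout is an assumption. It also uses a set for the safe list and drops duplicate domains, which the original code does not do.

```python
import re

# Hypothetical stand-ins for the lines read from the two files.
safe_lines = ["google.com\n", "youtube.com\n"]
traffic_lines = [
    "10:01 GET www.google.com 200\n",
    "10:02 GET oddsite.com 200\n",
    "10:03 GET youtube.com 200\n",
    "10:04 GET oddsite.com 404\n",
]

# Strip newlines and use a set for fast membership tests
safe = {line.strip() for line in safe_lines}

# Each line already ends in "\n", so join on the empty string
combined = "".join(traffic_lines)
domains = re.findall(r'\s(?:www\.)?(\w+\.com)', combined)

# dict.fromkeys drops duplicates while keeping first-seen order
uncommon = [d for d in dict.fromkeys(domains) if d not in safe]
print(uncommon)
```

Whether joining or looping line by line is better mostly comes down to file size: joining holds the whole log in memory as one string, while the loop only needs one line at a time if you iterate over the file object directly.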
Upvotes: 0
Reputation: 2315
netTraffic is a list as per https://docs.python.org/3/tutorial/inputoutput.html
findall expects a second argument of type string https://docs.python.org/3/library/re.html#re.findall
Upvotes: 0