Reputation: 11

Comparing Network Traffic to Authorized List(s) via Domain Name

I'm trying to parse network traffic and compare the domain names in it against a list of the most common websites. The intent is to print every site name that is not on that list.


with open('/Users/downloads/scripting_for_security/resources/top_100.txt') as f:
    safeAdd = f.readlines(),


with open('/Users/downloads/scripting_for_security/resources/traffic_log.txt') as n:
    netTraffic = n.readlines(),

domainTraffic = re.findall(r'\s(?:www.)?(\w+.com)', netTraffic)


for i in safeAdd:
    for e in domainTraffic:
        if i != e:
            print(e)

I'm getting a TypeError:

TypeError                                 Traceback (most recent call last)
      8     netTraffic = n.readlines(),
      9
---> 10 domainTraffic = re.findall(r'\s(?:www.)?(\w+.com)', netTraffic)
     11
     12

~/anaconda3/lib/python3.7/re.py in findall(pattern, string, flags)
    221
    222     Empty matches are included in the result."""
--> 223     return _compile(pattern, flags).findall(string)
    224
    225 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object

Upvotes: 1

Answers (3)

Charif DZ

Reputation: 14721

The problem is that you are passing a list of lines, not a string, to re.findall. Use read() instead of readlines():

with open('data.txt') as f:
    print(type(f.readlines()))  # list
    print(type(f.read()))       # str, which re.findall (or any other string function) accepts

In your code, change the two assignments to:

safeAdd = f.read()

netTraffic = n.read()

Also remove the trailing comma. With it, netTraffic becomes a tuple containing one list of lines. Compare:

x = 1,  # equivalent to x = (1,); the result is a tuple
x = 1   # equivalent to x = (1); without the comma it is an int

Upvotes: 0

razdi

Reputation: 1440

As mentioned previously, re.findall expects a string and you are passing a list. One way to tackle this is to iterate over the list of strings (netTraffic) and build a list of all matches found (domainTraffic), as shown below:

import re

with open('/Users/downloads/scripting_for_security/resources/top_100.txt') as f:
    # strip newlines so entries compare equal to the regex matches
    # (assumes one domain per line)
    safeAdd = [line.strip() for line in f]

with open('/Users/downloads/scripting_for_security/resources/traffic_log.txt') as n:
    # no trailing comma, so this stays a list of lines
    netTraffic = n.readlines()

# initialize empty list
domainTraffic = []

# iterate over each line and add its matches to the list
for net in netTraffic:
    domainTraffic.extend(re.findall(r'\s(?:www\.)?(\w+\.com)', net))

# use a list comprehension to filter out the safeAdd entries
filtered_list = [add for add in domainTraffic if add not in safeAdd]

print(filtered_list)

You could also join the list into a long string and then run re.findall on the combined string. It really depends on what your strings are.
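
The join approach could be sketched like this. The sample lines are hypothetical stand-ins for what n.readlines() would return, and the dots in the pattern are escaped so they match literally.

```python
import re

# Hypothetical lines, shaped like the output of n.readlines().
net_lines = [
    "10:01 GET www.google.com /\n",
    "10:02 GET www.example.com /login\n",
]

# Join the list into one string so re.findall receives a str.
combined = "".join(net_lines)
domain_traffic = re.findall(r'\s(?:www\.)?(\w+\.com)', combined)
print(domain_traffic)
```

One difference from the per-line loop: in the joined string, the newline that ends one line counts as the \s before a domain at the start of the next line, so such domains match here but would not match line by line.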

Upvotes: 0

bearrito

Reputation: 2315

netTraffic is built from readlines(), which returns a list, as per https://docs.python.org/3/tutorial/inputoutput.html

re.findall expects its second argument to be a string: https://docs.python.org/3/library/re.html#re.findall
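
A small sketch that reproduces the mismatch without touching the filesystem, assuming io.StringIO as a stand-in for the open log file (the sample content and the simplified pattern are hypothetical):

```python
import io
import re

# io.StringIO stands in for an open text file (hypothetical sample content).
n = io.StringIO("GET www.example.com /\n")
lines = n.readlines()          # a list of str, one entry per line

try:
    re.findall(r'\w+\.com', lines)   # wrong type: list, not str
except TypeError as exc:
    error_message = str(exc)

n.seek(0)                      # rewind before reading again
text = n.read()                # a single str
matches = re.findall(r'\w+\.com', text)
print(type(lines).__name__, '|', error_message, '|', matches)
```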

Upvotes: 0
