Reputation: 1

How to use set() in python

Please help me with an issue I've run into while using the set() function. When I run the code below, the contents of the file "iplist.txt" are expected to be:

192.168.248.2
192.168.248.20

but it is as below:

And the output of print (a) is as below:

192.168.248.2
192.168.248.2
192.168.248.20
192.168.248.20

Here is the code:

for key, group in groupby(logfile, key=lambda e: e.split('.',1)[0]):
    for entry in group:
        c.update(re.findall(r'[0-9]+(?:\.[0-9]+){3}', entry))
    for ip, cnt in c.items():
       if cnt >= 5 and cnt <=10:
          newip.append(ip)
       elif cnt > 10:
          match = re.search(r'->\s*([0-9]+(?:\.[0-9]+){3})', entry)
          if match:
              a = match.group(1)
              print (a)

          with open("C:\\Users\Raz\\Desktop\\pythondemo\\iplist.txt", 'w+') as f:
              f.write('\n' .join(set(a))+'\n\n')
              f.close()
       else:
           print ("There are no malicious packets yet")

Here is the log.txt file containing IPs:

12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21
12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41676 -> 192.168.248.2:21
12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21

12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41676 -> 192.168.248.20:21
12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41673 -> 192.168.248.20:21

Now my questions are:

Why does print (a) show duplicated IPs (no more and no fewer)?
Why does set(a) extract unique characters when I want unique IPs?

Upvotes: 0

Answers (2)

Daniel

Reputation: 42768

Your first problem is that a is a string, so set(a) gives you the unique characters of that string:

>>> set('192.168.248.20')
set(['.', '1', '0', '2', '4', '6', '9', '8'])
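To make the difference concrete, here is a small sketch in Python 3 syntax (the variable names are only for illustration). Passing a string to set() iterates over its characters, while wrapping the string in a list keeps the whole address as one element:

```python
ip = '192.168.248.20'

# A string is iterated character by character, so set() keeps unique characters.
chars = set(ip)

# Wrapping the string in a list makes the whole address a single element.
whole = set([ip])

print(sorted(chars))  # → ['.', '0', '1', '2', '4', '6', '8', '9']
print(whole)          # → {'192.168.248.20'}
```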

Your second problem is that you overwrite your file each time a new entry is found (mode 'w+' instead of 'a').
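A minimal sketch of the difference between the two modes, in Python 3 syntax. The scratch file path is hypothetical and created in a temporary directory just for this demonstration:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'iplist.txt')
ips = ['192.168.248.2', '192.168.248.20']

# Mode 'w' truncates the file on every open, so only the last write survives.
for ip in ips:
    with open(path, 'w') as f:
        f.write(ip + '\n')
with open(path) as f:
    overwritten = f.read()

os.remove(path)

# Mode 'a' appends to the file, so every write is kept.
for ip in ips:
    with open(path, 'a') as f:
        f.write(ip + '\n')
with open(path) as f:
    appended = f.read()

print(repr(overwritten))  # → '192.168.248.20\n'
print(repr(appended))     # → '192.168.248.2\n192.168.248.20\n'
```

Note that the with block already closes the file when it ends, so an explicit f.close() inside it is not needed.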

The third problem is that you never collect all the IPs in one place, so there is nothing to build a set from.
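Putting the three points together, one possible approach is to add each destination IP to a set while looping and to build the output only once, after the loop. This is a sketch in Python 3 syntax, not a drop-in replacement for your code: it reuses your regular expression and four lines from your log.txt, it leaves out the groupby and counting thresholds, and the names DEST_IP and unique_destination_ips are made up for the example.

```python
import re

# Four sample lines copied from the log.txt shown in the question.
log_lines = [
    "12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41673 -> 192.168.248.2:21",
    "12/30-04:09:41.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.2:41676 -> 192.168.248.2:21",
    "12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41676 -> 192.168.248.20:21",
    "12/30-04:09:40.070967 [**] [1:10000001:1] snort alert [1:0000001] [**] [classification ID: 0] [Priority ID: 0] {ICMP} 192.168.232.21:41673 -> 192.168.248.20:21",
]

# Same pattern as in the question: the IP that follows the "->" arrow.
DEST_IP = re.compile(r'->\s*([0-9]+(?:\.[0-9]+){3})')


def unique_destination_ips(lines):
    """Collect every destination IP into one set (hypothetical helper)."""
    found = set()
    for line in lines:
        match = DEST_IP.search(line)
        if match:
            # add() stores the whole string as one element.
            found.add(match.group(1))
    return found


ips = unique_destination_ips(log_lines)

# Build the file contents once, after all lines have been processed.
text = '\n'.join(sorted(ips)) + '\n'
print(text)
```

With the contents built after the loop, the file only needs to be opened once, and mode 'w' is then fine because there is a single write.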

Upvotes: 0

Mohammad Yusuf

Reputation: 17074

If the format of your log file stays exactly the same and doesn't change, you can also do this with pandas, like this:

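import pandas as pd

# Split each log line on whitespace; column 16 holds the destination "ip:port".
df = pd.read_csv('log.txt', sep=r'\s+', header=None)

# Keep only the IP part, dropping the ":port" suffix.
df[16] = df[16].apply(lambda x: x.split(':')[0])
print(df[16].unique().tolist())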

Output:

['192.168.248.2', '192.168.248.20']

If you don't want to use pandas then wait for other incoming answers.

Upvotes: 1
