Counting number of unique items from a dictionary

Question

My program reads in a large log file and searches each line for the IP and the TIME (whatever is in the brackets).

5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"
5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"
5.63.145.71 - - [30/Jun/2013:08:04:51 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"

I want to read the whole file, and summarize the entries as follows:

Num 3 IP 5.63.145.71 TIME [30/Jun/2013:08:04:46 -0500]

That is: number of entries, IP, TIME and DATE.

What I have so far:

import re


x = open("logssss.txt")

dic={}


for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        try:
            dic[m[i]] += 1 
        except:
            dic[m[i]] = 1
        k = dic.keys()
for i in range(len(k)):
    print dic[k[i]], k[i]

The above code displays correctly now! Thanks.

6 199.21.99.83

1 5.63.145.71

EDIT: How about adding c to my output now? The timestamps will obviously differ between entries, but is it possible to show just one of them on the same line as the count?

GWW · Accepted Answer

Move your print statement outside of the main loop:

import re
x = open("logssss.txt")

dic={}


for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        try:
            dic[m[i]] += 1 
        except:
            dic[m[i]] = 1

for k,v in dic.iteritems(): #or items if Python 3.X
    print k, v
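For reference, here is a small sketch of what the two regexes pull out of a single line, using the first sample line from the question. It is written for Python 3 (an assumption; the code above is Python 2), and the names `IP_RE` and `TIME_RE` are just illustrative.

```python
import re

# The same two patterns as above, compiled once (names are illustrative)
IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")
TIME_RE = re.compile(r"\[(.+)\]")

# First sample line from the question
line = ('5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] '
        '"HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"')

m = IP_RE.search(line).group().split()    # the IP has no spaces, so one item
c = TIME_RE.search(line).group().split()  # the timestamp splits at its space

print(m)  # ['5.63.145.71']
print(c)  # ['[30/Jun/2013:08:04:46', '-0500]']
```

Since `m` only ever holds one item, the inner `for i in range(len(m))` loop runs exactly once per line, and `c` comes back as two pieces unless you drop the `.split()`.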

As a tip, you could take advantage of Python's Counter class (from the collections module) to replace your try/except block:

from collections import Counter
dic = Counter()
for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        dic[m[i]] += 1

for k,v in dic.iteritems(): #or items if Python 3.X
    print k, v
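If you are on Python 3, a self-contained version of the Counter approach might look like the sketch below. The function name `count_ips` and the in-memory `sample` list (the three lines from the question) are assumptions made so the example runs without a log file; it also skips lines with no IP rather than failing on them.

```python
import re
from collections import Counter

IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

def count_ips(lines):
    """Count how many lines each IP address appears on (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = IP_RE.search(line)
        if match:  # skip lines without an IP instead of raising AttributeError
            counts[match.group()] += 1
    return counts

# The three sample lines from the question, standing in for the log file
sample = [
    '5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:51 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
]

counts = count_ips(sample)
for ip, n in counts.most_common():  # most frequent IP first
    print(n, ip)
```

With a real file you would pass the open file object instead of `sample`, since iterating over a file yields its lines.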

Based on your comment, I would just use a dictionary of lists. The count for each IP address can then be taken from the length of its list:

dic = {}
for line in x:
    m = re.search(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b",line).group().split()
    c = re.search(r"\[(.+)\]",line).group().split()
    for i in range(len(m)):
        dic.setdefault(m[i], []).append(c)

for k,v in dic.iteritems(): #or items if Python 3.X
    print k, len(v), v
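If you only want one timestamp per IP on the same line as the count, a variation on the same idea is to store the count together with the first timestamp seen, rather than every timestamp. This is a rough Python 3 sketch, not the only way to do it; `summarize` and `sample` are made-up names, and the output format just mimics the one in the question.

```python
import re

IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")
TIME_RE = re.compile(r"\[(.+)\]")

def summarize(lines):
    """Map each IP to [count, first timestamp seen] (hypothetical helper)."""
    summary = {}
    for line in lines:
        ip = IP_RE.search(line)
        ts = TIME_RE.search(line)
        if not (ip and ts):
            continue  # skip lines missing either field
        # setdefault only stores the timestamp the first time an IP appears
        entry = summary.setdefault(ip.group(), [0, ts.group()])
        entry[0] += 1
    return summary

# The three sample lines from the question, standing in for the log file
sample = [
    '5.63.145.71 - - [30/Jun/2013:08:04:46 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:49 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
    '5.63.145.71 - - [30/Jun/2013:08:04:51 -0500] "HEAD / HTTP/1.1" 200 - "-" "checks.panopta.com"',
]

result = summarize(sample)
for ip, (count, first_seen) in result.items():
    print("Num", count, "IP", ip, "TIME", first_seen)
```

This keeps memory use small on a large log, since it holds one timestamp per IP instead of a list of all of them.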
