loop0

Reputation: 1

How can I optimize this code?

I'm developing a logger daemon for squid that stores the logs in a MongoDB database, but it's using too much CPU. How can I optimize this code?


from sys import stdin

from pymongo import Connection

connection = Connection()
db = connection.squid
logs = db.logs
buffer = []
a = 'timestamp'
b = 'resp_time'
c = 'src_ip'
d = 'cache_status'
e = 'reply_size'
f = 'req_method'
g = 'req_url'
h = 'username'
i = 'dst_ip'
j = 'mime_type'
L = 'L'

while True:
    l = stdin.readline()
    if not l:          # EOF: readline() returns '' once stdin closes
        break          # (checking this first avoids an IndexError below)
    if l[0] == L:
        l = l[1:].split()
        buffer.append({
            a: float(l[0]),
            b: int(l[1]),
            c: l[2],
            d: l[3],
            e: int(l[4]),
            f: l[5],
            g: l[6],
            h: l[7],
            i: l[8],
            j: l[9]
            }
        )
    if len(buffer) == 1000:
        logs.insert(buffer)
        buffer = []

if buffer:             # flush the final partial batch so no records are lost
    logs.insert(buffer)
connection.disconnect()

Upvotes: 0

Views: 257

Answers (3)

fabmilo

Reputation: 48330

The CPU usage comes from that busy while True loop. How many lines per minute are you processing? Move the

if len(buffer) == 1000:    
    logs.insert(buffer)
    buffer = []

check to right after the buffer.append, inside the if branch, so it only runs when a record was actually added.

I can tell you more once you say how many insertions per minute you're getting.
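A minimal sketch of that restructuring (hypothetical names; it collects batches into a list instead of calling MongoDB, and parses only the first three fields for brevity):

```python
import io

def consume(stream, flush, batch_size=1000):
    """Read squid log lines from `stream`; hand full batches to `flush`.

    The length check sits immediately after the append, inside the 'L'
    branch, so lines that don't match skip it entirely.
    """
    buffer = []
    while True:
        line = stream.readline()
        if not line:                 # EOF: readline() returns ''
            break
        if line[0] == 'L':
            fields = line[1:].split()
            buffer.append({'timestamp': float(fields[0]),
                           'resp_time': int(fields[1]),
                           'src_ip': fields[2]})  # remaining fields omitted
            if len(buffer) == batch_size:  # check right after the append
                flush(buffer)
                buffer = []
    if buffer:                       # flush the final partial batch
        flush(buffer)

# Stand-in for logs.insert: collect the batches in memory.
batches = []
stream = io.StringIO("L 1.0 10 10.0.0.1\n" * 5)
consume(stream, batches.append, batch_size=2)
```

With five matching lines and a batch size of two, this produces two full batches and one final partial batch of one.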

Upvotes: 0

saramah

Reputation: 168

This might be a job for a Python profiler. There are a few built-in profiling modules, such as cProfile; the Python documentation covers them.
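For example, cProfile can be wrapped around the loop to show where the time goes (the workload below is a toy stand-in for the log-parsing loop, not the asker's actual daemon):

```python
import cProfile
import io
import pstats

def hot_loop():
    # Toy workload standing in for the log-parsing loop.
    total = 0
    for line in ("L 1.0 10 10.0.0.1\n" for _ in range(10000)):
        total += len(line.split())
    return total

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
report = out.getvalue()
print(report)
```

In the report, a large cumulative time on readline() would point at I/O, while time dominated by split() or the insert calls would point elsewhere.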

Upvotes: 1

MK.

Reputation: 34587

I'd suspect it might actually be readline() causing the CPU utilization. Try running the same code with readline() replaced by a constant line that you supply, and try running it with the database inserts commented out. That will establish which one is the culprit.
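A sketch of that isolation test (the sample line and field layout are assumptions based on the question's parsing code; the database insert is deliberately left out):

```python
import time

# A constant line in the same shape the question's code expects.
LINE = ("L 1305231600.0 50 10.0.0.1 HIT 1024 "
        "GET http://example.com/ user 10.0.0.2 text/html\n")

def parse(line):
    # Same parsing as the question, minus the database insert.
    fields = line[1:].split()
    return {'timestamp': float(fields[0]),
            'resp_time': int(fields[1]),
            'src_ip': fields[2],
            'cache_status': fields[3],
            'reply_size': int(fields[4])}  # remaining fields omitted

start = time.perf_counter()
for _ in range(100_000):
    doc = parse(LINE)
elapsed = time.perf_counter() - start
print(f"parsed 100k constant lines in {elapsed:.3f}s")
```

If this runs cool, the parsing isn't the problem; repeat with readline() restored (and then with inserts restored) to see which step drives the CPU up.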

Upvotes: 0
