Makc

Reputation: 307

Fastest way to process and save UDP flow in python

I'm developing a client for a scientific measurement device that is connected to the PC by 1 Gb Ethernet.

The test PC has an i5-460M CPU (2 cores @ 2.53 GHz) and 8 GB RAM, running Windows 7 x64 (can't be changed to Linux) and Python 2.7.6 x86.

The device sends data in UDP packets with the following format:

  uint  meas_id;
  uint  part_id;
  ubyte data[1428];
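For reference, the 8-byte header can be split off with the struct module. A minimal sketch; '<II' assumes little-endian unsigned ints, which depends on the device's actual byte order:

```python
import struct

# Hypothetical layout: 4-byte meas_id + 4-byte part_id + 1428 data bytes.
# '<II' assumes little-endian; adjust to the device's byte order if needed.
HEADER = struct.Struct('<II')

def parse_packet(packet):
    """Split a raw UDP payload into (meas_id, part_id, data)."""
    meas_id, part_id = HEADER.unpack_from(packet, 0)
    return meas_id, part_id, packet[HEADER.size:]
```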

The data rate is 1 Gb/s (around 70,000 packets per second).

I need to receive the data and dump it to disk (for around 10 minutes) for future processing, but I've hit two problems: packet drops (while transferring data between processes) and HDD throughput.

The current structure is two worker processes:

  1. Receive UDP packets, accumulate a chunk of 1000 packets, and send the chunk through a multiprocessing.Pipe/Queue to the other process.
  2. Fetch chunks from the Pipe/Queue, deserialize the structure (at least the first 2 fields), and save.

Using a raw Python socket I can receive around 110k pps on my machine without packet drops, just with:

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024*1024*256) # real buffer is less
s.bind(("0.0.0.0", 8201))
while is_active:
    ...
    data = s.recv(1536)

But packets start dropping when I try to send the data to another process using code like this:

data_buf = []
while 1:
    d = s.recv(1536)
    data_buf.append(d)
    if len(data_buf) == CHUNK_SIZE:
        xchg_queue.put(data_buf)
        data_buf = []
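One cheap tweak worth trying (a sketch, not benchmarked on your hardware): since the packets are fixed-size, join each chunk into one bytes blob before putting it on the queue. Pickling a single large string is much cheaper than pickling a list of 1000 small ones, and the receiver can re-slice it:

```python
# Sketch: send one big blob per chunk instead of a list of small strings.
PACKET_SIZE = 1436  # 4 + 4 + 1428 bytes, per the packet format above

def pack_chunk(packets):
    """Concatenate fixed-size packets into one blob for cheap IPC transfer."""
    return b''.join(packets)

def unpack_chunk(blob):
    """Re-slice a blob back into individual packets on the receiving side."""
    return [blob[i:i + PACKET_SIZE]
            for i in range(0, len(blob), PACKET_SIZE)]
```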

Pipe is faster, but as far as I can see, pipe.send() may block if there are already objects in the pipe.

Is there a faster way to send data between processes?

I've tried MySQL as storage with indexes disabled and delayed writes enabled, but only got a saving rate of around 30-35k packets per second.

With cPickle I got 40-50k pps when saving 1000 - 100000 packets per file.
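Since the packets are fixed-size, a plain append-only binary file may beat cPickle: no serialization at all, just sequential writes, which is also the access pattern HDDs handle best. A minimal sketch (the buffer size and helper names are arbitrary):

```python
# Sketch: dump whole chunks to a flat binary file with sequential appends.
# Packets are fixed-size, so offsets are implicit and no framing is needed.

def dump_chunks(path, chunks):
    """Append chunks (lists of raw packets) to a flat binary file."""
    with open(path, 'ab', 1024 * 1024) as f:  # 1 MB write buffer
        for chunk in chunks:
            f.write(b''.join(chunk))

def read_packets(path, packet_size=1436):
    """Read the file back as a list of fixed-size packets."""
    with open(path, 'rb') as f:
        blob = f.read()
    return [blob[i:i + packet_size]
            for i in range(0, len(blob), packet_size)]
```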

Is there a much faster way to save the data? Maybe PyTables (HDF5) or some fast NoSQL DB (Redis-like)?

Also, I'm not sure this client is even possible in Python - maybe it's necessary to rewrite the module in pure C.

Or maybe there is a fast wrapper around Python sockets (like gevent)?

Hope you can help.

Upvotes: 3

Views: 2586

Answers (1)

Steffen Ullrich

Reputation: 123521

If you just need to save the data for future processing, I would avoid the overhead of Python and a database and instead just use tshark or windump to save the data into a single file, as fast as possible and with the least overhead. This is also the cheapest option for the HDD because you only ever append to the file. Later you can use Python with winpcap or other tools to process the data without the pressure of losing anything, and write it out in whatever format you need.
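To illustrate the second half of this approach: once the capture exists as a .pcap file, the records can be walked with nothing but the struct module. A sketch that assumes the classic little-endian pcap file format; the windump interface number and filter in the comment are examples, not verified values:

```python
import struct

# Capture first with something like (interface number is an example):
#   windump -i 1 -s 1536 -w capture.pcap udp port 8201

def iter_pcap_packets(path):
    """Yield raw packet bytes from a classic little-endian pcap file."""
    with open(path, 'rb') as f:
        global_header = f.read(24)   # magic, version, tz, sigfigs, snaplen, linktype
        magic, = struct.unpack('<I', global_header[:4])
        assert magic == 0xa1b2c3d4, 'not a little-endian pcap file'
        while True:
            rec_header = f.read(16)  # ts_sec, ts_usec, incl_len, orig_len
            if len(rec_header) < 16:
                break
            _, _, incl_len, _ = struct.unpack('<IIII', rec_header)
            yield f.read(incl_len)
```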

Upvotes: 0
