Miki Tebeka

Reputation: 13910

Python implementation of avro slow?

I'm reading some data from an Avro file using the avro library. It takes about a minute to load 33K objects from the file. This seems very slow to me, especially since the Java version reads the same file in about 1 second.

Here is the code, am I doing something wrong?

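import sys
from argparse import ArgumentParser
from time import time

import avro.datafile
import avro.io

def load(filename):
    """Iterate over every record in the file and return the record count."""
    num_records = 0
    # The with-block closes the file even if decoding fails part-way.
    with open(filename, "rb") as fo:
        reader = avro.datafile.DataFileReader(fo, avro.io.DatumReader())
        for record in reader:
            num_records += 1

    return num_records

def main(argv=None):
    argv = argv or sys.argv

    parser = ArgumentParser(description="Read avro file")
    # The file name is optional and defaults to the file being measured.
    parser.add_argument("filename", nargs="?", default="events.avro")
    args = parser.parse_args(argv[1:])

    start = time()
    num_records = load(args.filename)
    end = time()

    print("{0} records in {1} seconds".format(num_records, end - start))

if __name__ == "__main__":
    main()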

Upvotes: 8

Views: 3312

Answers (2)

Uri Laserson

Reputation: 2451

There appears to be a Python package called fastavro, a fast Cython implementation of Avro, though it is less feature-complete.

https://github.com/fastavro/fastavro
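
For comparison with the loop in the question, here is a minimal sketch of the same record count using fastavro. It assumes fastavro is installed and that the file is a standard Avro container file; the helper names count_records_fastavro and timed are made up for this example, and no speedup is measured or claimed here.

```python
from time import perf_counter


def count_records_fastavro(filename):
    """Count the records in an Avro container file using fastavro."""
    # Imported here so the timing helper below can be used without fastavro.
    import fastavro

    with open(filename, "rb") as fo:
        # fastavro.reader takes an open binary file and yields one record
        # at a time.
        return sum(1 for _ in fastavro.reader(fo))


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds)."""
    start = perf_counter()
    result = func(*args)
    return result, perf_counter() - start
```

Calling timed(count_records_fastavro, "events.avro") returns the record count together with the elapsed time, so the result can be compared directly with the timing printed by the original script.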

Upvotes: 6

samplebias

Reputation: 37919

The avro Python package available on PyPI is pure Python, so I'm not surprised that it is slower than Java by an order of magnitude or more.

There is an Avro C implementation, but to my knowledge nobody has yet created a Python extension based on it.

Upvotes: 3
