Stieffers

Reputation: 752

Performance of large key-based dictionaries versus a list of objects with attributes

I'm using a Python object with a few attributes to organize a data model, but I'm wondering whether this is any less efficient than using a key-based dictionary. My model stores MP3 tag data and looks like this:

class Mp3Model:
    def __init__(self, path, filename):
        self.path = path
        self.filename = filename
        self.artist = ''
        self.title = ''
        self.album = ''
        self.tracknumber = ''
        self.genre = ''
        self.date = ''

The model is used like this:

import os

mp3s = []
# `files` holds the directory entries, e.g. from os.listdir(self.dir)
for file in files:
    if os.path.splitext(file)[1] == '.mp3':
        # Append a new Mp3Model to the mp3s list for each file found
        mp3s.append(Mp3Model(os.path.join(self.dir, file), file))

Would using a key-based dictionary, or even a simple list, give much of a performance improvement? The length of the mp3s list varies widely depending on how many files are found in a given directory, and the program can slow to a crawl (I haven't implemented any threading yet) when I scan directories with lots of files.
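For comparison, the dictionary-based alternative the question asks about might look something like the sketch below. The names MP3_FIELDS, make_mp3_record and scan_mp3s are hypothetical; the keys simply mirror the attributes of Mp3Model, and the scan assumes the files sit directly in one directory.

```python
import os

# Tag fields from the question's Mp3Model, all starting out empty
MP3_FIELDS = ('artist', 'title', 'album', 'tracknumber', 'genre', 'date')


def make_mp3_record(path, filename):
    """Build a plain dict with the same keys as Mp3Model's attributes."""
    record = {'path': path, 'filename': filename}
    record.update(dict.fromkeys(MP3_FIELDS, ''))
    return record


def scan_mp3s(directory):
    """Return one record dict per .mp3 file found directly in `directory`."""
    return [make_mp3_record(os.path.join(directory, name), name)
            for name in os.listdir(directory)
            if os.path.splitext(name)[1] == '.mp3']
```

Access then changes from model.artist to record['artist']; the amount of data stored per file is the same either way.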

Upvotes: 2

Views: 178

Answers (5)

baiyang

Reputation: 13

I don't know whether another method is better, but you can do it as follows:

from collections import namedtuple

Mp3Model = namedtuple("Mp3Model", "path filename artist title")

This creates a simple Mp3Model class.
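Extending the namedtuple to all of the question's fields might look like the sketch below. It assumes Python 3.7 or newer for the defaults argument; song and tagged are illustrative names. One trade-off worth knowing: namedtuples are immutable, so filling in tags after construction means creating a new tuple with _replace.

```python
from collections import namedtuple

# All fields from the question's Mp3Model; the six tag fields default to ''
# (the `defaults` argument needs Python 3.7 or newer)
Mp3Model = namedtuple(
    'Mp3Model',
    'path filename artist title album tracknumber genre date',
    defaults=('',) * 6,
)

song = Mp3Model('/music/a.mp3', 'a.mp3')
# namedtuples are immutable, so "updating" a field returns a new instance
tagged = song._replace(artist='Some Artist')
```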

Upvotes: 0

user1277476

Reputation: 2909

Before you conclude that any particular part of your code is the slow part, profile the program. It's probably your innermost loop, but test that assumption rather than leaping to it.
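A minimal profiling sketch using the standard library's cProfile and pstats modules. scan_directory is a hypothetical stand-in for whatever the real scanning code does; in practice you would profile the actual scan, including tag reading.

```python
import cProfile
import io
import os
import pstats


def scan_directory(directory):
    """Hypothetical stand-in for the question's scanning loop."""
    return [name for name in os.listdir(directory)
            if os.path.splitext(name)[1] == '.mp3']


def profile_scan(directory, top=10):
    """Run the scan under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = scan_directory(directory)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first
    stats = pstats.Stats(profiler, stream=out).sort_stats('cumulative')
    stats.print_stats(top)
    return result, out.getvalue()
```

Printing the report from profile_scan('/some/music/dir') shows which functions dominate the run time, which is the evidence you want before restructuring the data model.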

For CPU-bound loads, try PyPy.

For I/O-bound loads, try caching, or somehow aggregating lots of small files into a smaller number of large files. Opening a file tends to be slow compared to reading a bit of sequential data.
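One way the caching idea could be applied here, as a sketch under assumptions: tags are persisted between runs in a JSON file and a file is only re-parsed when its modification time changes. read_tags is a hypothetical, caller-supplied tag parser, and the JSON cache format is likewise an assumption, not something from the original answer.

```python
import json
import os


def load_cache(cache_path):
    """Load the tag cache, or start empty if it is missing or corrupt."""
    try:
        with open(cache_path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def save_cache(cache, cache_path):
    with open(cache_path, 'w') as f:
        json.dump(cache, f)


def read_tags_cached(path, cache, read_tags):
    """Return cached tags for `path`, re-reading only if the file changed."""
    mtime = os.path.getmtime(path)
    entry = cache.get(path)
    if entry is not None and entry['mtime'] == mtime:
        return entry['tags']
    tags = read_tags(path)  # the expensive open-and-parse step
    cache[path] = {'mtime': mtime, 'tags': tags}
    return tags
```

On a second scan of an unchanged directory, only os.path.getmtime is called per file, which avoids opening each MP3 again.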

HTH

Upvotes: 0

Shawn Chin

Reputation: 86844

Unless you declare __slots__ for your class, instance attributes are stored in an underlying dict anyway, so using a dict directly would be marginally faster than an object. However, the difference would be negligible compared to the rest of your code.
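To illustrate the __slots__ point, here is a sketch with the question's class trimmed to three attributes for brevity (SlottedMp3Model is a hypothetical name). Without __slots__, each instance carries a __dict__ holding its attributes; with __slots__, that per-instance dict goes away.

```python
class Mp3Model:
    """The question's class (trimmed): attributes live in a per-instance __dict__."""

    def __init__(self, path, filename):
        self.path = path
        self.filename = filename
        self.artist = ''


class SlottedMp3Model:
    """Same data, but __slots__ replaces the per-instance __dict__."""
    __slots__ = ('path', 'filename', 'artist')

    def __init__(self, path, filename):
        self.path = path
        self.filename = filename
        self.artist = ''
```

vars(Mp3Model('/music/a.mp3', 'a.mp3')) returns the attribute dict, while a SlottedMp3Model instance has no __dict__ and raises AttributeError if you assign an attribute that isn't listed in __slots__.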

The selection of a data structure should depend on various other factors:

  • do you need to store the results?
  • how are you accessing the data? Serial or random access?
  • which keys would you be searching your data on?
  • if you're planning to parallelise your task, can it handle concurrent writes? What would the locking overheads be?
  • ...
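On the question of which keys you search by: if lookups mostly go through one field (say, artist), building an index once turns each later lookup into a single dict access instead of a scan over the whole list. A sketch; index_by is a hypothetical helper that works with any objects exposing attributes, such as Mp3Model instances.

```python
from collections import defaultdict


def index_by(records, key):
    """Group records by the value of one attribute, e.g. 'artist'."""
    index = defaultdict(list)
    for record in records:
        index[getattr(record, key)].append(record)
    # Return a plain dict so missing keys raise KeyError instead of growing the index
    return dict(index)
```

For example, by_artist = index_by(mp3s, 'artist') followed by by_artist.get('Some Artist', []) returns every model with that artist without scanning all of them.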

Optimising for your use case would lead to much higher returns.

Upvotes: 5

Lanaru

Reputation: 9711

Using a dict will be more efficient than using a class: you avoid the overhead of class instances, attribute lookup, etc. Not to mention that accessing items in a dictionary by key is one of the most efficient and heavily optimized pieces of code in Python.

A few caveats, though:

  1. You can only know for sure by testing.
  2. Don't optimize unless you know it's necessary.
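In the spirit of "you can only know for sure by testing", a rough timeit comparison might look like the sketch below. It uses a trimmed, three-attribute version of the question's class; the exact numbers depend on the Python version and machine, so no results are shown here.

```python
import timeit


class Mp3Model:
    """Trimmed version of the question's class (three attributes only)."""

    def __init__(self, path, filename):
        self.path = path
        self.filename = filename
        self.artist = ''


def compare(number=1_000_000):
    """Time creation and field access for an object versus a dict."""
    env = {
        'Mp3Model': Mp3Model,
        'obj': Mp3Model('/music/a.mp3', 'a.mp3'),
        'rec': {'path': '/music/a.mp3', 'filename': 'a.mp3', 'artist': ''},
    }
    statements = {
        'create object': "Mp3Model('/music/a.mp3', 'a.mp3')",
        'create dict': "{'path': '/music/a.mp3', 'filename': 'a.mp3', 'artist': ''}",
        'attribute access': 'obj.filename',
        'key access': "rec['filename']",
    }
    return {label: timeit.timeit(stmt, globals=env, number=number)
            for label, stmt in statements.items()}


if __name__ == '__main__':
    for label, seconds in compare().items():
        print(f'{label:>16}: {seconds:.3f} s')
```

Whatever the per-operation difference turns out to be, multiply it by the number of files scanned to see whether it could plausibly explain the slowdown.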

Upvotes: 2

pcalcao

Reputation: 15990

This is just a guess, but I believe the bottleneck is probably in reading the files from the OS, not in building the list.

That being said, you can test this simply by creating a list of just the filenames and comparing its performance with building a list of objects from those same filenames.
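The comparison described above could be set up along these lines. This is a sketch: mp3_names, mp3_models and timed are hypothetical helpers, and Mp3Model is trimmed to three attributes. Both functions list the same directory, so the difference in their timings approximates the cost of constructing the objects.

```python
import os
import time


class Mp3Model:
    """Trimmed version of the question's class."""

    def __init__(self, path, filename):
        self.path = path
        self.filename = filename
        self.artist = ''


def mp3_names(directory):
    """Only the filenames: directory listing plus extension check."""
    return [name for name in os.listdir(directory)
            if os.path.splitext(name)[1] == '.mp3']


def mp3_models(directory):
    """Same listing, but wrapping each file in an Mp3Model object."""
    return [Mp3Model(os.path.join(directory, name), name)
            for name in mp3_names(directory)]


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling timed(mp3_names, some_dir) and timed(mp3_models, some_dir) on a large directory and getting similar times would suggest the list of objects is not the bottleneck, and that the time goes elsewhere, for example into opening and parsing each file.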

Upvotes: 0
