zhuhuren

Reputation: 347

Can a Python generator work like a dictionary?

I have a big list of company information in an Excel spreadsheet. I need to bring the company info into my program to process it.

Each company has a unique label which is used for accessing the companies. I can create a dictionary using the labels as the keys and the company info as the values, such as {label1: company1, label2: company2, ...}. By doing it this way, when the dictionary is created, it eats up too much memory.

Is it possible to create a generator that can be used like a dictionary?

Upvotes: 0

Views: 2976

Answers (3)

eqzx

Reputation: 5559

It seems the primary goal of the question is to have an object that behaves like a dictionary without holding the dictionary's contents in RAM (OP: "By doing it this way, when the dictionary is created, it eats up too much memory."). One option is sqlitedict, which mimics the Python dictionary API and uses a SQLite database under the hood.

Here's the example from the current documentation (with the import added and print written as a function call for Python 3):

>>> from sqlitedict import SqliteDict
>>> # using SqliteDict as context manager works too (RECOMMENDED)
>>> with SqliteDict('./my_db.sqlite') as mydict:  # note no autocommit=True
...     mydict['some_key'] = u"first value"
...     mydict['another_key'] = range(10)
...     mydict.commit()
...     mydict['some_key'] = u"new value"
...     # no explicit commit here
>>> with SqliteDict('./my_db.sqlite') as mydict:  # re-open the same DB
...     print(mydict['some_key'])  # outputs 'first value', not 'new value'

Upvotes: 2

Felk

Reputation: 8224

Assuming the problem you are facing is accessing key-value structured data from a CSV file, you have three options:

  1. Load all of the data into a dictionary, copying it into RAM as a whole, and get fast, constant-time access. This is what you said you want to avoid.
  2. Search through the data line by line every time you want to access an entry by its key. This has no memory overhead, but each lookup may scan the entire file, so access time is linear.
  3. Use, or copy the data into, a database engine (or any key-value store) that supports disk-based indexing. This gives fast indexed lookups (logarithmic or constant time, depending on the engine) without loading the data into memory first.
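
As a rough illustration of option 3, here is a minimal sketch that uses Python's built-in sqlite3 module. The class name DiskDict, the table name companies, and the column names label and info are all made up for this example, and the values are stored as plain text; real company records would need some serialization.

```python
import os
import sqlite3
import tempfile


class DiskDict:
    """Hypothetical dict-like wrapper around a single SQLite table."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        # PRIMARY KEY makes SQLite keep an on-disk index on the label column.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS companies "
            "(label TEXT PRIMARY KEY, info TEXT)"
        )

    def __setitem__(self, label, info):
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                "INSERT OR REPLACE INTO companies (label, info) VALUES (?, ?)",
                (label, info),
            )

    def __getitem__(self, label):
        row = self.conn.execute(
            "SELECT info FROM companies WHERE label = ?", (label,)
        ).fetchone()
        if row is None:
            raise KeyError(label)
        return row[0]


# The database lives in a file, so only the rows you ask for are read into memory.
db_path = os.path.join(tempfile.mkdtemp(), "companies.sqlite")
companies = DiskDict(db_path)
companies["label1"] = "Acme Corp"
print(companies["label1"])
```

Lookups go through the primary-key index, so they do not scan the whole table. A library such as sqlitedict packages the same idea with a fuller dictionary interface.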

Upvotes: 1

willeM_ Van Onsem

Reputation: 476574

You could create a class that defines the __getitem__ method. For example:

class Foo:

    def __getitem__(self,key):
        # ...
        # process the key
        # for example
        return repr(key)

Now if you create a Foo:

>>> somefoo = Foo()
>>> somefoo['bar']
"'bar'"
>>> somefoo[3]
'3'

So syntactically it works "a bit" like a dictionary.
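
To get closer to the original use case, __getitem__ can read from a file instead of returning repr(key). The following is only a sketch under assumptions: the class name CsvLookup, the file name companies.csv, and the column names label and name are invented for the example, and the spreadsheet is assumed to have been exported to CSV. Each lookup scans the file through a generator, so memory use stays small but access time is linear.

```python
import csv
import os
import tempfile
from collections.abc import Mapping


class CsvLookup(Mapping):
    """Hypothetical read-only mapping that scans a CSV file on every lookup."""

    def __init__(self, path, key_column):
        self.path = path
        self.key_column = key_column

    def _rows(self):
        # Generator: yields one row at a time, so the file is never fully loaded.
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)

    def __getitem__(self, key):
        for row in self._rows():
            if row[self.key_column] == key:
                return row
        raise KeyError(key)

    def __iter__(self):
        return (row[self.key_column] for row in self._rows())

    def __len__(self):
        return sum(1 for _ in self._rows())


# Build a tiny sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "companies.csv")
with open(path, "w", newline="") as f:
    f.write("label,name\nlabel1,Acme\nlabel2,Globex\n")

companies = CsvLookup(path, "label")
print(companies["label2"]["name"])
print("label3" in companies)
```

Subclassing collections.abc.Mapping supplies get, keys, items, and the in operator on top of the three methods defined here.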

You can also use a generator with send, as demonstrated in this answer:

def bar():
    while True:
        key = yield
        # process the key
        # for example
        yield repr(key)

and call it with:

>>> somebar = bar()
>>> next(somebar)
>>> somebar.send('bar')
"'bar'"
>>> next(somebar)
>>> somebar.send(3)
'3'
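
A possible variant, shown here as a sketch rather than as part of the answer above, uses a single yield expression that both hands back the previous result and receives the next key. The generator then needs to be primed only once instead of before every send. The name lookup is made up for the example, and repr(key) again stands in for real processing.

```python
def lookup():
    result = None
    while True:
        # Hand back the previous result and wait for the next key.
        key = yield result
        result = repr(key)  # stand-in for real processing of the key


gen = lookup()
next(gen)  # prime once: advance to the first yield
print(gen.send('bar'))
print(gen.send(3))
```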

Upvotes: 1
