Reputation: 526
I have been using shelve to store a large number of objects as a dictionary with strings as keys and lists as values, e.g.:
data["MITL"] = ["Radio And Television Broadcasting And Communications Equipment", "Communication Equipment"]
or more succinctly:
...
SIXH.L Machine Tools & Accessories,
GOPAIST.BO Steel & Iron,
HERITGFOO.NS Food Wholesale,
MITL Radio And Television Broadcasting And Communications Equipment, Communication Equipment,
MMLP Oil Refining, Marketing, Oil & Gas Pipelines,
SESL.PA Diversified Electronics,
...
<≈ 30,000 entries>
I read the entries from this .db file and exported them to another .db file in which the industries are the keys and each value is a list of stock symbols.
...
Industrial Electrical Equipment ['PLPC', 'MAG', 'LPTH', 'IIN', 'CUI', 'ULBI', 'APWC', 'CAPC', 'SVT', 'ARTX', 'CPST', 'OSIS', 'LGL', 'BW', 'HPJ', 'AOS', 'FLUX', 'AMSC', 'GTI', 'RTBC', 'AUSI', 'AETI', 'AIMC', 'HYGS', 'BLDP', 'HOLI', 'NPWZ', 'LIME', 'ESNC', 'ZBB', 'CSTU', 'AXPW', 'GBLL', 'EMR', 'BDC', 'BNSO', 'ENS', 'REFR', 'ABAT', 'FELE', 'CYLU', 'XIDEQ', 'LYTS', 'GAI', 'AMOT', 'CUI.V', 'LSCG']
Toy & Hobby Stores ['BBW']
Distribution ['MNST', 'FMX', 'STZ', 'FIZZ', 'BREW', 'THST', 'LBIX', 'ROX', 'COKE', 'KOF', 'PEP', 'COT', 'REED', 'SAM', 'MGPI', 'DPS', 'CCE', 'BORN', 'KO', 'BUD', 'CCU', 'WVVIP', 'TAP', 'WVVI', 'DEO', 'ABEV', 'VCO']
Home Health Care ['AFAM', 'SCAI', 'ADUS', 'AMED', 'LHCG', 'BIOS', 'CHE', 'HASC']
...
<≈ 300 entries>
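For reference, the inversion described above (symbols as keys turned into industries as keys) can be sketched roughly as follows. This is not the code actually used; the function name `invert_symbol_to_industries` and the two-entry `sample` dictionary are hypothetical, and a plain dict stands in for the shelve file.

```python
from collections import defaultdict


def invert_symbol_to_industries(symbol_to_industries):
    """Return a dict mapping each industry to the list of symbols listed under it."""
    industry_to_symbols = defaultdict(list)
    for symbol, industries in symbol_to_industries.items():
        for industry in industries:
            industry_to_symbols[industry].append(symbol)
    return dict(industry_to_symbols)


# Hypothetical sample in the same shape as the first .db file.
sample = {
    "MITL": [
        "Radio And Television Broadcasting And Communications Equipment",
        "Communication Equipment",
    ],
    "SESL.PA": ["Diversified Electronics"],
}

inverted = invert_symbol_to_industries(sample)
for industry, symbols in inverted.items():
    print(industry, symbols)
```

With roughly 30,000 symbols collapsing into a few hundred industries, some of the resulting lists become long, which is where the value-size concern comes from.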
The file writes fine as far as I can tell; retrieving the data is my issue.
From the documentation: "The database is also (unfortunately) subject to the limitations of dbm, if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small, and in rare cases key collisions may cause the database to refuse updates."
But I can't find any information on the limitations of dbm, even in its documentation. The cause must be that the lists I'm storing as values are too large.
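Since any limits depend on which dbm implementation shelve ended up using, one possible first step is to ask the standard library which backend created the file. The sketch below uses `dbm.whichdb`; the temporary path and the name `demo_shelf` are made up for illustration, and the reported backend will differ between platforms and Python builds.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical shelf created only for this demonstration.
    path = os.path.join(tmp, "demo_shelf")
    with shelve.open(path) as shelf:
        shelf["Toy & Hobby Stores"] = ["BBW"]

    # whichdb returns the module name (e.g. "dbm.gnu", "dbm.ndbm", "dbm.dumb"),
    # "" if the format cannot be guessed, or None if the file cannot be read.
    backend = dbm.whichdb(path)
    print(backend)
```

Running the same call on the real file name ("industriesAndTheirStocks") would show which implementation's limits apply.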
Here's a code excerpt:
import shelve

industriesAndTheirStocks = shelve.open("industriesAndTheirStocks")
print(len(industriesAndTheirStocks))  # shows how few keys there are, which suggests the size of the stored lists is the issue
for industry in industriesAndTheirStocks:  # fails here: iterating calls self.dict.keys(), which raises the SystemError below
    print("{:<15}".format(industry), end="")
    print(industriesAndTheirStocks[industry])
and the error/output:
374
Traceback (most recent call last):
  File "read_from_shelve_stock_industry_file.py", line 144, in <module>
    if __name__ == "__main__":main()
  File "read_from_shelve_stock_industry_file.py", line 128, in main
    display_shelve_contents_by_industry()
  File "read_from_shelve_stock_industry_file.py", line 42, in display_shelve_contents_by_industry
    for industry in industriesAndTheirStocks:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize
Process finished with exit code 1
I have seen other people run into the same error, but they were using Python versions before 7.4.1 and I think their error had a different cause (see the question "Python shelve module question").
So then, my questions:
What are the limitations of dbm?
Is there a way to store large objects (dictionaries with large lists as values) in shelve?
If not, what's a better way to store the data if I don't want to keep it in RAM (which I think is the purpose of using shelve)?
Upvotes: 5
Views: 662
Reputation: 141
From my Perl days, I recall the limit on key size and value size for a dbm file being 1024 bytes. I limited each value to 1000 bytes and had no problems. If the Python program crashes, you have probably exceeded that size.
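To test that guess against real data, one option is to measure the pickled size of each value before writing it, since shelve stores values in pickled form. The sketch below is only an illustration: the 1000-byte threshold is the figure recalled above, not a documented Python limit, and `oversized_entries`, `ASSUMED_LIMIT_BYTES`, and the sample data are hypothetical names.

```python
import pickle

# Assumption: the 1000-byte figure from the answer above, not a documented limit.
ASSUMED_LIMIT_BYTES = 1000


def oversized_entries(mapping, limit=ASSUMED_LIMIT_BYTES):
    """Yield (key, pickled_size) for values whose pickled form exceeds limit bytes."""
    for key, value in mapping.items():
        size = len(pickle.dumps(value))
        if size > limit:
            yield key, size


# Hypothetical data: one short list and one list of 500 made-up symbols.
sample = {
    "Toy & Hobby Stores": ["BBW"],
    "Big Industry": ["SYM%d" % i for i in range(500)],
}

flagged = dict(oversized_entries(sample))
for key, size in flagged.items():
    print(key, size)
```

If the entries that fail to read back are the same ones flagged here, that would support the size explanation; if not, the cause is likely something else.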
Upvotes: 0