Daniel Yang
Daniel Yang

Reputation: 43

python data structure suggestion?

I have a complex pricing data, like a tree structure

Example will be like computers price (monitor price, motherboard price etc.) and in monitors category I have more sub-category, and under those sub-categories I have more categories (monitor which is 27 inch, which made by dell, which is curved)

I need to frequently read these pricing information (read only) like thousands of times.

I want to use class to store these information. Because I don't know if I can do it in dictionaries. Anyone have a suggestion?

Upvotes: 1

Views: 88

Answers (3)

Nickpick
Nickpick

Reputation: 6597

Mongodb is definitely a good possibility but in your case with only 50 entries and only to read it's probably an overkill, especially as you will need to get familiar with how to do the queries.

A quicker way is most likely via pandas: use a nested dictionary, best create an input JSON file or string (as in the example below) and then read it in do a pandas dataframe.

You can then normalize it in the way you want it and do the calculations necessary in pandas, which you can learn much quicker:

Here an example of how it could look like: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...          'info': {
...               'governor': 'Rick Scott'
...          },
...          'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {
...               'governor': 'John Kasich'
...          },
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> from pandas.io.json import json_normalize
>>> result = json_normalize(data, 'counties', ['state', 'shortname',
...                                           ['info', 'governor']])
>>> result
         name  population info.governor    state shortname
0        Dade       12345    Rick Scott  Florida        FL
1     Broward       40000    Rick Scott  Florida        FL
2  Palm Beach       60000    Rick Scott  Florida        FL
3      Summit        1234   John Kasich     Ohio        OH
4    Cuyahoga        1337   John Kasich     Ohio        OH

For the above dataframe you can easily get the sum of the population for all entries with shortname=='FL' as follows:

sum_of_fl_population = result[result['shortname']=='FL'].population.sum()
Out[11]: 112345

Have a look at this link to get an introduction how to handle the pandas dataframes. It's probably the best way to solve your problem. http://pandas.pydata.org/pandas-docs/stable/10min.html

Upvotes: 1

Loïc
Loïc

Reputation: 11942

Go for big data.

Put ElasticSearch, MongoDB, CouchDB or many others (I guess) in your stack and store every product as a document, flattening your base.

Create an index per type of document (one for computer screens, one for motherboards, ...), basically one index per type of object that share the same properties and you are ready to go.

For elasticSearch for instance about parenting : https://www.elastic.co/guide/en/elasticsearch/guide/2.x/parent-child.html https://www.elastic.co/guide/en/elasticsearch/guide/2.x/parent-child-mapping.html

I believe this is faster than SQL, and maybe easier to use. There is one downside though, not many frameworks allow you to use such "new" databases. Though for the better known there are mapper with the ORMs

Upvotes: 0

chatton
chatton

Reputation: 596

I think a good way of storing this data could be to have a Computer Object, which could have member variables such Motherboard, Monitor etc.

You could then have a single dictionary which contained a unique id mapped to a Computer object. So you could have something along the lines of

    computers.get('12345').getMotherBoard().getMake()
    computers.get('45678').getMonitor().getScreenResolution()
    computers.get('54342').getRam().getPrice()

Upvotes: 0

Related Questions