d3pd

Reputation: 8315

What would be a good Python data structure that can store lists of data that all have a common index and some minor metadata?

I have a set of lists, each storing a certain characteristic of events. I also have an index that corresponds to event numbers. So, for a particular event number, say index = 123, I could look at the elements of the various lists at that index (e.g. event_color[123]) to see the characteristics of that event. I want to collect these lists in some object. I could also attach simple metadata, such as a dictionary, to the object.

What would be a good type of object for this type of data?

Here's the beginnings of an idea:

data = {}
data["color"] = ["red", "green", "blue"]
data["mass"]  = [100, 98, 90]
data["speed"] = [10, 11, 9]
data["metadata"] = {"event_type": "2015-12-11T1442Z"}

Perhaps the object could be told which event number to use and then the various current characteristics could be requested of it.
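
To make that idea concrete, here is a minimal sketch of such an object using only the standard library. The class name EventTable and its event method are hypothetical names chosen for illustration, and the sketch assumes every list has the same length so that one event number is valid for all of them.

```python
class EventTable:
    """Hypothetical container: equal-length columns plus table-wide metadata."""

    def __init__(self, columns, metadata=None):
        # Assumption: every column describes the same events, so lengths must match.
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = dict(columns)
        self.metadata = dict(metadata or {})

    def event(self, number):
        """Return every characteristic of one event as a dictionary."""
        return {name: values[number] for name, values in self.columns.items()}


table = EventTable(
    {
        "color": ["red", "green", "blue"],
        "mass": [100, 98, 90],
        "speed": [10, 11, 9],
    },
    metadata={"event_type": "2015-12-11T1442Z"},
)

print(table.event(1))          # characteristics of event number 1
print(table.columns["mass"])   # one characteristic across all events
print(table.metadata)          # metadata for the whole table
```

Keeping the metadata in its own attribute, rather than under a "metadata" key next to the lists, means that looping over the columns never has to skip a special entry.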


EDIT: Following a suggestion by gkusner, I created the following data structure class:

class Data(object):

    def __init__(
        self
        ):
        self._index = 0
        self._data  = {}

    def index(
        self,
        number = None
    ):
        # Optionally set the current event number, then return it.
        if number is not None:
            self._index = number
        return self._index

    def indices(
        self
    ):
        return [index for index in self._data]

    def variable(
        self,
        index = None,
        name  = None,
        value = None
    ):
        if index is not None:
            self._index = index
        if name is not None:
            if value is not None:
                # setdefault creates the per-event dictionary on first use.
                self._data.setdefault(self._index, {})[name] = value
        return self._data[self._index][name]

    def variables(
        self,
        index = None
    ):
        # Optionally select an event, then list its variable names.
        if index is not None:
            self._index = index
        return list(self._data[self._index])

Upvotes: 0

Views: 1235

Answers (2)

Pablo Arnalte-Mur

Reputation: 510

A possible solution would be to use the pandas library, and in particular a DataFrame object. If you are not familiar with that library, here is a short introductory tutorial. The library has lots of features that may be useful (for example, to deal with date/time data). Whether it is useful for you depends on the algorithms you want to use (as suggested by Peter Wood's comment).

For your short example, you could build the data object as a DataFrame like this:

import pandas as pd
data = pd.DataFrame({'color': ['red', 'green', 'blue'],
                     'mass': [100, 98, 90],
                     'speed': [10,11,9]})

Then you can access either the full data object or particular elements in it, e.g.

>>> print(data)
   color  mass  speed
0    red   100     10
1  green    98     11
2   blue    90      9
>>> data.loc[1, 'mass']
98
>>> data.loc[0, 'color']
'red'

You could also perform operations on the columns and save the results as a new column in your data object, e.g.:

>>> data['momentum'] = data['mass'] * data['speed']
>>> print(data)
   color  mass  speed  momentum
0    red   100     10      1000
1  green    98     11      1078
2   blue    90      9       810
>>> data.loc[2, 'momentum']
810

What I am not sure about is the metadata bit you want. I understand this would be some metadata for the whole object (not for a particular event). I don't know of an easy way to add 'global metadata' to a DataFrame, but you could maybe add an extra column with the information (even if it is the same for all the events). In your example:

data = pd.DataFrame({'color': ['red', 'green', 'blue'],
                     'mass': [100, 98, 90],
                     'speed': [10,11,9],
                     'event_type': "2015-12-11T1442Z"})

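which results in the following (recent pandas versions keep the columns in the order given; older versions sorted them alphabetically):

>>> print(data)
   color  mass  speed        event_type
0    red   100     10  2015-12-11T1442Z
1  green    98     11  2015-12-11T1442Z
2   blue    90      9  2015-12-11T1442Z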
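
An alternative that avoids repeating the value in every row may be the DataFrame.attrs dictionary, which is available in pandas 1.0 and later and is documented as experimental. The following is a minimal sketch under that assumption; metadata stored this way is not guaranteed to survive every operation that creates a new DataFrame.

```python
import pandas as pd

data = pd.DataFrame({'color': ['red', 'green', 'blue'],
                     'mass': [100, 98, 90],
                     'speed': [10, 11, 9]})

# attrs is a plain dictionary attached to the DataFrame as a whole
# (assumes pandas >= 1.0, where the attribute is marked experimental).
data.attrs['event_type'] = '2015-12-11T1442Z'

print(data.attrs)
print(list(data.columns))  # the metadata does not become a column
```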

Upvotes: 0

gkusner

Reputation: 1244

Use a dictionary (specifically, a nested dictionary) keyed by the event index:

data = {}
index = 123
data[index] = {}
data[index]["color"] = ["red", "green", "blue"]
data[index]["mass"]  = [100, 98, 90]
data[index]["speed"] = [10, 11, 9]
data[index]["metadata"] = {"event_type": "2015-12-11T1442Z"}

Notice that index is not quoted: it is a variable holding the event number, not a string key.

You could also define it as a class, with each index value defining an instance, but that might be overkill for your needs.
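
A rough sketch of that class-based variant, using the standard-library dataclasses module, could look like the following. The class name Event and the choice of one scalar value per field are assumptions made for illustration; the nested dictionary above stores whole lists under each index instead.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical per-event record; field names follow the question."""
    color: str
    mass: float
    speed: float
    metadata: dict = field(default_factory=dict)


# Outer dictionary keyed by event number, one Event instance per key.
events = {
    123: Event("red", 100, 10, {"event_type": "2015-12-11T1442Z"}),
    124: Event("green", 98, 11),
}

print(events[123].color)
print(events[124].mass * events[124].speed)
```

Attribute access such as events[123].color catches misspelled field names immediately, which a plain nested dictionary would only reveal as a KeyError at lookup time.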

Upvotes: 2
