Reputation: 4828
Simple example:
from collections import namedtuple
import pandas
Price = namedtuple('Price', 'ticker date price')
a = Price('GE', '2010-01-01', 30.00)
b = Price('GE', '2010-01-02', 31.00)
l = [a, b]
df = pandas.DataFrame.from_records(l, index='ticker')
Traceback (most recent call last):
...
KeyError: 'ticker'
Harder example:
df2 = pandas.DataFrame.from_records(l, index=['ticker', 'date'])
df2
         0           1   2
ticker  GE  2010-01-01  30
date    GE  2010-01-02  31
Now it thinks that ['ticker', 'date'] is the index itself, rather than the columns I want to use as the index.
Is there a way to do this without resorting to an intermediate numpy ndarray or using set_index after the fact?
Upvotes: 28
Views: 60588
Reputation: 23041
Calling the DataFrame constructor on the list of namedtuples produces a DataFrame (with pandas imported as pd):
df = pd.DataFrame(l)
  ticker        date  price
0     GE  2010-01-01   30.0
1     GE  2010-01-02   31.0
Calling set_index() on the result produces the desired output. However, since the OP doesn't want that, another way is to convert each namedtuple into a dictionary and pop the index keys out of it:
l_asdict = [x._asdict() for x in l]
# Pop the index fields out of each dict first, so only 'price' is left as a column.
index = pd.MultiIndex.from_arrays([[x.pop(k) for x in l_asdict] for k in ['ticker', 'date']], names=['ticker', 'date'])
df = pd.DataFrame(l_asdict, index=index)
                   price
ticker date
GE     2010-01-01   30.0
       2010-01-02   31.0
Upvotes: 0
Reputation: 375415
To get a Series from a namedtuple you could use the _fields attribute:
In [11]: pd.Series(a, a._fields)
Out[11]:
ticker            GE
date      2010-01-01
price             30
dtype: object
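An equivalent route, if you prefer not to pass the field names separately, is to go through _asdict(). This is only a minimal sketch that rebuilds the Price namedtuple from the question so it runs on its own:

```python
from collections import namedtuple

import pandas as pd

Price = namedtuple('Price', 'ticker date price')
a = Price('GE', '2010-01-01', 30.00)

# _asdict() maps field names to values; Series uses the dict keys as its index.
s = pd.Series(a._asdict())
print(s)
```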
Similarly you can create a DataFrame like this:
In [12]: df = pd.DataFrame(l, columns=l[0]._fields)
In [13]: df
Out[13]:
  ticker        date  price
0     GE  2010-01-01     30
1     GE  2010-01-02     31
You have to set_index after the fact, but you can do this inplace:
In [14]: df.set_index(['ticker', 'date'], inplace=True)
In [15]: df
Out[15]:
                   price
ticker date
GE     2010-01-01     30
       2010-01-02     31
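If you really want to skip the set_index step, one approach that I believe works is to pass columns= to from_records as well. With explicit column names, the names given in index= can be looked up as columns instead of being taken as the index labels themselves. A minimal sketch, reusing the Price namedtuple from the question (the records name is just for illustration, and it is worth checking against your pandas version):

```python
from collections import namedtuple

import pandas as pd

Price = namedtuple('Price', 'ticker date price')
records = [
    Price('GE', '2010-01-01', 30.00),
    Price('GE', '2010-01-02', 31.00),
]

# Naming the columns lets from_records resolve 'ticker' and 'date' as
# columns and move them into a MultiIndex in a single call.
df = pd.DataFrame.from_records(
    records,
    columns=list(Price._fields),
    index=['ticker', 'date'],
)
print(df)
```

This would also explain the KeyError in the question: without column names, the fields are labelled 0, 1, 2, so there is no 'ticker' column to find.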
Upvotes: 31