Reputation: 63737
Consider the following program, in which I created a multi-index DataFrame with three columns and populated one column with a nested list of tuples of lists. I then flattened the index and iterated over the rows with ix, rec = next(df.iterrows()).
I then dereferenced the data column via rec.data from the iterated row (rec) and found that it was a memory object, <memory at 0x000000000D6E0AC8>. On accessing the obj attribute of that object, rec.data.obj, I realised it is an array with the content of the entire row. To get to the actual content, I have to fetch the item by index, which is quite non-intuitive.
>>> print(rec.data.obj[2])
[(['9', '"', 'X', '12', '"'], 0.9993008259451988)]
Sample reproducible example
import pandas as pd

def foo():
    return [(['9', '"', 'X', '12', '"'], 0.99930082594519876)]

def spam():
    index = pd.MultiIndex(levels=[[], []],
                          labels=[[], []],
                          names=[u'timestamp', u'key'])
    columns = ['data', 'col1', 'col2']
    df = pd.DataFrame(index=index, columns=columns)
    for ix in range(4):
        key = ('XXX', ix)
        df.loc[key, 'data'] = str(foo())
        df.loc[key, 'col1'] = "col1_{}".format(ix)
        df.loc[key, 'col2'] = "col2_{}".format(ix)
    df.reset_index(inplace=True)
    return df

def bar():
    df = spam()
    ix, rec = next(df.iterrows())
    print(rec.data)
    print(rec.data.obj)
    print(rec.data.obj[2])

bar()
Output
<memory at 0x000000000D6E0AC8>
['XXX' 0 '[([\'9\', \'"\', \'X\', \'12\', \'"\'], 0.9993008259451988)]'
'col1_0' 'col2_0']
[(['9', '"', 'X', '12', '"'], 0.9993008259451988)]
I am at a loss and cannot understand what I am missing.
Upvotes: 0
Views: 421
Reputation: 862771
It seems you need itertuples:
def bar():
    df = spam()
    rec = next(df.itertuples())
    print(rec)
    print(rec.data)

bar()
Pandas(Index=0, timestamp='XXX',
key=0,
data='[([\'9\', \'"\', \'X\', \'12\', \'"\'], 0.9993008259451988)]',
col1='col1_0',
col2='col2_0')
[(['9', '"', 'X', '12', '"'], 0.9993008259451988)]
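As for why the memory object shows up: in older pandas versions a Series had its own data attribute (a view of the underlying buffer), and attribute access prefers an existing attribute over an index label of the same name. So rec.data on a row from iterrows likely never reached the column. If you want to stay with iterrows, bracket lookup should sidestep the clash. Below is a minimal sketch on a made-up frame (the column values are placeholders, not the original data):

```python
import pandas as pd

# Hypothetical miniature of the frame built by spam(); values are placeholders.
df = pd.DataFrame({
    "timestamp": ["XXX", "XXX"],
    "key": [0, 1],
    "data": ["payload_0", "payload_1"],
    "col1": ["col1_0", "col1_1"],
})

# iterrows yields (index label, row as a Series).
ix, rec = next(df.iterrows())

# Bracket lookup goes through the row's index labels, so it cannot be
# shadowed by a Series attribute that happens to be called 'data'.
value_from_series = rec["data"]

# itertuples yields namedtuples whose fields are the column names,
# so attribute access refers to the column here.
tup = next(df.itertuples())
value_from_tuple = tup.data

print(value_from_series, value_from_tuple)
```

The same caution applies to any column whose name matches a Series attribute or method; rec['name_of_column'] is the safer spelling in those cases.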
Upvotes: 1