Reputation: 35
I have a pandas dataframe with a column whose values are arrays of arrays. I would like to create a new dataframe from this column.
My current dataframe 'fruits' looks like this...
Id Name Color price_trend
1 apple red [['1420848000','1.25'],['1440201600','1.35'],['1443830400','1.52']]
2 lemon yellow [['1403740800','0.32'],['1422057600','0.25']]
What I would like is a new dataframe from the 'price_trend' column that looks like this...
Id date price
1 1420848000 1.25
1 1440201600 1.35
1 1443830400 1.52
2 1403740800 0.32
2 1422057600 0.25
Thanks for the advice!
Upvotes: 2
Views: 215
Reputation: 1041
A groupby+apply should do the trick.
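from pandas import DataFrame

def f(group):
    # take the first row of the group (one row per Id in this data)
    row = group.iloc[0]
    # repeat the Id once for every [date, price] pair
    ids = [row['Id'] for v in row['price_trend']]
    dates = [v[0] for v in row['price_trend']]
    prices = [v[1] for v in row['price_trend']]
    return DataFrame({'Id': ids, 'date': dates, 'price': prices})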
In[7]: df.groupby('Id', group_keys=False).apply(f)
Out[7]:
Id date price
0 1 1420848000 1.25
1 1 1440201600 1.35
2 1 1443830400 1.52
0 2 1403740800 0.32
1 2 1422057600 0.25
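For what it's worth, on pandas 0.25 or later (where DataFrame.explode is available) the same reshaping can be sketched without groupby. This is an alternative to the function above, not the same technique. The sample frame is rebuilt from the question's data, and the names exploded, pairs and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
fruits = pd.DataFrame({
    "Id": [1, 2],
    "Name": ["apple", "lemon"],
    "Color": ["red", "yellow"],
    "price_trend": [
        [["1420848000", "1.25"], ["1440201600", "1.35"], ["1443830400", "1.52"]],
        [["1403740800", "0.32"], ["1422057600", "0.25"]],
    ],
})

# One row per [date, price] pair; the Id is repeated for each pair.
exploded = (
    fruits[["Id", "price_trend"]]
    .explode("price_trend")
    .reset_index(drop=True)
)

# Split each two-element pair into its own columns.
pairs = pd.DataFrame(exploded["price_trend"].tolist(), columns=["date", "price"])

# Put the Id next to the split columns (both frames share the index 0..4).
result = pd.concat([exploded[["Id"]], pairs], axis=1)
print(result)
```

Note that date and price stay as strings here, just as in the original lists; converting them (for example with pd.to_numeric) would be a separate step.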
Edit:
To filter out bad data (for instance, a price_trend column having value [['None']]), one option is to use pandas boolean indexing.
criterion = df['price_trend'].map(lambda x: len(x) > 0 and all(len(pair) == 2 for pair in x))
df[criterion].groupby('Id', group_keys=False).apply(f)
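To see what the criterion keeps and drops, here is a small self-contained sketch. The three-row frame is made up for illustration (it is not the asker's data), and is_valid is just a named version of the lambda above.

```python
import pandas as pd

# Hypothetical frame: one good row, one malformed row, one empty row.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "price_trend": [
        [["1420848000", "1.25"], ["1440201600", "1.35"]],
        [["None"]],  # malformed: the inner list has one element, not two
        [],          # empty: there is nothing to expand
    ],
})

def is_valid(trend):
    # Keep a row only if it has at least one pair and every pair has two items.
    return len(trend) > 0 and all(len(pair) == 2 for pair in trend)

# Boolean Series aligned with df's index; True marks the rows to keep.
criterion = df["price_trend"].map(is_valid)
clean = df[criterion]
print(clean["Id"].tolist())
```

Only the row with Id 1 passes, so clean is what would then be handed to the groupby step.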
Upvotes: 1