Splitting array values in dataframe into new dataframe - python

Question

I have a pandas dataframe with a column whose values are arrays of arrays. I would like to create a new dataframe from this column.

My current dataframe 'fruits' looks like this...

Id  Name    Color    price_trend
1   apple   red      [['1420848000','1.25'],['1440201600','1.35'],['1443830400','1.52']]
2   lemon   yellow   [['1403740800','0.32'],['1422057600','0.25']]

What I would like is a new dataframe from the 'price_trend' column that looks like this...

Id    date         price
1     1420848000   1.25
1     1440201600   1.35
1     1443830400   1.52
2     1403740800   0.32
2     1422057600   0.25

Thanks for the advice!

Aaron B · Accepted Answer

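A groupby followed by apply should do the trick: the function below expands each row's price_trend into one output row per [date, price] pair.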

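import pandas as pd

def f(group):
    # Assumes one row per Id, so only the first row of each group is read.
    row = group.iloc[0]  # iloc replaces irow(), which was removed from pandas
    ids = [row['Id'] for v in row['price_trend']]
    dates = [v[0] for v in row['price_trend']]
    prices = [v[1] for v in row['price_trend']]
    return pd.DataFrame({'Id': ids, 'date': dates, 'price': prices})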

In[7]: df.groupby('Id', group_keys=False).apply(f)
Out[7]:
   Id        date price
0   1  1420848000  1.25
1   1  1440201600  1.35
2   1  1443830400  1.52
0   2  1403740800  0.32
1   2  1422057600  0.25
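
For comparison, here is a minimal sketch that builds the same table without groupby, using a plain list comprehension over the rows. The sample frame is reconstructed from the question; the names rows and result are only illustrative, and this is one possible approach rather than the only one.

```python
import pandas as pd

# Sample data reconstructed from the question's 'fruits' dataframe.
fruits = pd.DataFrame({
    'Id': [1, 2],
    'Name': ['apple', 'lemon'],
    'Color': ['red', 'yellow'],
    'price_trend': [
        [['1420848000', '1.25'], ['1440201600', '1.35'], ['1443830400', '1.52']],
        [['1403740800', '0.32'], ['1422057600', '0.25']],
    ],
})

# One output row per [date, price] pair, tagged with the Id of its source row.
rows = [
    (row_id, date, price)
    for row_id, trend in zip(fruits['Id'], fruits['price_trend'])
    for date, price in trend
]
result = pd.DataFrame(rows, columns=['Id', 'date', 'price'])
print(result)
```

Unlike the groupby version, this gives a fresh 0..4 index and does not depend on each Id appearing only once. The date and price values stay as strings, exactly as they are stored in price_trend.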

Edit:

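To filter out bad data (for instance, a price_trend value such as [['None']]), one option is pandas boolean indexing: build a boolean Series that is True only for rows whose price_trend is non-empty and consists entirely of two-element pairs, then select with it before grouping.

criterion = df['price_trend'].map(lambda x: len(x) > 0 and all(len(pair) == 2 for pair in x))
df[criterion].groupby('Id', group_keys=False).apply(f)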
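
To see what the criterion keeps and drops, here is a small self-contained sketch. The three-row frame and the helper name is_valid are made up for illustration; only the [['None']] case comes from the text above.

```python
import pandas as pd

# Hypothetical frame: one good row, one malformed row, one empty row.
df = pd.DataFrame({
    'Id': [1, 2, 3],
    'price_trend': [
        [['1420848000', '1.25'], ['1440201600', '1.35']],
        [['None']],  # malformed: the inner list is not a [date, price] pair
        [],          # empty: no price history at all
    ],
})

def is_valid(trend):
    # Keep only non-empty lists in which every entry has exactly two elements.
    return len(trend) > 0 and all(len(pair) == 2 for pair in trend)

criterion = df['price_trend'].map(is_valid)
clean = df[criterion]

print(criterion.tolist())    # [True, False, False]
print(clean['Id'].tolist())  # [1]
```

The check only looks at the shape of each entry. A pair like ['None', 'None'] would still pass, so stricter validation of the values would need an extra test.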
