How to unfold lists in DataFrame columns?

Question

I have the following DataFrame df:

center    status     devices
1     Green      [d1, d2]
1     Green      [d5, d1, d2]
2     Green      []
3     Green      [d5, d6]

I need to unfold lists in the column devices. The goal is to group data by center and device, and then count the number of observations per group.

The expected result would be the following one:

center   device   count
1        d1       2
1        d2       2
1        d5       1
3        d5       1
3        d6       1

jezrael · Accepted Answer

First flatten lists and then aggregate by DataFrameGroupBy.size:

#create Series
s = df.set_index('center')['devices']
#create DataFrame, reshape by stack and conver MultiIndex to columns
df = pd.DataFrame(s.values.tolist(), index=s.index).stack().reset_index()
df.columns= ['center','i','devices']

#aggregate count
df = df.groupby(['center','devices']).size().reset_index(name='count')
print (df)
   center device  count
0       1     d1      2
1       1     d2      2
2       1     d5      1
3       3     d5      1
4       3     d6      1

Another solution for improve performance:

from itertools import chain

df = pd.DataFrame({
    'devices' : list(chain.from_iterable(df['devices'].tolist())), 
    'center' : df['center'].values.repeat(df['devices'].str.len())
})


df = df.groupby(['center','devices']).size().reset_index(name='count')
print (df)
   center devices  count
0       1      d1      2
1       1      d2      2
2       1      d5      1
3       3      d5      1
4       3      d6      1

How to unfold lists in DataFrame columns?

Answers (2)

Related Questions