Tatik
Tatik

Reputation: 1227

How to unfold lists in DataFrame columns?

I have the following DataFrame df:

center    status     devices
1     Green      [d1, d2]
1     Green      [d5, d1, d2]
2     Green      []
3     Green      [d5, d6]

I need to unfold lists in the column devices. The goal is to group data by center and device, and then count the number of observations per group.

The expected result would be the following one:

center   device   count
1        d1       2
1        d2       2
1        d5       1
3        d5       1
3        d6       1

Upvotes: 3

Views: 823

Answers (2)

BENY
BENY

Reputation: 323376

Using unnesting after filter the empty list out then groupby size

unnesting(df[df.devices.astype(bool)],['devices']).groupby(['center','devices']).size().reset_index(name='count')
Out[214]: 
   center devices  count
0       1      d1      2
1       1      d2      2
2       1      d5      1
3       3      d5      1
4       3      d6      1

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')

Upvotes: 2

jezrael
jezrael

Reputation: 863611

First flatten lists and then aggregate by DataFrameGroupBy.size:

#create Series
s = df.set_index('center')['devices']
#create DataFrame, reshape by stack and conver MultiIndex to columns
df = pd.DataFrame(s.values.tolist(), index=s.index).stack().reset_index()
df.columns= ['center','i','devices']

#aggregate count
df = df.groupby(['center','devices']).size().reset_index(name='count')
print (df)
   center device  count
0       1     d1      2
1       1     d2      2
2       1     d5      1
3       3     d5      1
4       3     d6      1

Another solution for improve performance:

from itertools import chain

df = pd.DataFrame({
    'devices' : list(chain.from_iterable(df['devices'].tolist())), 
    'center' : df['center'].values.repeat(df['devices'].str.len())
})


df = df.groupby(['center','devices']).size().reset_index(name='count')
print (df)
   center devices  count
0       1      d1      2
1       1      d2      2
2       1      d5      1
3       3      d5      1
4       3      d6      1

Upvotes: 3

Related Questions