Pandas groupby where all columns are added to a list prefixed by column name

Question

I have a CSV file that looks like this:

id1,feat1,feat2,feat3
a,b,asd,asg
c,d,dg,ag
a,e,sdg,as
c,f,as,sdg
c,g,adg,sd

I read it into a dataframe with df = pd.read_csv("file.csv").

I would like to group by id1 and combine all the other columns of each group into a single row, with the header name added as a prefix to each value. That is, the output should be a data frame that looks like:

a [feat1=b,feat1=e,feat2=asd,feat2=sdg,feat3=asg,feat3=as]
c [feat1=d,feat1=f,feat1=g,feat2=dg,feat2=as,feat2=adg,feat3=ag,feat3=sdg,feat3=sd]

df.groupby('id1') will start me off but I am not sure where to go from there.

What is a good way to do this?

EdChum · Accepted Answer

You can apply a custom function to the groupby object. The function calls apply again on each group's DataFrame, so that every column (passed as a Series) has its name and '=' prepended to its values. A list comprehension then flattens the resulting rows, and the flattened list is returned wrapped in a list, as desired:

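In [54]:
def foo(x):
    # Prefix every value with its column name, e.g. 'feat1=b'
    l = (x.apply(lambda col: col.name + '=' + col)).values.tolist()
    # Flatten the list of rows into one list and wrap it in a Series
    return pd.Series([[i for j in l for i in j]])

gp = df.groupby('id1')[['feat1','feat2','feat3']]
gp1 = gp.apply(foo)
gp1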

Out[54]:
                                                     0
id1                                                   
a    [feat1=b, feat2=asd, feat3=asg, feat1=e, feat2...
c    [feat1=d, feat2=dg, feat3=ag, feat1=f, feat2=a...

If we look at the contents of the first row, we see that it holds a single list of the prefixed values:

In [55]:    
gp1.iloc[0].values

Out[55]:
array([['feat1=b', 'feat2=asd', 'feat3=asg', 'feat1=e', 'feat2=sdg', 'feat3=as']], dtype=object)
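
Note that the list above is ordered row by row (feat1, feat2, feat3 of the first row, then of the second), whereas the question lists all feat1 values first, then feat2, then feat3. If that column-first order matters, one possible variation is sketched below. It is not part of the original answer: the helper name prefixed_by_column, the output column name "features", and the inline copy of the CSV are all assumptions made for illustration.

```python
import io

import pandas as pd

# Inline copy of the question's CSV so the example is self-contained
csv_text = """id1,feat1,feat2,feat3
a,b,asd,asg
c,d,dg,ag
a,e,sdg,as
c,f,as,sdg
c,g,adg,sd
"""
df = pd.read_csv(io.StringIO(csv_text))

feature_cols = ["feat1", "feat2", "feat3"]


def prefixed_by_column(group):
    # Hypothetical helper: walk the columns first, then the rows in each column
    return [f"{col}={val}" for col in feature_cols for val in group[col]]


# Build one list per id1 value, keeping groupby's sorted key order
lists = {key: prefixed_by_column(group) for key, group in df.groupby("id1")}

# One row per group; "features" is an assumed column name
out = pd.DataFrame({"id1": list(lists), "features": list(lists.values())})
print(out.to_string(index=False))
```

The explicit loop avoids the nested apply and keeps the ordering obvious, at the cost of iterating over the groups in Python.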
