ReverseEngineer
ReverseEngineer

Reputation: 559

Pandas groupby and retain another column's value

I have a dataframe that looks like this

Site  Sector  Model  Frequency  Bandwidth
A       1     abc      900        500M
A       2     abc      900        1000M
A       3     abc      900        1000M
A       1     abc      1000       2000M
A       2     abc      1000       3000M
A       3     abc      1000       3000M
B       1     def      900        150M
B       2     def      900        220M
B       3     def      900        300M
B       1     def      1000       450M
B       2     def      1000       600M
B       3     def      1000       1000M

and I've concatenate the Frequency and Bandwidth using groupby and agg function.

grouped_df = ndb_df.groupby(by=['Site', 'Sector'], as_index=False).agg({'Frequency': lambda x: '#'.join(x.astype(str)), 'Bandwidth': lambda y: '#'.join(y.astype(str))})

which result in the below dataframe output

Site  Sector  Frequency  Bandwidth
A       1     900#1000    500M#2000M
A       2     900#1000    1000M#3000M
A       3     900#1000    1000M#3000M
B       1     900#1000    150M#450M
B       2     900#1000    220M#600M
B       3     900#1000    300M#1000M

The output grouped_df removed the column Model which I wish to retain. Any idea on this?

What I really want

Site  Sector  Model  Frequency  Bandwidth
A       1     abc    900#1000    500M#2000M
A       2     abc    900#1000    1000M#3000M
A       3     abc    900#1000    1000M#3000M
B       1     def    900#1000    150M#450M
B       2     def    900#1000    220M#600M
B       3     def    900#1000    300M#1000M

Upvotes: 1

Views: 53

Answers (1)

jezrael
jezrael

Reputation: 862431

First idea is add Model column to by parameter if same groups like Site:

grouped_df = (ndb_df.groupby(by=['Site', 'Sector', 'Model'], as_index=False)
                    .agg({'Frequency': lambda x: '#'.join(x.astype(str)), 
                         'Bandwidth': lambda y: '#'.join(y.astype(str))})

Or add this column to dictionary with GroupBy.first function:

grouped_df = (ndb_df.groupby(by=['Site', 'Sector'], as_index=False)
                     .agg({'Model':'first',
                           'Frequency': lambda x: '#'.join(x.astype(str)), 
                            'Bandwidth': lambda y: '#'.join(y.astype(str))
                           }))
print (grouped_df)
  Site  Sector Model Frequency    Bandwidth
0    A       1   abc  900#1000   500M#2000M
1    A       2   abc  900#1000  1000M#3000M
2    A       3   abc  900#1000  1000M#3000M
3    B       1   def  900#1000    150M#450M
4    B       2   def  900#1000    220M#600M
5    B       3   def  900#1000   300M#1000M

Also is possible assign lambda function to variable like:

f = lambda x: '#'.join(x.astype(str))
grouped_df = (ndb_df.groupby(by=['Site', 'Sector'], as_index=False)
                     .agg({'Model':'first',
                           'Frequency': f, 
                            'Bandwidth': f})
              )

Upvotes: 1

Related Questions