jacob

Reputation: 820

Pandas groupby then assign

I have a dataframe in long format with columns: date, ticker, mcap, rank_mcap. The mcap column is the market capitalization and measures how large a given stock is, and rank_mcap is simply its ranked version (where 1 is the largest market cap).
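For readers reproducing the setup, here is a minimal sketch of how a rank_mcap column like this could be built. The frame and its values are hypothetical, and the tie-breaking rule (method="first") is an assumption; the question does not say how ties are ranked.

```python
import pandas as pd

# Hypothetical long-format frame: one row per (date, ticker).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01"] * 3 + ["2020-02-01"] * 3),
    "ticker": ["AAA", "BBB", "CCC"] * 2,
    "mcap": [300.0, 500.0, 100.0, 450.0, 200.0, 600.0],
})

# Rank within each date, largest market cap first (rank 1).
# Assumption: method="first" breaks ties by order of appearance,
# so every rank within a date is a unique integer.
df["rank_mcap"] = (
    df.groupby("date")["mcap"].rank(ascending=False, method="first").astype(int)
)
print(df)
```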

I want to create a top-10 market-cap-weighted asset (e.g. an S&P 10). In R I do this:

library(dplyr)
library(lubridate)  # provides day()

df %>%
    filter(day(date) == 1, rank_mcap < 11) %>%
    group_by(date) %>%
    mutate(weight = mcap / sum(mcap)) %>%
    ungroup()

What do I do in pandas? I get the following error

AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method

when I try a similar approach to the R one in Python:

df.\
    query('included == True & date.dt.day == 1'). \
    groupby('date').\
    assign(w=df.mcap / df.mcap.sum())
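A minimal sketch on hypothetical toy data (the frame and its values are made up for illustration) of the two problems in the attempt above: assign is a DataFrame method that DataFrameGroupBy does not expose, and df.mcap / df.mcap.sum() refers to the original, unfiltered df rather than to the result of query. Passing a callable to assign makes it operate on the intermediate, filtered frame instead.

```python
import pandas as pd

# Hypothetical toy data: two dates, three tickers each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01"] * 3 + ["2020-01-02"] * 3),
    "ticker": ["A", "B", "C"] * 2,
    "mcap": [50.0, 30.0, 20.0, 60.0, 25.0, 15.0],
})

grouped = df.groupby("date")
# assign is defined on DataFrame, not on DataFrameGroupBy.
print(hasattr(grouped, "assign"))  # False

first_day = df[df["date"].dt.day == 1]

# Referencing df inside assign uses the unfiltered frame:
# the denominator is the total over all 6 rows (200), not over the 3 kept rows (100).
wrong = first_day.assign(w=df.mcap / df.mcap.sum())

# A callable receives the intermediate (filtered) frame instead.
right = first_day.assign(w=lambda d: d.mcap / d.mcap.sum())
print(wrong.w.sum(), right.w.sum())
```

Only one date survives the filter in this toy frame, so a single total is enough here. On real data several first-of-month dates remain, so the total still has to be taken per date, which is what the groupby in the answers below handles.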

I studied http://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html but could not work out the equivalent.

Upvotes: 0

Views: 231

Answers (2)

Panwen Wang

Reputation: 3835

You can do it in the same way as you did in R using datar:

from datar.all import f, filter, group_by, ungroup, mutate, sum

df >> \
    filter(f.date.day == 1, f.rank_mcap < 11) >> \
    group_by(f.date) >> \
    mutate(weight = f.mcap / sum(f.mcap)) >> \
    ungroup() 

Disclaimer: I am the author of the datar package.

Upvotes: 0

BENY

Reputation: 323316

How to achieve R's mutate in pandas:

df.query('included == True & date.dt.day == 1').\
    assign(weight = lambda x : x.groupby('date',group_keys=False).
           apply(lambda y: y.mcap / y.mcap.sum()))
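An alternative sketch that avoids apply: groupby(...).transform("sum") returns each date's total aligned to the original rows, which is a close pandas analogue of sum(mcap) after group_by in dplyr. The data below are hypothetical, the included column from the question is omitted, and the filter uses rank_mcap < 11 as in the R code.

```python
import pandas as pd

# Hypothetical data: 12 tickers on each of two dates, ranked 1..12 by mcap.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01"] * 12 + ["2020-01-02"] * 12),
    "ticker": [f"T{i}" for i in range(12)] * 2,
    "mcap": [float(v) for v in range(120, 0, -10)] * 2,
    "rank_mcap": list(range(1, 13)) * 2,
})

top10 = (
    df[(df["date"].dt.day == 1) & (df["rank_mcap"] < 11)]
    # transform("sum") broadcasts each date's total back onto that date's rows,
    # like sum(mcap) inside group_by(date) %>% mutate(...) in dplyr.
    .assign(weight=lambda d: d["mcap"] / d.groupby("date")["mcap"].transform("sum"))
    .reset_index(drop=True)
)
print(top10)
```

Because transform returns a plain Series aligned to the rows, the result is an ordinary DataFrame and there is no counterpart to ungroup() to call.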

Upvotes: 1
