daj

Reputation: 7183

How to reference intermediate dataframe when method chaining in pandas?

When chaining dataframe operations in dplyr, an operation can refer anonymously to the current dataframe. A trivial example:

data.frame(x=3) %>% filter(x == 3) %>% mutate(x = x/sum(.$x))

Here I can operate on the dataframe as it stands at that point in the chain by referencing ".".

What is the equivalent way to do this in pandas with method chaining? Is it possible without defining intermediate variables?

Upvotes: 2

Views: 761

Answers (2)

Panwen Wang

Reputation: 3825

With datar, you can use the f pronoun:

>>> from datar.all import f, tibble, filter, mutate, sum
>>> 
>>> tibble(x=3) >> filter(f.x==3) >> mutate(x=f.x/sum(f.x))
          x
  <float64>
0       1.0

I am the author of the package. Feel free to submit issues if you have any questions.

Upvotes: 0

BENY

Reputation: 323306

In Python, with pandas:

df[df.W01.eq(3)].assign(x=df[df.W01.eq(3)].W02.transform(lambda x : x/sum(x)))
Out[873]: 
   W01  W02         x
0    3    1  0.333333
1    3    1  0.333333
2    3    1  0.333333

Explanation:

df[df.W01.eq(3)] : filter(x == 3)

.assign(x=df[df.W01.eq(3)].W02.transform(lambda x : x/sum(x))) : mutate(x = x/sum(.$x))

Data Input

import pandas as pd
df = pd.DataFrame({'W01': [3,3,3,2], 'W02': [1,1,1,999]})
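
The expression above repeats df[df.W01.eq(3)], so it still relies on the named variable df. As a minimal sketch of a fully chained alternative, assuming the same df as in Data Input: both .loc and .assign accept a callable, which pandas calls with the dataframe as it exists at that step of the chain, so the lambda argument plays roughly the role of "." in dplyr. The names d, result, and piped are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"W01": [3, 3, 3, 2], "W02": [1, 1, 1, 999]})

result = (
    df
    # .loc with a callable: d is the dataframe at this point in the chain
    .loc[lambda d: d["W01"].eq(3)]
    # .assign with a callable: d is the already-filtered dataframe,
    # so the sum covers only the rows that survived the filter
    .assign(x=lambda d: d["W02"] / d["W02"].sum())
)
print(result)

# The same idea with .pipe, which hands the intermediate dataframe
# to an arbitrary function
piped = (
    df
    .query("W01 == 3")
    .pipe(lambda d: d.assign(x=d["W02"] / d["W02"].sum()))
)
print(piped)
```

Because the callable receives the filtered frame, no intermediate variable is needed and the filter is written only once.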

Upvotes: 3
