user3123958

New cumulative value col derived from existing col in a Pandas dataframe

I am new to Python and pandas, and I was wondering whether there is a 'pythonic' way to accomplish the following. I have a dataframe that looks like this:

L1  L2  L3
X   1   200
X   2   100
Z   1   15
X   3   200
Z   2   10
Y   1   1
Z   3   20
Y   2   10
Y   3   100

I am trying to order the rows and create an additional column that shows cumulative values derived from L3, in ascending order. The output I need is the following:

L1  L2  L3  New
X   3   200 0.40000
X   2   100 0.60000
X   1   200 1.00000
Y   3   100 0.90090
Y   2   10  0.99099
Y   1   1   1.00000
Z   3   20  0.44444
Z   1   15  0.77778
Z   2   10  1.00000

The first value under "New" (0.40000) is 200/500, where 500 is the sum of all L3 values for that L1 group. The second value (0.60000) is simply 300/500, and so on. The calculation is repeated for each value of L1: X, Y and Z.

Can anybody help with this? Thank you.

Upvotes: 2

Views: 529

Answers (2)

Luis Miguel

Reputation: 5137

As stated in this post, the solution will only work with version 0.13 of Pandas. For the current version (0.12), the solution is the following:

In [20]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [21]: df["new"] = new_column.reset_index(level=0, drop=True)
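The second line matters because the grouped result carries an extra outer index level for the group, so it would not line up with the rows of df as is. A minimal sketch of that alignment step follows, using a small made-up frame and a hand-built `new_column` as a hypothetical stand-in for the per-group result (the values and the exact index shape are assumptions for illustration, not output from pandas 0.12):

```python
import pandas as pd

# Small made-up frame; not the data from the question.
df = pd.DataFrame({"L1": ["X", "Z", "X", "Z"], "L3": [200, 15, 100, 5]})

# Hypothetical stand-in for the grouped result: values keyed by
# (group position, original row label).
new_column = pd.Series(
    [2 / 3, 1.0, 0.75, 1.0],
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 2), (1, 1), (1, 3)]),
)

# Drop the outer (group) level so only the original row labels remain.
aligned = new_column.reset_index(level=0, drop=True)

# Assignment aligns on the index, so each value lands on its own row
# even though `aligned` is ordered by group rather than by row label.
df["new"] = aligned
```

Because the assignment matches on index labels rather than position, the grouped order of `aligned` does not matter.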

Upvotes: 1

joris

Reputation: 139182

You can do it with the following line of code:

df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())

Some explanation:

  • df.groupby("L1", as_index=False) groups the dataframe by column L1, so the following calculation is done for each value (X, Y and Z)
  • .apply() applies the function to each of these groups:
    • pd.expanding_sum(x.sort("L3", ascending=False)["L3"]) sorts the group by "L3" in descending order and then takes the cumulative sum of column "L3"
    • .../x["L3"].sum() then divides this by the sum of all values of "L3" in that group.

This gives:

In [9]: df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())

In [10]: df
Out[10]: 
  L1  L2   L3       new
0  X   1  200  0.800000
1  X   2  100  1.000000
2  Z   1   15  0.777778
3  X   3  200  0.400000
4  Z   2   10  1.000000
5  Y   1    1  1.000000
6  Z   3   20  0.444444
7  Y   2   10  0.990991
8  Y   3  100  0.900901

or sorted:

In [16]: df.sort(["L1", "L3"], ascending=[True, False])
Out[16]: 
  L1  L2   L3       new
0  X   1  200  0.800000
3  X   3  200  0.400000
1  X   2  100  1.000000
8  Y   3  100  0.900901
7  Y   2   10  0.990991
5  Y   1    1  1.000000
6  Z   3   20  0.444444
2  Z   1   15  0.777778
4  Z   2   10  1.000000
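
In later pandas releases, pd.expanding_sum and DataFrame.sort are no longer available, so the lines above will not run as written there. The following is a rough sketch of one way to get the same numbers with sort_values, cumsum and transform. The variable names `out` and `grouped` are my own, and the data is retyped from the output above. How rows with equal L3 values within a group are ordered is not something this sketch controls, so the two X rows with 200 may swap their 0.4 and 0.8.

```python
import pandas as pd

# Data retyped from the output shown in this answer.
df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [200, 100, 15, 200, 10, 1, 20, 10, 100],
})

# Sort by group, then by L3 from largest to smallest within each group.
out = df.sort_values(["L1", "L3"], ascending=[True, False])

# Running total of L3 within each group, divided by that group's total.
grouped = out.groupby("L1")["L3"]
out["new"] = grouped.cumsum() / grouped.transform("sum")

print(out)
```

Sorting first means the running total can be taken directly on the grouped column, which avoids the apply with a lambda.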

Upvotes: 3
