Reputation:
I am new to Python and pandas, and I was wondering whether there is a 'pythonic' way to accomplish the following. I have a dataframe that looks like this:
L1 L2 L3
X 1 200
X 2 100
Z 1 15
X 3 200
Z 2 10
Y 1 1
Z 3 20
Y 2 10
Y 3 100
I am trying to order the rows and create an additional column that shows cumulative values derived from L3, in ascending order. The output I need is the following:
L1 L2 L3 New
X 3 200 0.40000
X 2 100 0.60000
X 1 200 1.00000
Y 3 100 0.90090
Y 2 10 0.99099
Y 1 1 1.00000
Z 3 20 0.44444
Z 1 15 0.77778
Z 2 10 1.00000
The value in row 1 (0.40000) under "New" represents 200/500, where 500 is the sum of all L3 values for L1 = X. The second value (0.60000) is simply 300/500, and so on. The 'loop' is repeated for each value of L1 (X, Y and Z).
Can anybody help with this? Thank you.
Upvotes: 2
Views: 529
Reputation: 5137
As stated in this post, the solution will only work with version 0.13 of Pandas. For the current version (0.12), the solution is the following:
In [20]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [21]: df["new"] = new_column.reset_index(level=0, drop=True)
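For readers on a recent pandas, here is a rough sketch of the same two-step idea: compute per group, then drop the group level so the assignment aligns on the original row labels. It is only an illustration under some assumptions. `cumulative_share` is a hypothetical helper name, `sort_values` and `cumsum` stand in for `sort` and `pd.expanding_sum`, and the X/1 row is taken to have L3 = 200 so that the X total is 500.

```python
import pandas as pd

# Sample data from the question, with L3 = 200 for the X/1 row (assumption).
df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [200, 100, 15, 200, 10, 1, 20, 10, 100],
})


def cumulative_share(s):
    # Hypothetical helper: sort one group's L3 values largest first,
    # then divide the running total by the group total.
    return s.sort_values(ascending=False, kind="mergesort").cumsum() / s.sum()


# The result is indexed by (L1 value, original row label).
new_column = df.groupby("L1", group_keys=True)["L3"].apply(cumulative_share)

# Dropping the L1 level leaves only the original row labels,
# so the assignment lines up with the rows of df.
df["new"] = new_column.reset_index(level=0, drop=True)

print(df)
```

Without the `reset_index(level=0, drop=True)` step the index of `new_column` has two levels and would not match the single-level index of `df`.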
Upvotes: 1
Reputation: 139182
You can do it with the following line of code:
df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())
Some explanation:
df.groupby("L1", as_index=False)
groups the dataframe by column L1, so the following calculation is done for each value (X, Y and Z).
.apply()
applies the function to each of these groups:
pd.expanding_sum(x.sort("L3", ascending=False)["L3"])
takes the cumulative sum of column "L3", after first sorting the group in descending order by the values in "L3".
.../x["L3"].sum()
then divides this by the sum of all values of "L3" in that group.
This gives:
In [9]: df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())
In [10]: df
Out[10]:
L1 L2 L3 new
0 X 1 200 0.800000
1 X 2 100 1.000000
2 Z 1 15 0.777778
3 X 3 200 0.400000
4 Z 2 10 1.000000
5 Y 1 1 1.000000
6 Z 3 20 0.444444
7 Y 2 10 0.990991
8 Y 3 100 0.900901
or sorted:
In [16]: df.sort(["L1", "L3"], ascending=[True, False])
Out[16]:
L1 L2 L3 new
0 X 1 200 0.800000
3 X 3 200 0.400000
1 X 2 100 1.000000
8 Y 3 100 0.900901
7 Y 2 10 0.990991
5 Y 1 1 1.000000
6 Z 3 20 0.444444
2 Z 1 15 0.777778
4 Z 2 10 1.000000
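Note that `pd.expanding_sum` and `DataFrame.sort` are no longer available in recent pandas releases. A minimal sketch of the same calculation for a current version follows. It swaps in a different technique from the `apply` call above: `sort_values`, then a grouped `cumsum` divided by a grouped `transform("sum")`. The variable names `result` and `grouped` are just for illustration.

```python
import pandas as pd

# Sample data as shown in Out[10] above (L3 = 200 for the X/1 row).
df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [200, 100, 15, 200, 10, 1, 20, 10, 100],
})

# Order the rows by L1, and within each L1 by L3, largest first.
result = df.sort_values(["L1", "L3"], ascending=[True, False])

# Running total of L3 within each group, divided by that group's total.
grouped = result.groupby("L1")["L3"]
result["new"] = grouped.cumsum() / grouped.transform("sum")

print(result)
```

Both `cumsum` and `transform("sum")` return values indexed like `result`, so no `reset_index` is needed. The two X rows with L3 = 200 are tied, so which of them gets 0.4 and which gets 0.8 depends on how the sort breaks the tie.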
Upvotes: 3