Reputation: 707
I have a small dataframe. An example is given below.
+-------+---------+---------+------+----------+
| code | surplus | deficit | gone | has come |
+-------+---------+---------+------+----------+
| 0100 | 1000 | 0 | | |
| 0103 | 0 | 100 | | |
| 0104 | 0 | 600 | | |
| 0190 | 0 | 0 | | |
| 0191 | 0 | 800 | | |
| 0192 | 500 | 0 | | |
| 0193 | 700 | 0 | | |
| 0194 | 0 | 300 | | |
| 0195 | 0 | 0 | | |
+-------+---------+---------+------+----------+
I need to distribute the data from the column surplus
so as to cover the data in the column deficit
. In the column gone
write down how much I took from the column surplus
, and in the column has come
write down how much I added to fill the deficit. From the column surplus
I take the data as the values decrease. And always start with the largest deficit.
In my example it will look like this:
To cover the biggest deficit (In the example this is 800
) i take part of value 1000
.
To cover the next deficit (In the example this is 600
) i take the remainder of the value 1000
and I take 400
from the value 700
.
To cover the deficit 300
i take the remainder of the value 700
.
And, finally, to cover the deficit 100
i take part of value 500
.
The result should be the next dataframe:
+------+---------+---------+------+----------+
| code | surplus | deficit | gone | has come |
+------+---------+---------+------+----------+
| 0100 | 1000 | 0 | 1000 | 0 |
| 0103 | 0 | 100 | 0 | 100 |
| 0104 | 0 | 600 | 0 | 600 |
| 0190 | 0 | 0 | 0 | 0 |
| 0191 | 0 | 800 | 0 | 800 |
| 0192 | 500 | 0 | 100 | 0 |
| 0193 | 700 | 0 | 700 | 0 |
| 0194 | 0 | 300 | 0 | 300 |
| 0195 | 0 | 0 | 0 | 0 |
+------+---------+---------+------+----------+
The values in the columns surplus
and deficit
can be different.
I can't come up an algorithm for this task. I would be grateful for any ideas.
Upvotes: 1
Views: 1786
Reputation: 9998
I'm afraid I don't know pandas so I can't give the detail of the answer, but here is a general alogrithm that I think would work fine. It's up to you to match this against pandas API.
For the "has come" column:
max(cumsum(deficit), sum(surplus))
For the "has come" column (if sum(surplus) >= sum(deficit)
):
For the "gone" column: Just do exactly the same stuff as above, reversing "deficit" and "surplus".
Edit: In your example the gone column is the tricky one, because in that case sum(deficit) < sum(surplus). Here's the above procedure on surplus.
sum(surplus) = 2200
sum(deficit) = 1800
+------+---------+---------+-----------------+-----------------------------------+----------------+
| code | surplus | deficit | cumsum(surplus) | max(cumsum(surplus),sum(deficit)) | diff(prev row) |
+------+---------+---------+-----------------+-----------------------------------+----------------+
| NaN | 0 | 0 | 0 | 0 | NaN |
| 0100 | 1000 | 0 | 1000 | 1000 | 1000 |
| 0193 | 700 | 0 | 1700 | 1700 | 700 |
| 0192 | 500 | 0 | 2200 | 1800 | 100 |
| 0191 | 0 | 800 | 2200 | 1800 | 0 |
| 0103 | 0 | 100 | 2200 | 1800 | 0 |
| 0104 | 0 | 600 | 2200 | 1800 | 0 |
| 0190 | 0 | 0 | 2200 | 1800 | 0 |
| 0194 | 0 | 300 | 2200 | 1800 | 0 |
| 0195 | 0 | 0 | 2200 | 1800 | 0 |
+------+---------+---------+-----------------+-----------------------------------+----------------+
The final column is the result you want. Note that I added a dummy row to the start so that I could calculate pairwise differences. It turns out the shift()
is the key method you need for calculating that column; see this question
Edit 2: I thought it might be worth adding an alternative solution. It's a bit harder to understand, but it might be a bit easier to implement because you don't need to fiddle with an extra dummy row.
i
(with i=Inf
if no such row exists).df[:i]
), set "has come" = "deficit"df[i+1:]
), set "has come" = 0df[i]
, if i
exists), set "has come" to:
has come = sum(surplus) - (cumsum(deficit) - deficit)
(cumsum(deficit) - deficit)
is actually equal to cumsum(deficit) of the previous row, or 0 if this is the first row.)Upvotes: 2
Reputation: 76
For the question "How to distribute column values to another columns based on condition in pandas?" maybe the function np.where is what you are looking for:
import numpy as np
import pandas as pd
# df[Column title] = np.where ( condition on this line, if condition true then value to assign, else value to assign)
df["gone"] = np.where((df["surplus"] - df["deficit"]) > 0 , df["surplus"] - df["deficit"] , 0)
df["has come"] = np.where((df["surplus"] - df["deficit"]) < 0 , 0, df["deficit"] - df["surplus"] )
Upvotes: 1