noamchomsky

Reputation: 103

How to make the rest of a DataFrame column take on the value of a function output?

Say I've got a dataframe with a column of numbers. I am using df.apply() to modify this column, and the function I am using takes as an argument the number in this column, which means that the output of the function depends on the "state" of the column at the time it is applied. The function has to know the value of the number in row (n-1) in order to spit out the number for row n.

How can I make the function aware of its most recent output, given that this output is one of the arguments it needs to generate the number for the next row of the dataframe? My idea was to set the output of the function as the value of not just the row being iterated on, but also all the rows below it. How can I do this? Is there an easier way that I am not seeing?

Upvotes: 1

Views: 188

Answers (1)

Roy2012

Reputation: 12503

I can think of (at least) two ways to achieve what you're looking for. The first one is using apply with a stateful operator, like this:

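import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20)})

class StatefulOp:
    def __init__(self):
        # Output of the previous call; 0 before the first row
        self._last = 0

    def __call__(self, num):
        # Combine the previous output with the current row's value
        res = self._last + num
        self._last = res
        return res

# Create a fresh instance per apply call, since the state persists
op = StatefulOp()

df.a.apply(op)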

The result is:

0     0
1     1
2     3
3     6
4    10
5    15
6    21
7    28
8    36
9    45
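
As a side note, the operator in this example is just a running total, and for that specific case pandas already provides `Series.cumsum`. A minimal sketch, assuming the same example frame as above (the name `running_total` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20)})

# Running total of column "a"; each row is the sum of all rows up to it
running_total = df["a"].cumsum()
print(running_total.tolist())
```

The stateful operator is still the more general tool: it works when the recurrence is something other than a plain sum.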

The second way is to avoid apply altogether and iterate over the rows yourself with iterrows (or some other way of iterating over the rows in the data frame). For example:

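last_val = 0  # output for the previous row; 0 before the first row

res_array = []
# iterrows yields (index, row) pairs; the index is not needed here
for _, row in df.iterrows():
    res = last_val + row["a"]
    last_val = res
    res_array.append(res)

df["new_a"] = res_array
print(df)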

The result is:

   a   b  new_a
0  0  10      0
1  1  11      1
2  2  12      3
3  3  13      6
4  4  14     10
5  5  15     15
6  6  16     21
7  7  17     28
8  8  18     36
9  9  19     45
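
Both approaches above carry the previous output forward by hand. As a possible third option, the standard library's `itertools.accumulate` does the same bookkeeping for any two-argument function. This is only a sketch: `step` is a hypothetical stand-in for your own recurrence, and the `initial` argument assumes Python 3.8 or later.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({"a": range(0, 10), "b": range(10, 20)})

def step(prev, current):
    # Hypothetical recurrence: previous output plus the current row's value.
    # Replace the body with whatever your function actually computes.
    return prev + current

# accumulate feeds each result back in as `prev` for the next row.
# initial=0 is the starting state; it is emitted first, so drop it with [1:].
df["new_a"] = list(accumulate(df["a"], step, initial=0))[1:]
print(df)
```

With the addition used here, `new_a` matches the column produced by the loop above; swapping in a different `step` changes the recurrence without touching the iteration code.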

Upvotes: 1
