PeCaDe

Reputation: 406

Apply diff with different periods to different variables

I would like to create a function that transforms specific features in a df using the pandas .diff method, with different periods for each feature.

I have it working in two steps, but I am sure it can be done more simply, ideally as a one-liner.

Given the following df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Category":["A"]*10+["B"]*50+["C"]*15+["D"]*100,
             "foo":[np.random.random_sample() for i in range(175)],
             "bar":[np.random.random_sample() for i in range(175)]})

Using a dict, I can set the column names and the diff periods I want to apply to each:

dict_diff={"foo":[1,2],"bar":[3,4]}

To set the names and apply the transformation at the same time, the code I developed is:

pd.concat(map(lambda dict_items: df[dict_items[0]].diff(periods=dict_items[1]).rename(f"{dict_items[0]}_diff{dict_items[1]}"), dict_diff.items()),axis=1)

What's missing/wrong?

I cannot iterate over the list of values. Since dict_items[1] is a list rather than a single integer, it cannot be passed directly to periods, so each value in it needs to be handled separately.

As a result:

I would like to get a df with the additional columns foo_diff1, foo_diff2, bar_diff3 and bar_diff4, as indicated in the dict.

Upvotes: 1

Views: 163

Answers (1)

mikele

Reputation: 196

You just need a list comprehension over the periods and a reduce function to flatten the resulting lists so that you can concat them with pandas:

import functools
import operator
 
def functools_reduce(a):
    return functools.reduce(operator.concat, a)

pd.concat(functools_reduce(map(lambda dict_items: [df[dict_items[0]].diff(periods=diff_value).rename(f"{dict_items[0]}_diff{diff_value}") for diff_value in dict_items[1]], {"foo":[1,2],"bar":[3,4]}.items())),axis=1)
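If the reduce step feels heavy, the same flattening can probably be done with a single nested comprehension. The sketch below is one possible variant, not the only way to do it: the helper name diff_features and the seeded random generator are assumptions made for illustration, and, like the code above, it diffs over the whole frame rather than per Category.

```python
import numpy as np
import pandas as pd


def diff_features(df, periods_by_column):
    """Build one diff column per (column, period) pair.

    diff_features is a hypothetical helper name used for this sketch.
    """
    return pd.concat(
        [
            df[col].diff(periods=p).rename(f"{col}_diff{p}")
            for col, periods in periods_by_column.items()  # outer loop: columns
            for p in periods                               # inner loop: periods
        ],
        axis=1,
    )


# Same shape of data as in the question; the seeded generator is an
# assumption so that the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Category": ["A"] * 10 + ["B"] * 50 + ["C"] * 15 + ["D"] * 100,
    "foo": rng.random(175),
    "bar": rng.random(175),
})
dict_diff = {"foo": [1, 2], "bar": [3, 4]}

diffs = diff_features(df, dict_diff)   # only the new diff columns
result = df.join(diffs)                # original df plus the diff columns
```

Here diffs holds only the new columns, named in the order given by the dict, and df.join(diffs) attaches them to the original frame by index.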

Upvotes: 2
