JPV

Reputation: 1079

Lambda function to use in dataframe

I have the following vector

3 
5
6
7
4
6
7 
8

I would like to implement a lambda function that, given a vector element i, computes the mean of the (i-3)th, (i-2)th, (i-1)th and ith elements. However, I do not know how to access the i-3, i-2 and i-1 elements inside the lambda function.

Upvotes: 0

Views: 2063

Answers (2)

Schmuddi

Reputation: 2086

You can use the rolling() method to access the elements of a Pandas series within a specified window. Then, you can use a lambda function to calculate the mean for the elements in that window. In order to include the three elements to the left of the current element, you use a window size of 4:

In [39]: import pandas as pd

In [40]: S = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

In [41]: S.rolling(4).apply(lambda x: x.mean())
Out[41]: 
0     NaN
1     NaN
2     NaN
3    5.25
4    5.50
5    5.75
6    6.00
7    6.25
dtype: float64

You'll note that the first three elements are missing values. This is because a full window of size 4 can only be formed from the fourth element onwards. However, if you want to compute the mean over smaller windows for the first elements, you can use the argument min_periods to specify the smallest valid window size:

In [42]: S.rolling(4, min_periods=1).apply(lambda x: x.mean())
Out[42]: 
0    3.000000
1    4.000000
2    4.666667
3    5.250000
4    5.500000
5    5.750000
6    6.000000
7    6.250000
dtype: float64
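
To make explicit which elements each window covers, here is a minimal sketch that computes the same values by hand and compares them with rolling(). The helper name trailing_mean and its width parameter are made up for this illustration and are not part of pandas.

```python
import pandas as pd

s = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])


def trailing_mean(series, i, width=4):
    """Mean of the elements at positions i-width+1 .. i (hypothetical helper).

    The start is clipped at 0, which mirrors what min_periods=1 does
    for the first few elements.
    """
    start = max(0, i - width + 1)
    # iloc slices are end-exclusive, so i + 1 includes the i-th element.
    return series.iloc[start:i + 1].mean()


manual = pd.Series([trailing_mean(s, i) for i in range(len(s))])
rolled = s.rolling(4, min_periods=1).mean()

# Largest absolute difference between the two approaches.
print((manual - rolled).abs().max())
```

Both series should agree up to floating-point rounding, so the printed difference is expected to be zero or negligibly small.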

Having said that, you don't need the lambda in the first place – I included it only because you explicitly asked for lambdas. The method rolling() creates a Rolling object that has a built-in mean function that you can use, like so:

In [43]: S.rolling(4).mean()
Out[43]: 
0     NaN
1     NaN
2     NaN
3    5.25
4    5.50
5    5.75
6    6.00
7    6.25
dtype: float64
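
Since the question title mentions a dataframe, the same call can be applied to a single column and the result stored next to it. This is only a sketch: the column names "value" and "mean_4" are assumptions chosen for the example.

```python
import pandas as pd

# "value" and "mean_4" are hypothetical column names used for illustration.
df = pd.DataFrame({"value": [3, 5, 6, 7, 4, 6, 7, 8]})

# Rolling mean over the current row and the three rows before it.
df["mean_4"] = df["value"].rolling(4).mean()

print(df)
```

The first three rows of the new column are NaN for the same reason as above: no full window of four rows exists yet.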

Upvotes: 3

Dan Temkin

Reputation: 1605

If you want to do it on a pandas DataFrame, the easiest way is to use .loc, assuming you know the index position of i.

 import pandas as pd

 df = pd.DataFrame([3, 5, 6, 7, 4, 6, 7, 8])
 # x is the index label of your target value. The slice runs backwards
 # from x to x-3 (both inclusive); 0 selects the only column.
 setx = lambda x: df.loc[x:x-3:-1, 0].mean()
 setx(4)  # mean of the values [4, 7, 6, 5], i.e. 5.5

That said, if you want to follow PEP 8, it is better to define a regular function with def here: PEP 8 advises against assigning a lambda expression directly to an identifier (see python.org/dev/peps/pep-0008/#id50). Thank you @Schmuddi for the clarification.
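
A def-based version might look like the sketch below. The function name mean_of_last_four is made up for this example, and it uses positional .iloc rather than .loc, so it does not depend on the index labels being consecutive integers.

```python
import pandas as pd

df = pd.DataFrame([3, 5, 6, 7, 4, 6, 7, 8])


def mean_of_last_four(frame, i):
    """Mean of column 0 over positions i-3 .. i (hypothetical helper)."""
    # Clip the start at 0 so that the first rows use a shorter window.
    start = max(0, i - 3)
    # iloc is end-exclusive, so i + 1 includes row i; 0 selects the only column.
    return frame.iloc[start:i + 1, 0].mean()


print(mean_of_last_four(df, 4))  # mean of the values 5, 6, 7, 4
```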

Upvotes: 2
