Joe Nordling
Joe Nordling

Reputation: 73

Pandas vectorization average columns found in each row

I have a dataframe of the following format:

import pandas as pd
df = pd.DataFrame(
{
"date":["2020/01/01","2020/01/02","2020/01/03","2020/01/04","2020/01/05"], 
"days_before": [["2019/12/31","2019/12/30"], ["2020/01/01", "2019/12/31"], ["2020/01/02","2020/01/01"], ["2020/01/03","2020/01/02"], ["2020/01/04","2020/01/03"]],
"count":[6,3,2,5,7]
})
print(df)
         date               days_before  count
0  2020/01/01  [2019/12/31, 2019/12/30]      6
1  2020/01/02  [2020/01/01, 2019/12/31]      3
2  2020/01/03  [2020/01/02, 2020/01/01]      2
3  2020/01/04  [2020/01/03, 2020/01/02]      5
4  2020/01/05  [2020/01/04, 2020/01/03]      7

I am wanted to essentially add a new column "before_avg" that is the average of count for all the days in the respective days_before column.

I know I can using a .apply function to calculate this column, but I wanted to know how to vectorize this function in order to speed up execution time.

Here is my .apply code:

def calculateAvg(row):
   localDf = df[df["date"].isin(row["days_before"])]["count"]
   return localDf.mean()
df["before_avg"] = df.apply(lambda row: calculateAvg(row), axis=1)
print(df)
         date               days_before  count  avg
0  2020/01/01  [2019/12/31, 2019/12/30]      6  NaN
1  2020/01/02  [2020/01/01, 2019/12/31]      3  6.0
2  2020/01/03  [2020/01/02, 2020/01/01]      2  4.5
3  2020/01/04  [2020/01/03, 2020/01/02]      5  2.5
4  2020/01/05  [2020/01/04, 2020/01/03]      7  3.5

Upvotes: 1

Views: 68

Answers (1)

Laurent
Laurent

Reputation: 13518

With the dataframe you provided, here is one way to it using pandas rolling:

df["avg"] = (
    pd.concat([pd.Series([6]), df["count"]])[:-1]
    .rolling(2)
    .mean()
    .reset_index(drop=True)
)
print(df)
# Output
         date               days_before  count  avg
0  2020/01/01  [2019/12/31, 2019/12/30]      6  NaN
1  2020/01/02  [2020/01/01, 2019/12/31]      3  6.0
2  2020/01/03  [2020/01/02, 2020/01/01]      2  4.5
3  2020/01/04  [2020/01/03, 2020/01/02]      5  2.5
4  2020/01/05  [2020/01/04, 2020/01/03]      7  3.5

Upvotes: 1

Related Questions