Normalize values according to category in pandas

Question

My dataframe looks like this

#    pubdate  lang Count
# 0  20140619 en   3
# 1  20150308 en   1
# 2  20140207 en   1
# 3  20180319 en   1
# 4  20150223 en   1

I would like to normalize the count by replacing it with itself divided by the average count/year.

For example, here is how the table looks at the end

#    pubdate  lang Count
# 0  20140619 en   1.5
# 1  20150308 en   1
# 2  20140207 en   0.5
# 3  20180319 en   1
# 4  20150223 en   1

So for year 2014, the average of all counts was (3+1)/(2 rows of 2014) = 2, then each value was divided by it.

I thought about first duplicating the frame, having a year column, then grouping per year, then changing values in first table according to that.

I do not know how to do it in code

yatu · Accepted Answer

Here's one slicing on the year substring and using it as a grouper, then divide the Count by the mean with transform to preserve the shape of the dataframe:

df['Count'] /= df.groupby(df.pubdate.astype(str).str[:4]).Count.transform('mean')

print(df)

    pubdate lang  Count
0  20140619   en    1.5
1  20150308   en    1.0
2  20140207   en    0.5
3  20180319   en    1.0
4  20150223   en    1.0

Normalize values according to category in pandas

Answers (2)

Related Questions