Reputation: 494
I have two DataFrames and I would like to find the standard deviation per rc_id for one of the columns i.e. imapcted_users
column in these two dataframes and create a separate column with the name std
with their standard deviation value
df1
:
data = {"timestamp":["2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29","2022-10-29"],
"rc_id":[296,296,296,296,296,100,100,100,100],
"impacted_users":[1,87,44,8,5,2,7,11,30]}
df1 = pd.DataFrame(data)
df1 = df1.groupby(["timestamp","rc_id"]).agg({"impacted_users": sum}
).reset_index()
df1:
rc_id timestamp impacted_users
296 2022-10-29 145
100 2022-10-29 50
df2
:
data1 = {"rc_id":[296,296,296,100,100,100],
"impacted_users":[201,202,216,300,301,350]}
df2 = pd.DataFrame(data1)
df2
create df2:
rc_id impacted_users
296 201
296 202
296 216
100 300
100 301
100 350
Expected Output:
id timestamp imapcted_users std
296 2022-10-29 11:00:00 145 27.21
100 2022-10-29 11:00:00 50 117.36
What I would like to have is std
and put it as a separate
columns (just as an example what values I am looking for from these columns):
std(145, 201, 202,216)
std (50,300,301,350)
I am unable to come up with a strategy to get this standard dev. for values from different dataframes. I tried to concat the required values and then get the std by aggregation but I guess there is a better way.
Upvotes: 1
Views: 39
Reputation: 863751
IIUC use concat
with aggregate std
, but because pandas Series.std
has default ddof=1
for expected ouput add parameter ddof=0
, last append to df1
:
df1 = df1.groupby(["timestamp","rc_id"], as_index=False, sort=False)["impacted_users"].sum()
df = (df1.join(pd.concat([df1, df2])
.groupby('rc_id')['impacted_users'].std(ddof=0).rename('std'), on='rc_id'))
print (df)
timestamp rc_id impacted_users std
0 2022-10-29 296 145 27.212130
1 2022-10-29 100 50 117.367745
Upvotes: 1