Reputation: 2333
I have a dataframe df that looks like this:
             ID1       ID2  Bool  Count
0       12868123  387DB71C     0      1
1       12868123  84C0E502     1     11
2       12868123  387DB71C     1      1
8       12868123  80A9DCFC     0     16
9       12868123  7A260136     1     20
10      12868123  80A9DCFC     0     16
11      12868123  80BB4591     0     36
327295  8617B7D9  76A08B0E     0     19
327296  8617B7D9  76A08B0E     0     19
327297  8617B7D9  76D0DA26     1      2
327298  8617B7D9  7C92B2A6     1      3
327299  8617B7D9  75883296     1      1
327300  8617B7D9  78711A4F     0     12
327301  8617B7D9  78711A4F     0     12
327302  8617B7D9  78711A4F     0     12
I want to do two things:
1- I want to "randomly" extract n unique rows for each (ID1, Bool) instance. So if n = 2, one possible result could be:
             ID1       ID2  Bool  Count
0       12868123  387DB71C     0      1
8       12868123  80A9DCFC     0     16
1       12868123  84C0E502     1     11
2       12868123  387DB71C     1      1
327295  8617B7D9  76A08B0E     0     19
327296  8617B7D9  76A08B0E     0     19
327297  8617B7D9  76D0DA26     1      2
327298  8617B7D9  7C92B2A6     1      3
I tried looking for something along the lines of df.groupby('ID1', 'Bool').random(size=n), but couldn't figure it out.
2- I then want to calculate the average Count for each (ID1, Bool) pair, so that the final resulting DF is:
        ID1  Bool  AverageCount
0  12868123     0           8.5
1  12868123     1             6
2  8617B7D9     0            19
3  8617B7D9     1           2.5
I think I have the second part figured out:
df.groupby(['ID1','Bool'])['Count'].mean()
Upvotes: 2
Views: 636
Reputation: 862561
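You can use groupby with numpy.random.choice. Note that choice samples with replacement by default, so the same row can be drawn more than once; pass replace=False if the n rows must be distinct (each group then needs at least n rows):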
import numpy as np

n = 2
# For each (ID1, Bool) group, draw n Count values at random and average them
df1 = df.groupby(['ID1', 'Bool'])['Count'] \
        .apply(lambda x: np.mean(np.random.choice(x, n))) \
        .reset_index(name='AverageCount')
print(df1)
        ID1  Bool  AverageCount
0  12868123     0          18.5
1  12868123     1           6.0
2  8617B7D9     0          19.0
3  8617B7D9     1           3.0
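Since the question asks for unique rows, here is a minimal sketch of the same idea that draws without replacement. It rebuilds the question's df so it runs on its own; the rng name and the seed 0 are assumptions made only for reproducibility, and the sketch assumes every group has at least n rows.

```python
import numpy as np
import pandas as pd

# Rebuild the question's dataframe (ID2 omitted, it is not needed here)
df = pd.DataFrame({
    'ID1': ['12868123'] * 7 + ['8617B7D9'] * 8,
    'Bool': [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Count': [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
})

n = 2
rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

# replace=False draws n distinct rows per group, so no row is counted twice.
# It raises ValueError if a group has fewer than n rows.
df1 = (
    df.groupby(['ID1', 'Bool'])['Count']
      .apply(lambda x: rng.choice(x.to_numpy(), size=n, replace=False).mean())
      .reset_index(name='AverageCount')
)
print(df1)
```

Rows that are distinct but hold the same Count (for example the repeated 16 in the first group) can still be drawn together; only drawing the same row twice is ruled out.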
Upvotes: 3
Reputation: 294228
Use groupby + sample:
df.groupby(
    ['ID1', 'Bool']
).apply(
    # g is the sub-frame for one (ID1, Bool) group; sample draws distinct rows
    lambda g: g.sample(2)['Count'].mean()
).reset_index(name='AverageCount')
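If the sampled rows themselves are also wanted (part 1 of the question), the two steps can be kept separate. The sketch below assumes pandas 1.1 or later, where GroupBy objects have their own sample method; the names sampled and result and the random_state value are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the question's dataframe, keeping its original index labels
df = pd.DataFrame(
    {
        'ID1': ['12868123'] * 7 + ['8617B7D9'] * 8,
        'ID2': ['387DB71C', '84C0E502', '387DB71C', '80A9DCFC', '7A260136',
                '80A9DCFC', '80BB4591', '76A08B0E', '76A08B0E', '76D0DA26',
                '7C92B2A6', '75883296', '78711A4F', '78711A4F', '78711A4F'],
        'Bool': [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
        'Count': [1, 11, 1, 16, 20, 16, 36, 19, 19, 2, 3, 1, 12, 12, 12],
    },
    index=[0, 1, 2, 8, 9, 10, 11] + list(range(327295, 327303)),
)

n = 2

# Step 1: n distinct rows per (ID1, Bool) group (sampling is without
# replacement by default); random_state is an arbitrary seed
sampled = df.groupby(['ID1', 'Bool']).sample(n=n, random_state=0)

# Step 2: average Count per group over the sampled rows only
result = (
    sampled.groupby(['ID1', 'Bool'])['Count']
           .mean()
           .reset_index(name='AverageCount')
)
print(sampled)
print(result)
```

Keeping sampled around makes it easy to check which rows went into each average.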
Upvotes: 3