Reputation: 91
Imagine a Students/Grades dataframe such that
Using pandas, how can I create multiple groups such that each group has 1 student with an A, 2 students with Bs, and 1 student with a C?
I've tried using df.groupby('Grade') and then sampling from each grade group. The problem is that this gives me the same number of students from every grade group, whereas I'd like a specific number of students from each specific grade group.
The solution doesn't need to care about the leftovers. As long as I get fully formed groups that follow the required distribution, I'd be happy.
Thanks for any help.
Upvotes: 1
Views: 44
Reputation: 679
You can do that by using a dictionary to store the number of samples from each group, as shown below:
import pandas as pd
import numpy as np
# create the dataframe
df = pd.DataFrame(
    zip(['Person' + str(i + 1) for i in range(30)],
        np.random.choice(['A', 'B', 'C'], 30, replace=True)),
    columns=['Student', 'Grade'])
# use a dict to store how many students to sample from each grade
# (1 A, 2 Bs and 1 C, as asked in the question)
sample_freq = {'A': 1, 'B': 2, 'C': 1}
# group by desired variable
groups = df.groupby('Grade')
# sample from each group and concatenate them to a single data frame
pd.concat(
    [group_df.sample(sample_freq[group]) for group, group_df in groups])
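The snippet above draws a single set of students. Since the question asks for multiple groups, here is a minimal sketch of one possible way to extend the idea. It is not part of the original answer, and the example data, the quota dict, and the Group column name are all assumptions made for illustration. The idea is to shuffle the rows once, number the students within each grade with cumcount, and integer-divide that position by the grade's quota to get a group index. Groups that cannot be filled completely are dropped as leftovers.

```python
import pandas as pd

# Hypothetical example data: 8 As, 14 Bs and 8 Cs (30 students in total).
grades = ["A"] * 8 + ["B"] * 14 + ["C"] * 8
df = pd.DataFrame({
    "Student": [f"Person{i + 1}" for i in range(len(grades))],
    "Grade": grades,
})

# Students required per grade in every group (assumed from the question).
quota = {"A": 1, "B": 2, "C": 1}

# Shuffle once so that group membership is random (seeded for repeatability).
shuffled = df.sample(frac=1, random_state=0)

# Position of each student within their own grade: 0, 1, 2, ...
position = shuffled.groupby("Grade").cumcount()

# Dividing that position by the grade's quota gives the group index,
# e.g. the first two Bs land in group 0, the next two in group 1.
group_id = position // shuffled["Grade"].map(quota)

# Only as many groups as the scarcest grade can fill are complete.
n_groups = min(
    int((shuffled["Grade"] == grade).sum()) // n for grade, n in quota.items()
)

# Keep the complete groups and ignore the leftovers.
result = (
    shuffled.assign(Group=group_id)[group_id < n_groups]
    .sort_values(["Group", "Grade"])
)
print(result)
```

Each student appears in at most one group here. If students are allowed to repeat across groups, calling sample once per group as in the snippet above would be the simpler route.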
Upvotes: 2