CowboyCoder

Reputation: 103

Sample Case Numbers for Respective IDs

I have a pandas dataframe of IDs that looks something like this:

import pandas as pd

id = pd.DataFrame([[1, 3], [2, 2]], columns=['ID', '# of places worked'])
ID   # of places worked
1    3
2    2

I also have a pandas dataframe of cases that looks something like this:

cases = pd.DataFrame(
    [[1, 123], [1, 345], [1, 456], [1, 789], [1, 132],
     [2, 133], [2, 143], [2, 465], [2, 765]],
    columns=['ID', 'Case ID'])
ID   Case ID
1    123
1    345
1    456
1    789
1    132
2    133
2    143
2    465
2    765

I want to randomly sample 3 case IDs for each ID in id. Since the sample is random, this is just one example of the ideal output:

ID   Case ID
1    456
1    789
1    132
2    143
2    465
2    765

Any help would be greatly appreciated.

Upvotes: 0

Views: 45

Answers (1)

Swifty

Reputation: 3419

You can draw a random sample of x rows from a dataframe with df.sample(n=x). To get what you want, do this for each sub-dataframe where ID equals one of the IDs in id, then concat the results:

pd.concat([cases[cases['ID'] == x].sample(n=3) for x in list(id['ID'])])
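If your pandas version is 1.1 or later, the same idea can probably be written without the explicit loop by using groupby(...).sample. Here is a minimal sketch using the data from the question. Note that sample_cases, the seed argument, and the name ids (used instead of id so the Python builtin is not shadowed) are my own names for illustration, not anything from the question:

```python
import pandas as pd

# Data copied from the question; `ids` replaces the question's `id`
# so the Python builtin is not shadowed.
ids = pd.DataFrame([[1, 3], [2, 2]], columns=['ID', '# of places worked'])
cases = pd.DataFrame(
    [[1, 123], [1, 345], [1, 456], [1, 789], [1, 132],
     [2, 133], [2, 143], [2, 465], [2, 765]],
    columns=['ID', 'Case ID'])


def sample_cases(cases, ids, n=3, seed=None):
    """Hypothetical helper: draw n random cases for each ID listed in ids."""
    # Keep only the cases whose ID appears in the ids dataframe.
    wanted = cases[cases['ID'].isin(ids['ID'])]
    # Sample n rows within each ID group; pass a seed for repeatable results.
    return wanted.groupby('ID').sample(n=n, random_state=seed)


sampled = sample_cases(cases, ids, n=3, seed=0)
print(sampled)
```

Either way, sampling without replacement raises a ValueError if some ID has fewer than n cases, so you may need replace=True or a smaller n for those IDs.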

Upvotes: 1
