Reputation: 103
I have a pandas dataframe of IDs that looks something like this:
import pandas as pd
id = pd.DataFrame([[1, 3], [2, 2]], columns=['ID', '# of places worked'])
ID | # of places worked |
---|---|
1 | 3 |
2 | 2 |
I also have a pandas dataframe of cases that looks something like this:
cases = pd.DataFrame(
[[1,123],[1,345],[1,456],[1,789],[1,132],[2,133],[2,143],[2,465],[2,765]],
columns = ['ID','Case ID'])
ID | Case ID |
---|---|
1 | 123 |
1 | 345 |
1 | 456 |
1 | 789 |
1 | 132 |
2 | 133 |
2 | 143 |
2 | 465 |
2 | 765 |
I want to randomly sample 3 case IDs for each ID in id. Since the sample is random, this is just one possible output:
ID | Case ID |
---|---|
1 | 456 |
1 | 789 |
1 | 132 |
2 | 143 |
2 | 465 |
2 | 765 |
Any help would be greatly appreciated.
Upvotes: 0
Views: 45
Reputation: 3419
You can draw a random sample of x rows from a dataframe with df.sample(n=x). To get what you want, filter cases down to the rows for each ID in id, sample 3 rows from each of those sub-dataframes, and concat the results:
pd.concat([cases[cases['ID'] == x].sample(n=3) for x in list(id['ID'])])
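If your pandas version is 1.1 or newer, the same per-ID sampling can probably be written without the explicit loop by using groupby(...).sample. This is only a sketch of that alternative, not a replacement for the line above: the name ids (used instead of id so the Python builtin is not shadowed) and the fixed random_state are my own choices for illustration.

```python
import pandas as pd

# Same data as in the question; "ids" is a renamed copy of the question's "id".
ids = pd.DataFrame([[1, 3], [2, 2]], columns=['ID', '# of places worked'])
cases = pd.DataFrame(
    [[1, 123], [1, 345], [1, 456], [1, 789], [1, 132],
     [2, 133], [2, 143], [2, 465], [2, 765]],
    columns=['ID', 'Case ID'])

# Keep only the cases whose ID appears in ids, then draw 3 rows per ID.
# random_state is fixed here only so repeated runs give the same sample;
# drop it to get a fresh random draw each time.
sampled = (
    cases[cases['ID'].isin(ids['ID'])]
    .groupby('ID')
    .sample(n=3, random_state=0)
)

print(sampled)
```

Both this and the concat version raise a ValueError if some ID has fewer than 3 cases, because sample draws without replacement by default. Passing replace=True avoids the error, at the cost of possibly repeating a case.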
Upvotes: 1