Reputation: 413
Say I have a table like so:
| Name | Age |
|--------|-----|
| Bob | 2 |
| John | 3 |
| Tim | 4 |
| Ben | 5 |
| Ella | 4 |
| Sophie | 5 |
| Grace | 6 |
| Bill | 34 |
| Ron | 23 |
| Harry | 2 |
How could I add a new column that is True for a random 10% of the rows and False for the rest? Like so:
| Name | Age | |
|--------|-----|-------|
| Bob | 2 | False |
| John | 3 | False |
| Tim | 4 | False |
| Ben | 5 | True |
| Ella | 4 | False |
| Sophie | 5 | False |
| Grace | 6 | False |
| Bill | 34 | False |
| Ron | 23 | False |
| Harry | 2 | False |
Upvotes: 2
Views: 596
Reputation: 1862
You can use pandas' sample function:
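df.loc[df.sample(frac=0.1).index, "sample_column"] = True
# Rows that were not sampled are NaN at this point; fill them and cast so the column is boolean
df["sample_column"] = df["sample_column"].fillna(False).astype(bool)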
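To try this end to end, here is a minimal sketch that rebuilds the table from the question and applies the same two steps. The DataFrame construction and the `picked` variable are assumptions added for illustration. The explicit `astype(bool)` cast is there because the column holds a mix of True and NaN after the first assignment.

```python
import pandas as pd

# Rebuild the table from the question (assumed to live in a DataFrame called df).
df = pd.DataFrame({
    "Name": ["Bob", "John", "Tim", "Ben", "Ella",
             "Sophie", "Grace", "Bill", "Ron", "Harry"],
    "Age": [2, 3, 4, 5, 4, 5, 6, 34, 23, 2],
})

# Pick 10% of the rows at random; with 10 rows that is a single row.
picked = df.sample(frac=0.1).index

# Only the picked rows get a value here; every other row is NaN.
df.loc[picked, "sample_column"] = True

# Fill the remaining rows with False and cast to a real boolean column.
df["sample_column"] = df["sample_column"].fillna(False).astype(bool)

print(df)
```

Because no seed is passed to `sample`, a different row is flagged on each run.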
Upvotes: 2
Reputation: 10960
df['flag'] = df.index.isin(df.sample(frac=0.1, random_state=1).index)
Or:
df['flag'] = False
df.loc[df.sample(frac=0.1, random_state=1).index, 'flag'] = True
Sample output:
>>> df
Name Age flag
1 Bob 2.0 False
2 John 3.0 False
3 Tim 4.0 True
4 Ben 5.0 False
5 Ella 4.0 False
6 Sophie 5.0 False
7 Grace 6.0 False
8 Bill 34.0 False
9 Ron 23.0 False
10 Harry 2.0 False
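Both variants above select rows by index label, so they rely on the index being unique: if a label is repeated, every row carrying a sampled label is flagged, which can push the share above 10%. A position-based sketch that avoids this, assuming NumPy is available (the names `rng` and `n_flag` and the seed `0` are illustrative choices, not part of the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bob", "John", "Tim", "Ben", "Ella",
             "Sophie", "Grace", "Bill", "Ron", "Harry"],
    "Age": [2, 3, 4, 5, 4, 5, 6, 34, 23, 2],
})

rng = np.random.default_rng(0)   # seeded only so the result is repeatable
n_flag = round(len(df) * 0.1)    # number of rows to mark True

# Start with all False, then set n_flag distinct positions to True.
flag = np.zeros(len(df), dtype=bool)
flag[rng.choice(len(df), size=n_flag, replace=False)] = True

df["flag"] = flag
print(df)
```

Since the flags are assigned by position, the result does not depend on what the index looks like.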
Upvotes: 0