Nicolas Gervais

Reputation: 36694

How do I assign a value to a random subset of a Pandas DataFrame?

I'm trying to select a random subset of rows in a pd.DataFrame and assign a value to one column for those rows. Here's a toy example:

import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})
    species  name    group
0  platypus  mike  control
1    monkey  paul  control
2    possum  doug  control

I tried the following to randomly assign two people to the experimental group, but it doesn't work:

df.sample(2)['group'] = 'experimental'

In fact, this doesn't work either:

df.iloc[[0, 1]]['group'] = 'experimental'

Upvotes: 1

Views: 603

Answers (3)

inquirer

Reputation: 4823

# Use a single .loc call rather than chained indexing, so the write hits df itself
df.loc[df.index[[0, 1]], 'group'] = 'experimental'

Output

    species  name         group
0  platypus  mike  experimental
1    monkey  paul  experimental
2    possum  doug       control

Upvotes: 0

High-Octane

Reputation: 1112

Here is a function that picks a random row a random number of times (between 1 and len(df)) and sets the given column for each pick.

import pandas as pd
import random

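def custom_randomizer(df, col):
    # Number of draws: between 1 and len(df); works for any index type
    total_randoms = random.randint(1, len(df))
    for _ in range(total_randoms):
        # Set the column for one randomly chosen row label
        df.loc[random.choice(df.index), col] = 'experimental'
    return df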
    
df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})

df = custom_randomizer(df, 'group')

print(df)
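
The loop above draws row labels with replacement, so the same row can be picked more than once and the number of changed rows is not fixed. If you want exactly k distinct rows, here is a minimal sketch using random.sample. The function name assign_random_rows and its k, value and seed parameters are my own, purely for illustration:

```python
import random

import pandas as pd


def assign_random_rows(df, col, k, value="experimental", seed=None):
    """Set `col` to `value` for exactly `k` distinct, randomly chosen rows."""
    rng = random.Random(seed)  # private generator; seed is optional
    chosen = rng.sample(list(df.index), k)  # k distinct index labels
    df.loc[chosen, col] = value  # a single .loc call writes to df itself
    return df


df = pd.DataFrame({
    "species": ["platypus", "monkey", "possum"],
    "name": ["mike", "paul", "doug"],
    "group": ["control", "control", "control"],
})

df = assign_random_rows(df, "group", k=2)
print(df)
```

Which two rows end up in the experimental group varies from run to run unless you pass a seed.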

Upvotes: 0

Emi OB

Reputation: 3299

df.sample(2).index gives you the index labels of two randomly sampled rows. Pass those labels to .loc to set the group column to 'experimental' for just those rows:

df.loc[df.sample(2).index, 'group'] = 'experimental'

Output (one possible result, since the rows are sampled at random):

    species  name         group
0  platypus  mike  experimental
1    monkey  paul  experimental
2    possum  doug       control
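
If you need the same rows to be picked on every run (for a reproducible experiment, say), one option is the random_state argument of sample. A small sketch, where the seed value 42 and the variable name picked are arbitrary choices of mine:

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["platypus", "monkey", "possum"],
    "name": ["mike", "paul", "doug"],
    "group": ["control", "control", "control"],
})

# random_state fixes the draw, so the same rows are selected on every run
picked = df.sample(n=2, random_state=42).index

# Assign through .loc on the original frame, not on the sampled copy
df.loc[picked, "group"] = "experimental"
print(df)
```

This works where df.sample(2)['group'] = ... does not because sample returns a new DataFrame, so assigning to it never touches df.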

Upvotes: 3
