Create new column filled with random elements based on a categorical column

Question

I have a pandas dataframe that looks like this:

ID  Cat
87    A
56    A
67    A
76    D
36    D

Column ID holds unique integers, and Cat holds a categorical variable. I would like to add two new columns whose values depend on Cat.

The desired result should look like this:

ID  Cat  New1   New2
87    A    67    36
56    A    67    76
67    A    56    36
76    D    36    56
36    D    76    67

Column New1: for each row, pick a random ID from the same category as the current row, sampling with replacement (different rows may pick the same ID). The picked ID must not be the current row's own ID.

Column New2: for each row, pick a random ID from a different category than the current row, again sampling with replacement.
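To make the two rules concrete, here is a small checker that tests whether a frame with New1/New2 follows them. The function name satisfies_rules is hypothetical (not from the question), and it assumes ID values are unique, as stated above. It cannot verify the "with replacement" part, which concerns how values are drawn rather than the result.

```python
import pandas as pd


def satisfies_rules(df):
    """Check New1 (same Cat, different ID) and New2 (different Cat)."""
    # Map each ID to its category; valid because IDs are unique.
    cat_of = dict(zip(df['ID'], df['Cat']))
    new1_ok = all(
        cat_of[n1] == cat and n1 != i
        for i, cat, n1 in zip(df['ID'], df['Cat'], df['New1']))
    new2_ok = all(
        cat_of[n2] != cat
        for cat, n2 in zip(df['Cat'], df['New2']))
    return new1_ok and new2_ok


desired = pd.DataFrame({'ID':   [87, 56, 67, 76, 36],
                        'Cat':  ['A', 'A', 'A', 'D', 'D'],
                        'New1': [67, 67, 56, 36, 76],
                        'New2': [36, 76, 36, 56, 67]})
print(satisfies_rules(desired))  # True
```

Because 56.0 == 56 (and they hash equally) in Python, the checker also accepts float-valued New1/New2 columns.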

How can I do this efficiently?

run-out · Accepted Answer

I tried to find a vectorized solution but couldn't. This solution iterates over the index and computes New1 and New2 for each row.

This will achieve the result I believe you are looking for.

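import pandas as pd

df = pd.DataFrame({'ID': [87, 56, 67, 76, 36],
                   'Cat': ['A', 'A', 'A', 'D', 'D']})

for i in df.index:
    # Category of the current row.
    cat = df.loc[i, 'Cat']

    # New1: random ID from the same category, excluding the current row.
    mask1 = df['Cat'] == cat
    mask2 = df.index != i
    df.at[i, 'New1'] = df[mask1 & mask2]['ID'].sample().iloc[0]

    # New2: random ID from any other category.
    mask3 = df['Cat'] != cat
    df.at[i, 'New2'] = df[mask3]['ID'].sample().iloc[0]

# df.at creates New1/New2 as float columns (hence the .0 in the output);
# apply .astype(int) afterwards if integer columns are needed.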

Output of print(df), first run:

 ID Cat  New1  New2
0  87   A  56.0  76.0
1  56   A  87.0  36.0
2  67   A  56.0  76.0
3  76   D  36.0  87.0
4  36   D  76.0  87.0

Output of print(df), second run:

  ID Cat  New1  New2
0  87   A  67.0  36.0
1  56   A  87.0  36.0
2  67   A  87.0  76.0
3  76   D  36.0  67.0
4  36   D  76.0  67.0

As these two runs show, sample() returns different random picks each time the loop runs.
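The loop rebuilds boolean masks over the whole frame for every row, so it does O(n²) work. A vectorized alternative is sketched below; it is not part of the accepted answer, and the function name add_random_columns is hypothetical. It assumes every category has at least two rows and there are at least two categories (otherwise the rules cannot be satisfied; the loop would fail too, since it would sample from an empty selection). The idea: stable-sort by Cat so each category is one contiguous block, pick New1 by stepping a random 1..size-1 positions (wrapping) inside the block, and pick New2 by drawing uniformly from the positions outside the block.

```python
import numpy as np
import pandas as pd


def add_random_columns(df, rng=None):
    """Return a copy of df with New1 (random other ID from the same Cat)
    and New2 (random ID from a different Cat), both drawn with replacement."""
    rng = np.random.default_rng() if rng is None else rng

    # Stable sort so that each category occupies one contiguous block.
    order = np.argsort(df['Cat'].to_numpy(), kind='stable')
    ids = df['ID'].to_numpy()[order]
    cats = df['Cat'].to_numpy()[order]
    n = len(ids)

    # Per sorted row: start index and length of its category block.
    _, block_start, inverse, counts = np.unique(
        cats, return_index=True, return_inverse=True, return_counts=True)
    start = block_start[inverse]
    size = counts[inverse]
    pos_in_block = np.arange(n) - start

    if (size < 2).any():
        raise ValueError('New1 needs at least two rows in every category')
    if (size == n).any():
        raise ValueError('New2 needs at least two categories')

    # New1: step 1..size-1 places forward (wrapping) inside the block, which
    # is uniform over the other rows of the same category, never the row itself.
    step = rng.integers(1, size)  # upper bound is exclusive
    new1 = ids[start + (pos_in_block + step) % size]

    # New2: draw r uniformly from the n - size positions outside the block,
    # then jump over the block to turn r into an actual position.
    r = rng.integers(0, n - size)
    new2 = ids[np.where(r < start, r, r + size)]

    # Scatter the picks back into the original row order.
    out = df.copy()
    for name, values in (('New1', new1), ('New2', new2)):
        col = np.empty_like(values)
        col[order] = values
        out[name] = col
    return out


df = pd.DataFrame({'ID': [87, 56, 67, 76, 36],
                   'Cat': ['A', 'A', 'A', 'D', 'D']})
print(add_random_columns(df, np.random.default_rng(0)))
```

Passing a seeded Generator makes the result reproducible; the loop version can do the same via sample(random_state=...). Because the picks are written back as NumPy integer arrays, New1 and New2 stay integer columns.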
