AspiringMat

Reputation: 2269

Generate new rows using permutations of other rows in Pandas

I have a dataframe of race results (where each race has 14 participants) that looks like this:

df = race_id A0 B0 C0 A1 B1 C1 A2 B2 C2 ... A13 B13 C13 WINNER
     1       2  3  0  9  1  3  4  5  1      1   2   3   3
     2       1  5  2  7  3  2  8  6  0      6   4   1   9
     ...

I want to train a multinomial logistic regression model on this data. However, as the data currently stands, the model would be sensitive to the order of the participants. For example, if the model is given the record

race_id A0 B0 C0 A1 B1 C1 A2 B2 C2 ... A13 B13 C13 WINNER
3       9  1  3  2  3  0  4  5  1      1   2   3   3

which is just race 1 with the features of participant 0 and participant 1 swapped, the model would output a different prediction for the winner even though the input is the same.

So I want to generate 100 random permutations of each race in the data, with the same winner, to train the model to be robust to permutations. How can I create these 100 sample permutations for this data frame while preserving the A, B, C features of every racer?

Upvotes: 2

Views: 314

Answers (2)

OfirD

Reputation: 10490

Here's an option for filling your dataframe with permutations of triplets, where df is your dataframe (I left out the winner column mapping; see the chunkwise implementation).

Note that rand_row is just a random row I made for the sake of example. It's filled with values from 1 to 9 (randrange excludes the upper bound), and the row written to the dataframe has 40 columns (1 for the race id, plus 13*3 for the racers), but you can change it, of course:

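import random

def chunkwise(t, size=2):
    # Split t into consecutive tuples of length size
    it = iter(t)
    return zip(*[it]*size)

def fill(df, size):
    # Example row: 13 racers with 3 features each, values 1 to 9
    rand_row = [random.randrange(1, 10) for _ in range(13*3)]
    triplets = list(chunkwise(rand_row, 3))
    for i in range(size):
        # random.sample with the full length returns a shuffled copy
        shuffled = random.sample(triplets, len(triplets))
        flattened = [item for triplet in shuffled for item in triplet]
        # df must already have 40 columns: race id plus 39 features
        df.loc[i] = [i+1] + flattened
    return df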
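The winner column mapping that was left out could be handled along these lines. This is only a sketch: permute_race and its parameters are hypothetical names, and it assumes WINNER holds the 0-based slot index of the winning racer, which the question does not state explicitly.

```python
import random

def chunkwise(values, size):
    """Split a flat sequence into consecutive tuples of length `size`."""
    it = iter(values)
    return list(zip(*[it] * size))

def permute_race(features, winner, n_features=3, rng=random):
    """Shuffle the racer slots of one race and move the winner label with them.

    Assumption: `winner` is the 0-based slot index of the winning racer.
    """
    triplets = chunkwise(features, n_features)
    order = list(range(len(triplets)))   # order[new_slot] = old_slot
    rng.shuffle(order)
    shuffled = [x for old in order for x in triplets[old]]
    new_winner = order.index(winner)     # slot the winner ended up in
    return shuffled, new_winner

# Three racers with features (2, 3, 0), (9, 1, 3), (4, 5, 1); racer 1 wins
features = [2, 3, 0, 9, 1, 3, 4, 5, 1]
shuffled, new_winner = permute_race(features, winner=1)
```

Whatever order comes out, the triplet sitting at slot new_winner in shuffled is still (9, 1, 3), so the label keeps pointing at the same racer.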

Upvotes: 0

Dave

Reputation: 2049

Before we begin, this is not a good approach to modeling race outcomes.

However, if you want to do it anyway, you need to permute and remap the column names and then concatenate the resulting permutations. First, dynamically create a list of participants by parsing the column names:

participants = [col[1:] for col in df.columns if col.startswith('A')]

Then loop through permutations of these participants and apply the column name remapping:

import itertools

import pandas as pd

# Collect the permuted races in a list and concatenate once at the end
races = []
# Note: 14 participants give 14! (about 8.7e10) orderings, so iterate lazily
# rather than wrapping the permutations in list()
for permutation in itertools.permutations(participants):

  # Create the mapping of participants from the permutation
  mapping = dict(zip(participants, permutation))

  # From the participant mapping, create a column mapping.
  # Match the whole suffix so that '0' does not also match 'A10'
  columns = {}
  for col in df.columns:
    if col[1:] in mapping:
      columns[col] = col[0] + mapping[col[1:]]

  # Remap column names
  race = df.rename(columns=columns)

  # Reassign the winner based on the mapping (participant labels are strings)
  race['WINNER'] = race['WINNER'].astype(str).map(mapping).astype(int)

  # Collect the races
  races.append(race)

races = pd.concat(races)
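Since the question asks for only 100 random permutations per race, and 14 participants have 14! possible orderings, drawing a random sample of orderings may be more practical than enumerating them all. Below is a minimal sketch of that idea, not a definitive implementation. sample_permutations is a hypothetical helper, and it assumes the columns are race_id, then three feature columns per racer in slot order (A0, B0, C0, A1, ...), then WINNER as a 0-based slot index. Each sampled ordering is applied to every race at once.

```python
import numpy as np
import pandas as pd

def sample_permutations(df, n_perms=100, n_features=3, seed=0):
    """Return n_perms randomly permuted copies of every race in df.

    Assumptions: feature columns are grouped per racer in slot order and
    WINNER is the 0-based slot index of the winning racer.
    """
    rng = np.random.default_rng(seed)
    feature_cols = [c for c in df.columns if c not in ("race_id", "WINNER")]
    n_racers = len(feature_cols) // n_features
    # Shape: (races, racers, features per racer)
    feats = df[feature_cols].to_numpy().reshape(len(df), n_racers, n_features)
    winners = df["WINNER"].to_numpy()

    out = []
    for _ in range(n_perms):
        order = rng.permutation(n_racers)   # order[new_slot] = old_slot
        new_slot = np.argsort(order)        # new_slot[old_slot] = new slot
        permuted = pd.DataFrame(
            feats[:, order, :].reshape(len(df), -1),
            columns=feature_cols,
            index=df.index,
        )
        permuted.insert(0, "race_id", df["race_id"])
        permuted["WINNER"] = new_slot[winners]  # winner label follows the racer
        out.append(permuted)
    return pd.concat(out, ignore_index=True)
```

Calling sample_permutations(df) would then return 100 rows per race, each with the racers' A, B, C values kept together and WINNER pointing at the slot the winning racer moved to.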

Upvotes: 1
