Reputation: 2269
I have a dataframe of race results (where each race has 14 participants) that looks like this:
df =
race_id  A0  B0  C0  A1  B1  C1  A2  B2  C2  ...  A13  B13  C13  WINNER
1        2   3   0   9   1   3   4   5   1   ...  1    2    3    3
2        1   5   2   7   3   2   8   6   0   ...  6    4    1    9
.....
I want to train a multinomial logistic regression model on this data. However, as the data currently stands, the model would be sensitive to the order of the participants. For example, if the model is given the record
race_id  A0  B0  C0  A1  B1  C1  A2  B2  C2  ...  A13  B13  C13  WINNER
3        9   1   3   2   3   0   4   5   1   ...  1    2    3    3
which just swaps the features of participants 0 and 1 in race 1, the model would output a different prediction for the winner even though the race itself is the same.
So I want to generate 100 random permutations of each race in the data, each with the same winner, so that the model learns to handle permutations. How can I create these 100 permuted samples for this data frame (while preserving the A, B, C features of every racer)?
Upvotes: 2
Views: 314
Reputation: 10490
Here's one way to fill your dataframe with permutations of the triplets, where df is your dataframe (I left out the winner column mapping; see the chunkwise implementation).
Note that rand_row is just a random row I made up for the sake of example. It's filled with values from 1 to 9 (similar to your given dataframe), and the resulting row has 40 columns (1 for the race id, plus 13*3 for the racers; your data has 14 racers, which would be 14*3), but you can change that, of course:
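import random

def chunkwise(t, size=2):
    # Group a flat sequence into consecutive tuples of length `size`
    it = iter(t)
    return zip(*[it] * size)

def fill(df, size):
    # Example row: 13 racers x 3 features, values 1 to 9
    rand_row = [random.randrange(1, 10) for _ in range(13 * 3)]
    # One (A, B, C) triplet per racer
    triplets = list(chunkwise(rand_row, 3))
    for i in range(size):
        # Shuffle whole triplets so each racer's features stay together
        shuffled = random.sample(triplets, len(triplets))
        flattened = [item for triplet in shuffled for item in triplet]
        # Row layout: race id followed by the flattened triplets
        df.loc[i] = [i + 1] + flattened
    return df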
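The winner mapping left out above could be handled along the following lines. This is only a sketch: `permute_race` is a hypothetical helper, the row layout `[race_id, features..., WINNER]` follows the question, and it assumes WINNER holds the 0-based index of the winning racer. The toy row uses 3 racers instead of 14 to keep it short.

```python
import numpy as np

def permute_race(row, n_racers, rng):
    """Return one shuffled copy of a race row: [race_id, features..., WINNER]."""
    race_id, winner = row[0], row[-1]
    # Reshape the flat feature list into one (A, B, C) triplet per racer
    triplets = np.asarray(row[1:-1]).reshape(n_racers, 3)
    # order[new_slot] = old_slot
    order = rng.permutation(n_racers)
    shuffled = triplets[order]
    # The winner is now in whichever new slot holds the old winner's triplet
    new_winner = int(np.where(order == winner)[0][0])
    return [race_id, *shuffled.ravel().tolist(), new_winner]

rng = np.random.default_rng(0)
# Toy race with 3 racers (assumption for brevity); racer 1 is the winner
row = [1, 2, 3, 0, 9, 1, 3, 4, 5, 1, 1]
out = permute_race(row, 3, rng)
```

Whatever permutation is drawn, the triplet found at the new WINNER index in `out` should still be `[9, 1, 3]`, the original winner's features.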
Upvotes: 0
Reputation: 2049
Before we begin, this is not a good approach to modeling race outcomes.
However, if you want to do it anyway, you want to permute and remap the column names and then concatenate the resulting permutations. First, dynamically create a list of participants by parsing the column names:
participants = [col[1:] for col in df.columns if col.startswith('A')]
Then loop through permutations of these participants and apply the column name remapping:
import random
import pandas as pd

# Collect the permuted races here and concatenate once at the end
permuted = []
# 14 participants have 14! orderings, far too many to enumerate,
# so draw 100 random permutations instead
for _ in range(100):
    permutation = random.sample(participants, len(participants))
    # Create the mapping of participants from the permutation
    mapping = dict(zip(participants, permutation))
    # From the participant mapping, create a column mapping; match the
    # full suffix so that e.g. 'A13' is not mistaken for participant '3'
    columns = {col: col[0] + mapping[col[1:]]
               for col in df.columns if col[1:] in mapping}
    # Remap column names
    race = df.rename(columns=columns)
    # Reassign the winner based on the mapping (WINNER holds integers,
    # while the mapping keys and values are strings)
    race['WINNER'] = race['WINNER'].astype(str).map(mapping).astype(int)
    # Collect the races
    permuted.append(race)
# concat aligns the renamed columns by name
races = pd.concat(permuted, ignore_index=True)
Upvotes: 1