Reputation: 21
I have a dataframe in which each row is a group that climbed a mountain. Each row mentions the group size (Attempted) and how many climbers successfully reached the summit (Succeeded):
d = {'Attempted': [11, 12, 2, 5], 'Succeeded': [0, 6, 2, 3]}
df = pd.DataFrame(data=d)
df
However, I would like to change the dataframe in such a way that each row represents an individual instead of a group. I also want to add a column to indicate if this person reached the summit (1) or not (0). So for each row (i.e. group), I would need to create as many replicates as there are people in that group (i.e. the Attempted column). Looking at the second row for example, six of the twelve new columns would need to get a 0 and 6 need to get a 1 in the new column. For the second row, it would look like this:
d = {'Attempted': [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12], 'Succeeded': [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], 'Summit' : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1}
df = pd.DataFrame(data=d)
df
I managed to create the new column and the replicates for each row. I set them to 1 for each row:
df_new = df.loc[df.index.repeat(df["Attempted"])].assign(ind_summit = 1).reset_index(drop = True)
However, I have no idea how to replace some of these 1 with 0 depending on the amount of people who did not manage to make it to the summit.
Upvotes: 1
Views: 274
Reputation: 6583
You can use something like:
import pandas as pd
d = {'Attempted': [11, 12, 2, 5], 'Succeeded': [0, 6, 2, 3]}
df = pd.DataFrame(data=d)
df_new = df.loc[df.index.repeat(df["Attempted"])]
df_new["Summit"] = df_new.groupby(df_new.index).cumcount().lt(df_new["Succeeded"]).astype(int)
out = df_new.reset_index(drop=True)
out:
Attempted Succeeded Summit
0 11 0 0
1 11 0 0
2 11 0 0
3 11 0 0
4 11 0 0
5 11 0 0
6 11 0 0
7 11 0 0
8 11 0 0
9 11 0 0
10 11 0 0
11 12 6 1
12 12 6 1
13 12 6 1
14 12 6 1
15 12 6 1
16 12 6 1
17 12 6 0
18 12 6 0
19 12 6 0
20 12 6 0
21 12 6 0
22 12 6 0
23 2 2 1
24 2 2 1
25 5 3 1
26 5 3 1
27 5 3 1
28 5 3 0
29 5 3 0
Upvotes: 1