Reputation: 133
I have a DataFrame like so:
column1 column2 column3
0 a 2 2
1 b 1 0
2 c 3 2
Where the value in column3 is necessarily <= the value in column2
I want to expand df to below based on the following rules:
column1 column2 column3
0 a 1 1
1 a 1 1
2 b 1 0
3 c 1 1
4 c 1 1
5 c 1 0
For example: See that the row with column1=a was expanded into 2 rows because its column2 value was equal to 2, and both the resulting expansion rows have a value of 1 in column3 because 0 < 2 and 1 < 2.
See that the row with column1=c was expanded into 3 rows because its column2 value was equal to 3, however, only the first 2 resulting expansion rows have a value of 1 in column3 (again: 0 < 2 and 1 < 2), however, the third expanded row gets a value of 0 for column3 because (yeah, yeah, yeah, this isn't kindergarten math) it is not true that 2 < 2.
What function can I use on a dataframe like the one first given, to get a result that looks like the second one?
Upvotes: 2
Views: 103
Reputation: 323276
What I think, this is a merge problem
s1=df.loc[df.index.repeat(df.column2),['column1']].assign(column2=1).reset_index(drop=True)
s2=df.loc[df.index.repeat(df.column3),['column1']].assign(column3=1).reset_index(drop=True)
df=s1.assign(Key=s1.groupby(s1.column1).cumcount()).merge(s2.assign(Key=s2.groupby(s2.column1).cumcount()),how='left').fillna(0)
df
column1 column2 Key column3
0 a 1 0 1.0
1 a 1 1 1.0
2 b 1 0 0.0
3 c 1 0 1.0
4 c 1 1 1.0
5 c 1 2 0.0
Upvotes: 1
Reputation: 51155
np.repeat
+ cumcount
u = pd.DataFrame(np.repeat(df.values, df.column2, axis=0), columns=df.columns)
u.assign(
column2=1,
column3=(u.column3 > u.groupby('column1').cumcount()).astype(int)
)
column1 column2 column3
0 a 1 1
1 a 1 1
2 b 1 0
3 c 1 1
4 c 1 1
5 c 1 0
Upvotes: 2