Sean Clarke
Sean Clarke

Reputation: 133

row expansion by column while manipulating other row based on value

I have a DataFrame like so:

  column1  column2  column3
0       a        2        2
1       b        1        0
2       c        3        2

Where the value in column3 is necessarily <= the value in column2

I want to expand df to below based on the following rules:

  column1  column2  column3
0       a        1        1
1       a        1        1
2       b        1        0
3       c        1        1
4       c        1        1
5       c        1        0
  1. every row is expanded into a number of rows equal to its value in column2
  2. the value of column3 for expanded rows is equal to 1 if its index (relative to it's column1 grouping) is less than the original unexpanded row's value in column3.

For example: See that the row with column1=a was expanded into 2 rows because its column2 value was equal to 2, and both the resulting expansion rows have a value of 1 in column3 because 0 < 2 and 1 < 2.

See that the row with column1=c was expanded into 3 rows because its column2 value was equal to 3, however, only the first 2 resulting expansion rows have a value of 1 in column3 (again: 0 < 2 and 1 < 2), however, the third expanded row gets a value of 0 for column3 because (yeah, yeah, yeah, this isn't kindergarten math) it is not true that 2 < 2.

What function can I use on a dataframe like the one first given, to get a result that looks like the second one?

Upvotes: 2

Views: 103

Answers (2)

BENY
BENY

Reputation: 323276

What I think, this is a merge problem

s1=df.loc[df.index.repeat(df.column2),['column1']].assign(column2=1).reset_index(drop=True)
s2=df.loc[df.index.repeat(df.column3),['column1']].assign(column3=1).reset_index(drop=True)
df=s1.assign(Key=s1.groupby(s1.column1).cumcount()).merge(s2.assign(Key=s2.groupby(s2.column1).cumcount()),how='left').fillna(0)
df

  column1  column2  Key  column3
0       a        1    0      1.0
1       a        1    1      1.0
2       b        1    0      0.0
3       c        1    0      1.0
4       c        1    1      1.0
5       c        1    2      0.0

Upvotes: 1

user3483203
user3483203

Reputation: 51155

np.repeat + cumcount

u = pd.DataFrame(np.repeat(df.values, df.column2, axis=0), columns=df.columns)

u.assign(
  column2=1,
  column3=(u.column3 > u.groupby('column1').cumcount()).astype(int)
)

  column1  column2  column3
0       a        1        1
1       a        1        1
2       b        1        0
3       c        1        1
4       c        1        1
5       c        1        0

Upvotes: 2

Related Questions