user13925399
user13925399

Reputation:

how to explode dynamically using pandas column?

I have a dataframe that looks like this

import pandas as pd
import numpy as np
# Create data set.
dataSet = {'id': ['A', 'A', 'B'],
           'id_2': [1, 2, 1] ,
           'number': [320, 169, 120],
           'add_number' : [4,6,3]}

# Create dataframe with data set and named columns.
df = pd.DataFrame(dataSet, columns= ['id', 'id_2','number', 'add_number'])

    id  id_2    number  add_number
0   A   1        320       4
1   A   2        169       6
2   B   1        120       3

I would like use number and add_number so that I can explode this dynamically, ie) 320 + 4 would have [320,321,322,323,324] (up to 324, and would like to explode on this)

DESIRED OUTPUT

    id  id_2    number
0   A   1        320       
1   A   1        321       
2   A   1        322
3   A   1        323
4   A   1        324
5   A   2        169
6   A   2        170
7   A   2        171
8   A   2        172
9   A   2        173
10  A   2        174
11  A   2        175
12  B   1        120
13  B   1        121
14  B   1        122
15  B   1        123

I looked over explode, wide_to_long pandas function, but I do not know where to start, any sense of direction would be appreciated!!

Upvotes: 6

Views: 264

Answers (4)

Scott Boston
Scott Boston

Reputation: 153460

You try using np.arange and explode:

df['range'] = df.apply(lambda x: np.arange(x['number'], x['number']+x['add_number']+1), axis=1)
df.explode('range')

or

df['range'] = [np.arange(n, n+a+1) for n, a in zip(df['number'],df['add_number'])] 
df.explode('range')

Output:

  id  id_2  number  add_number range
0  A     1     320           4   320
0  A     1     320           4   321
0  A     1     320           4   322
0  A     1     320           4   323
0  A     1     320           4   324
1  A     2     169           6   169
1  A     2     169           6   170
1  A     2     169           6   171
1  A     2     169           6   172
1  A     2     169           6   173
1  A     2     169           6   174
1  A     2     169           6   175
2  B     1     120           3   120
2  B     1     120           3   121
2  B     1     120           3   122
2  B     1     120           3   123

Upvotes: 2

wwnde
wwnde

Reputation: 26676

Multiply columns based od number column

 new=pd.DataFrame(np.repeat(df.values,df['add_number']+1, axis=0), columns=df.columns)

#Use groupby cummulatively count the added rows and increment number by the same

new=new.assign(number=new['number']+new.groupby(['number','id'])['number'].transform('cumcount'))

Outcome

 id id_2 number add_number
0   A    1    320          4
1   A    1    321          4
2   A    1    322          4
3   A    1    323          4
4   A    1    324          4
5   A    2    169          6
6   A    2    170          6
7   A    2    171          6
8   A    2    172          6
9   A    2    173          6
10  A    2    174          6
11  A    2    175          6
12  B    1    120          3
13  B    1    121          3
14  B    1    122          3
15  B    1    123          3

Upvotes: 3

Richard K Yu
Richard K Yu

Reputation: 2202

Here is an alternative approach, in case you wanted to use explode specifically:

import pandas as pd
import numpy as np
# Create data set.
dataSet = {'id': ['A', 'A', 'B'],
           'id_2': [1, 2, 1] ,
           'number': [320, 169, 120],
           'add_number' : [4,6,3]}

df = pd.DataFrame(dataSet, columns= ['id', 'id_2','number', 'add_number'])
new_numbers = []

for row in df.iterrows():
    new_numbers.append([row[1]['number']+ i for i in range(row[1]['add_number']+1)])


df['number'] = new_numbers
df = df.drop(['add_number'], axis=1)

df = df.explode('number').reset_index().drop(['index'], axis=1)
print(df)

Output:

   id  id_2 number
0   A     1    320
1   A     1    321
2   A     1    322
3   A     1    323
4   A     1    324
5   A     2    169
6   A     2    170
7   A     2    171
8   A     2    172
9   A     2    173
10  A     2    174
11  A     2    175
12  B     1    120
13  B     1    121
14  B     1    122
15  B     1    123

Upvotes: 1

BENY
BENY

Reputation: 323226

In your case doing repeat then groupby cumcount add the value to number

out = df.reindex(df.index.repeat(df.add_number+1))
out['number'] = out['number'].add(out.groupby(out.index).cumcount())
out
Out[22]: 
  id  id_2  number  add_number
0  A     1     320           4
0  A     1     321           4
0  A     1     322           4
0  A     1     323           4
0  A     1     324           4
1  A     2     169           6
1  A     2     170           6
1  A     2     171           6
1  A     2     172           6
1  A     2     173           6
1  A     2     174           6
1  A     2     175           6
2  B     1     120           3
2  B     1     121           3
2  B     1     122           3
2  B     1     123           3

Upvotes: 2

Related Questions