Reputation: 53
Hello all I have this:
df = pd.DataFrame({'name':['L1', 'L2'], 'from':['1', '5'], 'to':['3', '7']})
name from to
L1 1 3
L2 5 7
In fact I have a lot of row (3000000) and very big range like 1 to 9000.
What is the best way to have this kind of result (exploding the range in rows)
like:
name n°
L1 1
L1 2
L1 3
L2 5
L2 6
L2 7
Thank's a lot
Upvotes: 3
Views: 298
Reputation: 5471
Edited: sorry I got the questions wrong 1st time, I corrected by answer
As pandas data are stored as numpy array internally and numpy manipulations are fast you can use numpy manipulation to do it
import pandas as pd
import numpy as np
df = pd.DataFrame({'name':['L1', 'L2'], 'from':[1, 5], 'to':[3, 7]})
t_dict = {}
## find number of times each rows should be repeated
repeats = (df["to"]- df["from"]+1 ).values
## repeat names
t_dict["name"] = df["name"].values.repeat(repeats)
## repeat notes and add 0,1,2,3 .. seq to each repeated section
t_dict["n°"] = df["from"].values.repeat(repeats) + np.array([i for count in repeats for i in range(count)])
pd.DataFrame(t_dict)
output
name n°
0 L1 1
1 L1 2
2 L1 3
3 L2 5
4 L2 6
5 L2 7
Upvotes: 0
Reputation: 863236
Use list comprehension with range
and DataFrame
constructor:
df[['from','to']] = df[['from','to']].astype(int)
zipped = zip(df['name'], df['from'], df['to'])
df = pd.DataFrame([(i, y) for i,j,k in zipped for y in range(j, k+1)], columns=['name','id'])
print (df)
name id
0 L1 1
1 L1 2
2 L1 3
3 L2 5
4 L2 6
5 L2 7
Another solution:
df[['from','to']] = df[['from','to']].astype(int)
s = df['to'] - df['from'] + 1
df = df.loc[df.index.repeat(s), ['name','from']].rename(columns={'from':'no'})
df['no'] += df.groupby(level=0).cumcount()
df = df.reset_index(drop=True)
print (df)
name no
0 L1 1
1 L1 2
2 L1 3
3 L2 5
4 L2 6
5 L2 7
Upvotes: 1
Reputation: 3103
(pd.concat([df, df[['from', 'to']]
.applymap(int)
.apply(lambda x: np.arange(x[0], x[1]), axis=1)],
axis=1)
.explode(0))
Output:
name from to 0
0 L1 1 3 1
0 L1 1 3 2
1 L2 5 7 5
1 L2 5 7 6
Upvotes: 0