Reputation: 37
I have a Pandas dataframe that looks like this:
Location | Number | Position |
---|---|---|
New York | 111 | 1 through 12 |
Dallas | 222 | 1 through 12 |
San Francisco | 333 | 1 through 4 |
I would like to basically explode the dataframe based on the Position column so that the output looks like this:
Location | Number | Position |
---|---|---|
New York | 111 | 1 |
New York | 111 | 2 |
New York | 111 | 3 |
New York | 111 | 4 |
New York | 111 | 5 |
New York | 111 | 6 |
New York | 111 | 7 |
New York | 111 | 8 |
New York | 111 | 9 |
New York | 111 | 10 |
New York | 111 | 11 |
New York | 111 | 12 |
Dallas | 222 | etc. etc. |
I would like to do this for every instance of Location and Number based on what the values in Position are.
Is there a quick and easy way to do this??
Upvotes: 0
Views: 206
Reputation: 260410
One option using explode
:
out = (df
.assign(Position=[range(a, b+1) for x in df['Position']
for a,b in [map(int, x.split(' through '))]])
.explode('Position')
)
Another approach using reindexing:
df2 = df['Position'].str.extract('(\d+) through (\d+)').astype(int)
# 0 1
# 0 1 12
# 1 1 12
# 2 1 4
rep = df2[1].sub(df2[0]).add(1)
out = (df
.loc[df.index.repeat(rep)]
.assign(Position=lambda d: d.groupby(level=0).cumcount().add(df2[0]))
)
output:
Location Number Position
0 New York 111 1
0 New York 111 2
0 New York 111 3
0 New York 111 4
0 New York 111 5
0 New York 111 6
0 New York 111 7
0 New York 111 8
0 New York 111 9
0 New York 111 10
0 New York 111 11
0 New York 111 12
1 Dallas 222 1
1 Dallas 222 2
1 Dallas 222 3
1 Dallas 222 4
1 Dallas 222 5
1 Dallas 222 6
1 Dallas 222 7
1 Dallas 222 8
1 Dallas 222 9
1 Dallas 222 10
1 Dallas 222 11
1 Dallas 222 12
2 San Francisco 333 1
2 San Francisco 333 2
2 San Francisco 333 3
2 San Francisco 333 4
Upvotes: 1