Reputation: 61
I'm using DataFrame.explode() to unnest a column of lists such that each element gets its own row. What I'm wondering is how to create a new 'index' column that will correspond to the index of the element in the original list. In the example I'm calling this column 'cycle'.
I've hopefully attached an image below of what I'm trying to achieve. If there is already a page on this please share. Thank you!
Upvotes: 5
Views: 3100
Reputation: 120
First, create your cycle column:
import pandas as pd
df = pd.DataFrame([
["A",[2,3,5,2,3,6,8]],
["B",[97,83,5]],
["C",[2.5,6,7,9]]
],
columns=["sample", "value"])
df["cycle"]=df["value"].str.len().apply(lambda x: list(range(0,x)))
Then you can simply explode both of the columns simultaneously. This option is available since the latest pandas version. (check the documentation here)
df.explode(["value","cycle"])
sample value cycle
0 A 2 0
0 A 3 1
0 A 5 2
0 A 2 3
0 A 3 4
0 A 6 5
0 A 8 6
1 B 97 0
1 B 83 1
1 B 5 2
2 C 2.5 0
2 C 6 1
2 C 7 2
2 C 9 3
Upvotes: 1
Reputation: 35626
Use groupby cumcount
after explode
:
df = df.explode('value').reset_index(drop=True)
df['cycle'] = df.groupby('sample').cumcount() # Enumerate Groups
df = df[['sample', 'cycle', 'value']] # Reorder columns
or with insert
:
df = df.explode('value').reset_index(drop=True)
df.insert(1, 'cycle', df.groupby('sample').cumcount()) # Insert New Column
or with assign
:
df = (
df.explode('value')
.assign(cycle=lambda x: x.groupby(level=0).cumcount())
.reset_index(drop=True)[['sample', 'cycle', 'value']]
)
df
:
sample cycle value
0 A 0 2
1 A 1 3
2 A 2 5
3 A 3 2
4 A 4 3
5 A 5 6
6 A 6 8
7 B 0 97
8 B 1 83
9 B 2 5
10 C 0 2.5
11 C 1 6
12 C 2 7
13 C 3 9
Upvotes: 4