tinok
tinok

Reputation: 23

Pandas Implode and Create 2 New Column based on list value

I have pandas data frame like this

Name   Col1
Foo    [A, C, B, D, E]
Bar    [G, M, O]
Baz    [E, B]

I want to change it to:

Name   New_Col1   New_Col2   
Foo    A          C
Foo    C          B
Foo    B          D
Foo    D          E
Bar    G          M
Bar    M          O
Baz    E          B

How can I do that?

Upvotes: 1

Views: 115

Answers (3)

Beta
Beta

Reputation: 1746

Original data:

df = pd.DataFrame({'name': ['Foo', 'Bar', 'Baz'],
 'col1': [['A', 'C', 'B', 'D', 'E'], ['G', 'M', 'O'], ['E', 'B']]})

This step extract each values of col1 values

df = df.explode('col1')

Creating a new column by leading col1

df['New_Col2'] = df.groupby('name')['col1'].shift(-1)

Drop nam from last rows within each group and rename col1

df = df.dropna()
df.rename(columns={'col1': 'New_Col1'})

Final output

print(df)

This is similar to mozway's answer. But maybe bit more readable.

Upvotes: 0

mozway
mozway

Reputation: 261860

Using explode and GroupBy.shift:

s = df['Col1'].explode()
df.drop(columns='Col1').join(
 pd.concat([s, s.groupby(level=0).shift(-1)], axis=1)
   .set_axis(['New_Col1', 'New_Col2'], axis=1)
   .dropna()
)

output:

  Name New_Col1 New_Col2
0  Foo        A        C
0  Foo        C        B
0  Foo        B        D
0  Foo        D        E
1  Bar        G        M
1  Bar        M        O
2  Baz        E        B

With the approach, you can easily generalize to more combinations, example with 3:

s = df['Col1'].explode()

n = 3

df.drop(columns='Col1').join(
 pd.concat([s.groupby(level=0).shift(-x) for x in range(n)], axis=1)
   .set_axis([f'New_Col{i+1}' for i in range(n)], axis=1)
   .dropna()
)

output:

  Name New_Col1 New_Col2 New_Col3
0  Foo        A        C        B
0  Foo        C        B        D
0  Foo        B        D        E
1  Bar        G        M        O
2  Baz      NaN      NaN      NaN

Upvotes: 0

rafaelc
rafaelc

Reputation: 59274

Do two explode operations and then concat the results:

s = df.set_index('Name').Col1.str
pd.concat([s[:-1].explode(), 
           s[1:].explode()],
           axis=1)\
   .reset_index()

  Name Col1 Col1
0  Foo    A    C
1  Foo    C    B
2  Foo    B    D
3  Foo    D    E
4  Bar    G    M
5  Bar    M    O
6  Baz    E    B

Upvotes: 1

Related Questions