wgf4242

Reputation: 831

pandas split and expand two columns

Split the tag and author columns on whitespace, then expand the pieces into new rows, padding the shorter column with NaN.

import pandas as pd

df = pd.DataFrame(
    [{'name': 'book1', 'tag': 'a b c', 'author': 'a1 a2'}],
    columns=['name', 'tag', 'author'],
)
print(df)

    name    tag author
0  book1  a b c  a1 a2

Expected:

[out]
    name tag author
0  book1   a     a1
1  book1   b     a2
2  book1   c    NaN

Upvotes: 1

Views: 566

Answers (2)

piRSquared

Reputation: 294508

With Python 3.5 or later, which supports starred unpacking in list displays such as [*df], you can build the rows with zip_longest:

from itertools import zip_longest
import pandas as pd

pd.DataFrame(
    [n + m for *n, t, a in zip(*map(df.get, df))
           for *m,      in zip_longest(*map(str.split, (t, a)))],
    columns=[*df]
)

    name tag author
0  book1   a     a1
1  book1   b     a2
2  book1   c   None
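
For readers who find the nested comprehension dense, here is a minimal sketch of the same idea written as explicit loops. It assumes exactly the three columns from the question and swaps in df.itertuples for the zip(*map(df.get, df)) trick. The names rows and result are only illustrative.

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame(
    [{'name': 'book1', 'tag': 'a b c', 'author': 'a1 a2'}],
    columns=['name', 'tag', 'author'],
)

rows = []
# Iterate over plain tuples, one per original row.
for name, tag, author in df.itertuples(index=False, name=None):
    # zip_longest pairs the split pieces and pads the shorter list with None.
    for t, a in zip_longest(tag.split(), author.split()):
        rows.append((name, t, a))

result = pd.DataFrame(rows, columns=df.columns)
print(result)
```

The padding value comes from zip_longest, so the missing author is filled with None rather than produced by a pandas reshape.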

Upvotes: 0

jezrael

Reputation: 863291

Use DataFrame.set_index on the columns whose values should repeat, reshape with DataFrame.stack, split with Series.str.split(expand=True) to get a DataFrame, and finally reshape again with stack and unstack:

df1 = (df.set_index('name')
         .stack()
         .str.split(expand=True)
         .stack()
         .unstack(1)
         .reset_index(level=0)
         .reset_index(drop=True))
print(df1)
    name tag author
0  book1   a     a1
1  book1   b     a2
2  book1   c    NaN
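
To see what each reshape in the chain does, it can help to split it into named steps and print the intermediates. This is a rough sketch using the question's sample frame. The variable names stacked, tokens and wide are made up for illustration and do not appear in the chain above.

```python
import pandas as pd

df = pd.DataFrame(
    [{'name': 'book1', 'tag': 'a b c', 'author': 'a1 a2'}],
    columns=['name', 'tag', 'author'],
)

# Long format: one entry per (name, original column) pair.
stacked = df.set_index('name').stack()

# One column per token position; the shorter string leaves a missing value.
tokens = stacked.str.split(expand=True)

# Move the token position into the index, then move the original
# column names (index level 1) back out to the columns.
wide = tokens.stack().unstack(1)

# Turn 'name' back into a regular column and drop the position index.
df1 = wide.reset_index(level=0).reset_index(drop=True)
print(df1)
```

Printing stacked, tokens and wide in turn shows how the token position moves from the columns into the index before the final unstack.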

Another solution:

df1 = (df.set_index('name')
         .apply(lambda x: x.str.split(expand=True).stack())
         .reset_index(level=0)
         .reset_index(drop=True)
        )

Upvotes: 2
