Ohm
Ohm

Reputation: 2442

Pandas: split a row to two or more rows when applying a row-wise function

I have a dataframe in pandas that looks like this:

df = pd.DataFrame([[4, 9],[4,9],[[1,2],[3,4]]], columns=['A', 'B'])
df

    A   B
0   4   9
1   4   9
2   [1, 2]  [3, 4]

However I would like to transform it to a table like this:

    A   B
0   4   9
1   4   9
2   1   2
3   3   4

Is there a way to apply a row wise function (using df.apply(function,axis=1,...) or some other function in pandas) to do that?

Upvotes: 4

Views: 122

Answers (7)

Sun
Sun

Reputation: 64

There is one problem in the question, it is not certain the list items in the same row always has same length. If that presumption is satisfied, then following answer works for it:

df.apply(pd.Series.explode) 


    A   B
0   4   9
1   4   9
2   1   3
2   2   4

Upvotes: 0

Georgina Skibinski
Georgina Skibinski

Reputation: 13397

You can do:

#main piece - the rest is actually 'fixing' the multiindex piece to fit your purpose:
df=df.stack().explode().to_frame()

df["id"]=df.groupby(level=[0,1]).cumcount()

df.index=pd.MultiIndex.from_tuples(zip(df.index.get_level_values(0)+df['id'], df.index.get_level_values(1)))

df=df.drop(columns="id").unstack()

df.columns=map(lambda x: x[1], df.columns)

Outputs:

>>> df

   A  B
0  4  9
1  4  9
2  1  3
3  2  4

Upvotes: 1

FBruzzesi
FBruzzesi

Reputation: 6495

Another possible solution to all others proposed so far using DataFrame.melt, DataFrame.explode and DataFrame.pivot:

import pandas as pd

df = pd.DataFrame([[4, 9],[4,9],[[1,2],[3,4]]], columns=['A', 'B'])
# Create index column
df.reset_index(inplace=True)

tmp = df.melt(id_vars='index', var_name='columns').explode('value')

# Define indexes
idx = sum([list(range(len(tmp)//tmp['columns'].nunique())) for _ in range(tmp['columns'].nunique())], [])
tmp['index'] = idx

result_df = tmp.pivot(index='index', columns='columns', values='value')

result_df
columns  A  B
index        
0        4  9
1        4  9
2        1  3
3        2  4

Upvotes: 0

sub234
sub234

Reputation: 31

Using simple for and if loops:

 alist = df['A'].tolist()
 blist = df['B'].tolist()

 alist1=[]
 blist1=[]
 for k,r in zip(alist,blist):
   if isinstance(k,list):
     alist1.append(k[0])
     blist1.append(k[1])
   if isinstance(r,list):
     alist1.append(r[0])
     blist1.append(r[1])
   else:
     alist1.append(k)
     blist1.append(r)

df = pd.DataFrame({'A': alist1, 'b': blist1})

Upvotes: 0

Erfan
Erfan

Reputation: 42926

Using DataFrame.apply, Series.explode, DataFrame.mask and DataFrame.where:

types = df.applymap(type).eq(list)
arr = df.where(types).apply(pd.Series.explode).dropna().T.to_numpy()
df.mask(types).dropna().append(pd.DataFrame(arr, columns=df.columns), ignore_index=True)

   A  B
0  4  9
1  4  9
2  1  2
3  3  4

Upvotes: 0

jezrael
jezrael

Reputation: 863291

Use lis comprehension with flatten values by chain:

from  itertools import chain

out = list(chain.from_iterable(item if isinstance(item[0],list) 
             else [item] for item in df[['A','B']].values))
df1 = pd.DataFrame(out, columns=['A','B'])

Or loop alternative:

out = []
for x in df[['A','B']].values:
    if isinstance(x[0], list):
        for y in x:
            out.append(y)
    else:
        out.append(x)

df1 = pd.DataFrame(out, columns=['A','B'])
print (df1)
   A  B
0  4  9
1  4  9
2  1  2
3  3  4

Upvotes: 3

Henry Yik
Henry Yik

Reputation: 22503

Using list comprehension with concat:

df = pd.DataFrame([[4, 9],[4,9],[[1,2],[3,4]],], columns=['A', 'B'])

print (pd.concat([df.loc[:1], *[pd.DataFrame(list(i),columns=df.columns) for i in df.loc[2:].to_numpy()]],
                 ignore_index=True))
   A  B
0  4  9
1  4  9
2  1  2
3  3  4

Upvotes: 1

Related Questions