Andrés Bustamante
Andrés Bustamante

Reputation: 462

Convert dataframe of list in columns to rows

I have a pandas DataFrame of this type

col1  col2           col3
 1   [blue]         [in,out]
 2   [green, green] [in]
 3   [green]        [in]

and I need convert it to a dataframe that keep the first column and distribute all the other values in columns as rows:

col1 value
1    blue
1    in
1    out
2    green
2    green
2    in
3    green
3    in

Upvotes: 2

Views: 99

Answers (3)

sub234
sub234

Reputation: 31

d = []
c = []
for i in range(len(df)):
   d.append([j for j in df['c2'][i]])
   d.append([j for j in df['c3'][i]])
   c.append(str(df['c1'][i]) * (len(df['c2'][i])+ len(df['c3'][i])))
   c = [list(j) for j in c]

d = [i for sublist in d for i in sublist]
c = [i for sublist in d for i in sublist]
df1 = pd.DataFrame()
df1['c1'] = c
df1['c2'] = d
df = df1

Upvotes: 0

Jaroslav Bezděk
Jaroslav Bezděk

Reputation: 7635

Another solution could consist of:

  • list comprehension to make col1 with new values and
  • using list concatenation of values in df['col2'] and df['col3'] in order to make value column.

The code is following:

df_final = pd.DataFrame(
    {
        'col1': [
            i for i, sublist in zip(df['col1'], (df['col2'] + df['col3']).values) 
              for val in range(len(sublist))
        ],
        'value': sum((df['col2'] + df['col3']).values, [])
    }
)
print(df_final)
   col1   value
0     1    blue
1     1      in
2     1     out
3     2   green
4     2   green
5     2      in
6     3   green
7     3      in

Upvotes: 0

jezrael
jezrael

Reputation: 863531

Use DataFrame.stack with Series.explode for convert lists, last some data cleaning with DataFrame.reset_index:

df1 = (df.set_index('col1')
         .stack()
         .explode()
         .reset_index(level=1, drop=True)
         .reset_index(name='value'))

Alternative with DataFrame.melt and DataFrame.explode:

df1 = (df.melt('col1')
         .explode('value')
         .sort_values('col1')[['col1','value']]
         .reset_index(drop=True)
)

print (df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in

Or list comprehension solution:

L = [(k, x) for k, v in df.set_index('col1').to_dict('index').items() 
            for k1, v1 in v.items() 
            for x in v1]

df1 = pd.DataFrame(L, columns=['col1','value'])
print (df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in

Upvotes: 1

Related Questions