Rishindra

Reputation: 29

Pandas distribute values of list element of a column into n different columns

I have a pandas DataFrame with a column named RecentDelays, in which each cell contains a list of elements. My DataFrame:

I need to split this RecentDelays column into N separate columns, Delay1, Delay2, ..., with the first value of each list in the Delay1 column of the corresponding row, the second value in the Delay2 column, and so on. If a row has no nth value, that cell should be NaN.

Upvotes: 1

Views: 2015

Answers (2)

jezrael

Reputation: 863176

To build the new columns it is better to use the DataFrame constructor, because .apply(pd.Series) is slow (check these timings). Finally, join the result back to the original:

import pandas as pd

# data sample from jedwards' answer
d1 = pd.DataFrame({'Airline':['A','B','C'],'Delays':[[],[1],[1,2]]})

d2 = (pd.DataFrame(d1['Delays'].values.tolist(), index=d1.index)
        .rename(columns = lambda x: 'Delay{}'.format(x+1)))

df = d1.join(d2)
print (df)
  Airline  Delays  Delay1  Delay2
0       A      []     NaN     NaN
1       B     [1]     1.0     NaN
2       C  [1, 2]     1.0     2.0
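To check the speed claim on your own data, a rough benchmark like the one below can help. It is only a sketch: the data size (the three sample rows repeated 1,000 times) and the function names with_constructor and with_apply are illustrative assumptions, and the actual timings depend on your pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative benchmark data: the three sample rows repeated 1,000 times
d1 = pd.DataFrame({'Airline': ['A', 'B', 'C'] * 1000,
                   'Delays': [[], [1], [1, 2]] * 1000})

def with_constructor(df):
    # Build all new columns at once from a list of lists
    return pd.DataFrame(df['Delays'].tolist(), index=df.index)

def with_apply(df):
    # Build one Series per row, then let pandas align them into columns
    return df['Delays'].apply(pd.Series)

# Both approaches should give the same values (NaN where a list is short)
np.testing.assert_array_equal(with_constructor(d1).to_numpy(dtype=float),
                              with_apply(d1).to_numpy(dtype=float))

for func in (with_constructor, with_apply):
    seconds = timeit.timeit(lambda: func(d1), number=5)
    print(f'{func.__name__}: {seconds:.3f} s for 5 runs')
```

The equality check runs before the timing so that a faster but wrong variant cannot slip through unnoticed.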

If you need to remove the original column, use pop first:

d2 = (pd.DataFrame(d1.pop('Delays').values.tolist(), index=d1.index)
        .rename(columns = lambda x: 'Delay{}'.format(x+1)))

df = d1.join(d2)
print (df)
  Airline  Delay1  Delay2
0       A     NaN     NaN
1       B     1.0     NaN
2       C     1.0     2.0
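The constructor, pop and rename steps can be wrapped in a reusable function. This is a sketch: split_list_column and its n parameter are hypothetical names, not part of pandas. By default N is the length of the longest list; passing n uses reindex to pad with all-NaN columns (or truncate) so that you always get exactly n columns.

```python
import pandas as pd

def split_list_column(df, column, prefix, n=None):
    """Hypothetical helper: expand a list-valued column into prefix1..prefixN."""
    out = df.copy()  # work on a copy so the caller's DataFrame keeps its column
    # pop removes the list column from the copy and returns it as a Series
    expanded = pd.DataFrame(out.pop(column).tolist(), index=out.index)
    if n is not None:
        # reindex adds all-NaN columns for missing positions, or drops extras
        expanded = expanded.reindex(columns=range(n))
    expanded.columns = [f'{prefix}{i + 1}' for i in range(expanded.shape[1])]
    return out.join(expanded)

d1 = pd.DataFrame({'Airline': ['A', 'B', 'C'], 'Delays': [[], [1], [1, 2]]})
print(split_list_column(d1, 'Delays', 'Delay', n=3))
```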

Upvotes: 1

jedwards

Reputation: 30230

Here's one method:

import pandas as pd

d1 = pd.DataFrame({'Airline':['A','B','C'],'Delays':[[],[1],[1,2]]})

# Expand column into temporary DataFrame
d2 = d1['Delays'].apply(pd.Series)

# Copy each temporary column into the original DataFrame as Delay1, Delay2, ...
for col in d2:
    d1["Delay%d" % (col+1)] = d2[col]

print(d1)

Before:

  Airline  Delays
0       A      []
1       B     [1]
2       C  [1, 2]

After:

  Airline  Delays  Delay1  Delay2
0       A      []     NaN     NaN
1       B     [1]     1.0     NaN
2       C  [1, 2]     1.0     2.0

You could also name the columns in the temporary DataFrame with:

# Name columns of new dataframe
d2.columns = ["Delay%d" % (i+1) for i in range(len(d2.columns))]

Then combine the two frames side by side with pd.concat([d1, d2], axis=1).
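Put together, the rename-then-concat variant might look like the sketch below. It reuses the same sample data; axis=1 places the new columns next to the existing ones, and because d2 comes from apply on d1's column, both frames share the same index and the rows line up.

```python
import pandas as pd

d1 = pd.DataFrame({'Airline': ['A', 'B', 'C'], 'Delays': [[], [1], [1, 2]]})

# Expand the list column into a temporary DataFrame, one column per position
d2 = d1['Delays'].apply(pd.Series)

# Name the temporary columns Delay1, Delay2, ...
d2.columns = ["Delay%d" % (i + 1) for i in range(len(d2.columns))]

# Concatenate column-wise; rows are aligned on the shared index
df = pd.concat([d1, d2], axis=1)
print(df)
```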

You can also drop the now-expanded Delays column with something like:

d1.drop(columns=['Delays'], inplace=True) # or,
d1.drop(['Delays'], axis=1, inplace=True)

Upvotes: 0
