Reputation: 29
I have a Pandas DataFrame with a column named RecentDelays, in which each cell contains a list of elements.
I need to split the RecentDelays column into N separate columns (Delay1, Delay2, ...), so that the first value of each row's list goes into the Delay1 column of that row, the second value into Delay2, and so on. If a row has no nth value, that cell should be NaN.
Upvotes: 1
Views: 2015
Reputation: 863176
For new columns it is better to use the DataFrame constructor, because .apply(pd.Series) is slow (check these timings). Finally, join the result to the original:
import pandas as pd

# jedwards data sample
d1 = pd.DataFrame({'Airline':['A','B','C'],'Delays':[[],[1],[1,2]]})
# Build the new columns from the lists, then name them Delay1, Delay2, ...
d2 = (pd.DataFrame(d1['Delays'].values.tolist(), index=d1.index)
        .rename(columns = lambda x: 'Delay{}'.format(x+1)))
df = d1.join(d2)
print (df)
  Airline  Delays  Delay1  Delay2
0       A      []     NaN     NaN
1       B     [1]     1.0     NaN
2       C  [1, 2]     1.0     2.0
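To check the speed claim on your own machine, a rough sketch along these lines may help. The 900-row frame, the helper names with_constructor and with_apply, and the repeat count are arbitrary assumptions for illustration; absolute timings will vary with pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 900 rows whose lists have 0, 1, or 2 elements
d1 = pd.DataFrame({'Delays': [[], [1], [1, 2]] * 300})

def with_constructor():
    # Build all new columns at once from a list of lists
    return pd.DataFrame(d1['Delays'].values.tolist(), index=d1.index)

def with_apply():
    # Build one Series per row, then let pandas combine them
    return d1['Delays'].apply(pd.Series)

via_constructor = with_constructor()
via_apply = with_apply()

# Both approaches should yield the same values (NaN where a list is too short)
same = np.allclose(via_constructor.to_numpy(dtype=float),
                   via_apply.to_numpy(dtype=float), equal_nan=True)
print('same values:', same)

# Arbitrary repeat count; only the relative difference is of interest
print('constructor:', timeit.timeit(with_constructor, number=3))
print('apply      :', timeit.timeit(with_apply, number=3))
```

The comparison only measures building the new columns; renaming and joining cost the same either way.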
If you need to remove the original column, use pop first:
d2 = (pd.DataFrame(d1.pop('Delays').values.tolist(), index=d1.index)
        .rename(columns = lambda x: 'Delay{}'.format(x+1)))
df = d1.join(d2)
print (df)
  Airline  Delay1  Delay2
0       A     NaN     NaN
1       B     1.0     NaN
2       C     1.0     2.0
Upvotes: 1
Reputation: 30230
Here's one method:
import pandas as pd
d1 = pd.DataFrame({'Airline':['A','B','C'],'Delays':[[],[1],[1,2]]})
# Expand column into temporary Dataframe
d2 = d1['Delays'].apply(pd.Series)
# Integrate temp columns back into original Dataframe (while naming column)
for col in d2:
    d1["Delay%d" % (col+1)] = d2[col]
print(d1)
Before:
  Airline  Delays
0       A      []
1       B     [1]
2       C  [1, 2]
After:
  Airline  Delays  Delay1  Delay2
0       A      []     NaN     NaN
1       B     [1]     1.0     NaN
2       C  [1, 2]     1.0     2.0
You could also name the columns in the temp dataframe with:
# Name columns of new dataframe
d2.columns = ["Delay%d" % (i+1) for i in range(len(d2.columns))]
And then use concat.
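A minimal sketch of that variant, reusing the same sample data (the name result is just a placeholder chosen for this example):

```python
import pandas as pd

d1 = pd.DataFrame({'Airline': ['A', 'B', 'C'], 'Delays': [[], [1], [1, 2]]})

# Expand the list column into a temporary DataFrame
d2 = d1['Delays'].apply(pd.Series)

# Name columns of the new DataFrame: Delay1, Delay2, ...
d2.columns = ["Delay%d" % (i + 1) for i in range(len(d2.columns))]

# Glue the two frames together side by side; rows are aligned on the index
result = pd.concat([d1, d2], axis=1)
print(result)
```

Unlike the loop, this leaves d1 untouched and returns a new DataFrame.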
You can also drop the now-expanded Delays column with something like:
d1.drop(columns=['Delays'], inplace=True) # or,
d1.drop(['Delays'], axis=1, inplace=True)
Upvotes: 0