Reputation: 63
I have used the following code for splitting a pandas dataframe column into multiple columns:
df = pd.concat([df.X.apply(pd.Series).rename(columns="X{}".format), df.Y], axis=1)
It's showing a memory error:
stacked = np.empty(shape, dtype=dtype)
MemoryError
Upvotes: 0
Views: 377
Reputation: 403130
apply(pd.Series) can be slow and memory-hungry, so I'd recommend something more efficient using tolist and a DataFrame constructor call. You can also perform the rename on just the columns, so you don't unnecessarily create a new copy of the dataframe.
y = df['Y']
df = pd.DataFrame(df.X.tolist(), index=df.index)
df.columns = list(map("X{}".format, df.columns))
df['Y'] = y
In-place assignment, as opposed to pd.concat (which returns yet another copy), should be even faster.
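For anyone who wants to try this on a small frame first, here is a minimal, self-contained sketch of the same approach. The sample data and the name `expanded` are made up for illustration, and it assumes every entry in X is a list of the same length:

```python
import pandas as pd

# Hypothetical sample data: X holds equal-length lists, Y is an ordinary column.
df = pd.DataFrame({"X": [[1, 2, 3], [4, 5, 6]], "Y": ["a", "b"]})

# Keep a reference to Y before building the new frame.
y = df["Y"]

# Build the expanded frame from a plain list of lists, reusing the original index.
expanded = pd.DataFrame(df["X"].tolist(), index=df.index)

# Rename only the column labels (0, 1, 2 become X0, X1, X2).
expanded.columns = ["X{}".format(c) for c in expanded.columns]

# Assign Y in place instead of calling pd.concat.
expanded["Y"] = y

print(expanded)
```

The result has columns X0, X1, X2 and Y, with the same index as the original frame.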
Upvotes: 1