kashf34Kashf

Reputation: 63

Concatenating large pandas dataframes produces MemoryError

I have used the following code for splitting a pandas dataframe column into multiple columns:

df = pd.concat([df.X.apply(pd.Series).rename(columns="X{}".format), df.Y], axis=1)

It raises a MemoryError:

stacked = np.empty(shape, dtype=dtype)
MemoryError

Upvotes: 0

Views: 377

Answers (1)

cs95

Reputation: 403130

apply(pd.Series) can be slow and memory-hungry, since it builds a separate Series for every row. I'd recommend something more efficient: call tolist on the column and pass the result to the DataFrame constructor. You can also rename just the columns in place, so you don't unnecessarily create another copy of the dataframe.

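# Keep a reference to Y before df is rebound
y = df['Y']
# Build the expanded columns directly from the lists, preserving the index
df = pd.DataFrame(df.X.tolist(), index=df.index)
# Rename the columns in place: 0, 1, ... -> X0, X1, ...
df.columns = list(map("X{}".format, df.columns))
# Attach Y again without going through pd.concat
df['Y'] = y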

Assigning Y in place, rather than calling pd.concat (which returns yet another copy), should be faster still and should use less memory.
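For anyone who wants to try this on a small frame first, here is a minimal self-contained sketch of the same idea. The helper name split_list_column and the toy data are made up for illustration, and it assumes every entry in X is a list of the same length.

```python
import pandas as pd


def split_list_column(df, col="X", keep="Y"):
    """Expand a column of equal-length lists into numbered columns.

    Hypothetical helper, not part of pandas.
    """
    kept = df[keep]
    # Build the new frame straight from a list of lists, reusing the index.
    out = pd.DataFrame(df[col].tolist(), index=df.index)
    # Rename 0, 1, 2, ... to X0, X1, X2, ... by assigning the column labels.
    out.columns = [f"{col}{i}" for i in out.columns]
    # Attach the untouched column by assignment instead of pd.concat.
    out[keep] = kept
    return out


# Toy data, invented for this example.
df = pd.DataFrame(
    {"X": [[1, 2, 3], [4, 5, 6]], "Y": ["a", "b"]},
    index=[10, 20],
)
result = split_list_column(df)
print(result)
```

Passing index=df.index matters when the original index is not the default 0..n-1; without it, the assignment of Y would align on mismatched labels and fill the column with NaN.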

Upvotes: 1
