Does adding a column to a DataFrame involve copying data?

Question

My question is about performance only, not semantics.

Does adding a new column to a DataFrame cause the data already in it to be physically copied to a new memory location (for example, to ensure that the DataFrame occupies contiguous memory)?

```python
# using pandas 0.18.1, python 3.5
import pandas as pd
df = pd.DataFrame({'a': range(100)})
b = pd.Series(range(100))
df['b'] = b  # is this operation expensive?
# equivalently df.loc[:, 'b'] = b
```
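One rough way to probe this empirically is to time the assignment while varying how much data the DataFrame already holds: if the existing columns were copied, the cost should grow with their total size; if not, it should stay roughly flat. A minimal sketch, assuming NumPy and a recent pandas; `time_add_column` is a hypothetical helper, not part of pandas:

```python
import timeit

import numpy as np
import pandas as pd


def time_add_column(n_rows, n_existing_cols, repeat=5):
    """Best-of-`repeat` wall time (seconds) for adding one column to a
    DataFrame that already holds `n_existing_cols` float columns.

    Hypothetical helper for illustration; not a pandas API.
    """
    timings = []
    for _ in range(repeat):
        # Build a fresh frame each run so earlier assignments don't affect layout.
        df = pd.DataFrame(np.zeros((n_rows, n_existing_cols)))
        new = pd.Series(np.ones(n_rows))
        start = timeit.default_timer()
        df["new"] = new  # the operation being measured
        timings.append(timeit.default_timer() - start)
    return min(timings)
```

For example, compare `time_add_column(100_000, 1)` with `time_add_column(100_000, 100)`. Timings are noisy, so only a large and consistent difference would suggest that the existing data is being copied.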

I know from experimentation (I couldn't find it in the documentation) that df['b'] = b semantically creates a copy of b, which obviously requires copying the underlying data. But I have no idea whether the data in the other columns can stay where it is, or sometimes needs to be moved.
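Both parts of this can be checked directly with `numpy.shares_memory`, which reports whether two arrays overlap in memory. The sketch below records the buffer behind an existing column, performs the assignment, and then compares buffers. `memory_report` is a hypothetical helper, and the results may differ between pandas versions and settings (copy-on-write, for instance, is designed to defer copies until data is modified):

```python
import numpy as np
import pandas as pd


def memory_report(df, existing_col, new_col, values):
    """Assign `values` to `df[new_col]` and report which buffers are shared.

    Hypothetical helper for illustration; modifies `df` in place.
    """
    before = df[existing_col].to_numpy()  # buffer backing the existing column
    df[new_col] = values
    return {
        # True if the existing column still lives in the same memory
        "existing_col_kept": bool(
            np.shares_memory(before, df[existing_col].to_numpy())
        ),
        # True if the new column reuses the memory of `values` (no copy yet)
        "new_col_shares_values": bool(
            np.shares_memory(values.to_numpy(), df[new_col].to_numpy())
        ),
    }


df = pd.DataFrame({"a": range(100)})
b = pd.Series(range(100))
print(memory_report(df, "a", "b", b))
```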

Edit:

I know that adding a large number of columns is expensive. I'm only asking about adding a single column.

I also know that adding a row requires copying the data in some cases (or always? not sure), for the obvious reason that the items in a single column have to be in contiguous memory.
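The row case can be checked the same way. Since `DataFrame.append` was removed in pandas 2.0, this sketch appends with `pd.concat` and tests whether a column of the result still shares memory with the original; `row_append_shares_memory` is a hypothetical helper. Because each column's values sit in a fixed-length array, the longer result needs a freshly allocated buffer, so this is expected to report `False`:

```python
import numpy as np
import pandas as pd


def row_append_shares_memory(df, col, row):
    """Append one row (a dict mapping column to value) via pd.concat and
    report whether `col` in the result shares memory with the original.

    Hypothetical helper for illustration; `df` itself is not modified.
    """
    before = df[col].to_numpy()
    out = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
    return bool(np.shares_memory(before, out[col].to_numpy()))


df = pd.DataFrame({"a": range(100), "b": range(100)})
print(row_append_shares_memory(df, "a", {"a": 100, "b": 100}))  # False
```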
