Reputation: 1334
I am working on an ETL pipeline with pandas and I am exceeding my computer's memory.
I have been reading about memory usage in Python and I don't understand how memory works when I create a pandas DataFrame, assign it a name, and then reuse that same name while applying transformations or adding more columns to it.
For example:
import pandas as pd

df = pd.DataFrame(
    {'column1': [1, 2],
     'column2': ['a', 'b']})
If I now want to add another column to this DataFrame:
df['column3'] = 1
Is the memory used by the first df DataFrame replaced by this new df DataFrame, or is Python now using memory for both DataFrames?
What happens if I then want to remove one of the columns?
df = df.drop(columns=['column1'])
Upvotes: 0
Views: 362
Reputation: 901
The pandas documentation says:
All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame.
and also:
Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
Also, if you inspect all your in-scope variables with the dir()
function, you can see that there is only one DataFrame object defined.
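Here is a minimal sketch of that check (the variable names are just for illustration): adding a column mutates the existing DataFrame, so the name df still points at the same object and no second DataFrame appears in memory.

import pandas as pd

df = pd.DataFrame({'column1': [1, 2], 'column2': ['a', 'b']})

before = id(df)          # identity of the underlying object
df['column3'] = 1        # column insertion happens in place

print('df' in dir())     # True -- only one name is bound
print(id(df) == before)  # True -- same object, no second DataFrame was created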
To conclude, it seems to me that Python does not keep a second copy of your DataFrame; only one copy is stored in memory when you add/remove a column. Furthermore, if you want a copy of a DataFrame that actually duplicates all the values into another variable, you should use the .copy()
method.
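As a rough illustration (again, names are only examples), .copy() allocates a separate DataFrame, while drop() without inplace=True returns a new object; rebinding the name df leaves the old DataFrame unreferenced, so in practice only one copy of the data remains reachable. You can inspect the footprint with memory_usage().

import pandas as pd

df = pd.DataFrame({'column1': [1, 2], 'column2': ['a', 'b']})

# .copy() allocates a separate DataFrame: mutating one does not affect the other
df2 = df.copy()
df2['column3'] = 1
print(df.columns.tolist())         # ['column1', 'column2'] -- original untouched

# drop() returns a new DataFrame; rebinding df frees the old one for garbage collection
df = df.drop(columns=['column1'])
print(df.memory_usage(deep=True))  # per-column memory report in bytes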
Upvotes: 1