Chris

Reputation: 2071

Python - function RAM memory efficiency

I have this function:

def preprocess(data_input):
    data = data_input.copy()

    data = ...  # some code
    data = ...  # some code
    data = ...  # some code

    return data

I have a dataframe df that reaches this function after a long chain of Jupyter notebook cells, and I call it like:

preprocess(df)

With the current function, I assign the changes of the local df (inside the function) to the global df (outside the function) with df = preprocess(df). I do it this way to avoid re-running the whole notebook each time I find an error in the function. Once I know the function runs correctly (after endless validation and bug fixes), I just replace preprocess(df) with df = preprocess(df).

This answer says that the best practice is to use the .copy() method (the answer is a bit old, from 2015). However, I'm wondering about memory efficiency. If I have a 4GB dataframe, even ONE copy would make me use 8GB of RAM (the original plus the copy). So, is there any way to replace the .copy() method with a more memory-efficient alternative?
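One option, sketched below under the assumption that the caller always reassigns the result (df = preprocess(df)) and never needs the pre-transform state: skip the defensive copy and mutate the frame in place, so peak memory stays at roughly one frame instead of two. The column name "col" is hypothetical.

```python
import pandas as pd
import numpy as np

def preprocess_inplace(data):
    # Mutates `data` directly instead of copying it first.
    # Safe ONLY because the caller reassigns df = preprocess_inplace(df),
    # so the original pre-transform frame is not needed afterwards.
    # .loc assignment writes into the existing frame: no 2x memory spike.
    data.loc[:, "col"] = data["col"] * 2  # "col" is a hypothetical column
    return data

df = pd.DataFrame({"col": np.arange(5)})
df = preprocess_inplace(df)
```

The trade-off is that the function now has a side effect: if you call preprocess_inplace(df) without reassigning, df is still modified, which is exactly the behavior the .copy() was protecting you from while debugging.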

In addition, if you have a better suggestion for avoiding re-running the whole notebook when debugging a function, I would appreciate it!

EDIT

I forgot to mention that if I don't use the .copy() method, I always get a SettingWithCopyWarning.
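For context on that warning: it typically fires when the frame you write into may be a slice (view) of another frame, so pandas cannot tell whether the write will propagate. A minimal sketch of the pattern and the explicit-copy fix, using a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# A filtered selection: pandas cannot tell whether this shares
# memory with df, so writing into it directly can raise
# SettingWithCopyWarning.
subset = df[df["a"] > 1].copy()  # explicit .copy() makes ownership clear
subset["b"] = 0                  # no warning: subset is its own frame

# The original frame is untouched.
```

So the warning is not about correctness of the copy itself; it is pandas asking you to state explicitly whether you want a view or an independent frame.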

Upvotes: 0

Views: 74

Answers (1)

el científico

Reputation: 11

If I understand correctly, you can put that function inside a separate module, so you can test it without having to run your whole notebook just to read the function definition.

Having your function inside a module also lets you build a small dataset and design test cases covering the most common use cases. You can even create a whole separate script just to test your function.
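A minimal sketch of that idea: a stand-in preprocess (here it just doubles a hypothetical "col" column; your real function would live in its own module and be imported) tested on a tiny synthetic frame. In a notebook, the %autoreload extension can re-import the edited module automatically, so fixing a bug doesn't require re-running earlier cells.

```python
import pandas as pd

# Stand-in for the real function, normally imported from your module,
# e.g.: from preprocessing import preprocess  (hypothetical module name)
def preprocess(data_input):
    data = data_input.copy()
    data["col"] = data["col"] * 2  # "col" is a hypothetical column
    return data

def test_preprocess():
    small = pd.DataFrame({"col": [1, 2, 3]})  # tiny synthetic input
    out = preprocess(small)
    assert list(out["col"]) == [2, 4, 6]       # transformation applied
    assert list(small["col"]) == [1, 2, 3]     # input left untouched

test_preprocess()
```

In the notebook itself you would enable automatic reloading once at the top (`%load_ext autoreload` then `%autoreload 2`), edit the module file, and simply re-run the one cell that calls the function.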

Upvotes: 1

Related Questions