Reputation: 313
Two quick pandas questions for you.
I have a list of dataframes I would like to apply a filter to.
countries = [us, uk, france]
for df in countries:
df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
When I run this, the df's don't change afterwards. Why is that? If I loop through the dataframes to create a new column, as below, this works fine, and changes each df in the list.
for df in countries:
df["Continent"] = "Europe"
As a follow up question, I noticed something strange when I created a list of dataframes for different countries. I defined the list then applied transformations to each df in the list. After I transformed these different dfs, I called the list again. I was surprised to see that the list still pointed to the unchanged dataframes, and I had to redefine the list to update the results. Could anybody shed any light on why that is?
Upvotes: 12
Views: 20666
Reputation: 21
For why
for df in countries:
df["Continent"] = "Europe"
modifies countries, while
for df in countries:
df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
does not, see why should I make a copy of a data frame in pandas. df is a reference to the actual DataFrame in countries, and not the actual DataFrame itself, but modifications to a reference affect the original DataFrame as well. Declaring a new column is a modification. However, taking a subset is not a modification. It is just changing what the reference is referring to in the original DataFrame.
Upvotes: 1
Reputation: 29690
Taking a look at this answer, you can see that for df in countries:
is equivalent to something like
for idx in range(len(countries)):
df = countries[idx]
# do something with df
which obviously won't actually modify anything in your list. It is generally bad practice to modify a list while iterating over it in a loop like this.
A better approach would be a list comprehension, you can try something like
countries = [us, uk, france]
countries = [df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
for df in countries]
Notice that with a list comprehension like this, we aren't actually modifying the original list - instead we are creating a new list, and assigning it to the variable which held our original list.
Also, you might consider placing all of your data in a single DataFrame with an additional country column or something along those lines - Python-level loops are generally slower and a list of DataFrames is often much less convenient to work with than a single DataFrame, which can fully leverage the vectorized pandas methods.
Upvotes: 11