Reputation: 21
I've got multiple DataFrames, each with a column named 'Year' and each containing rows from 1979 through 2014. I'd like to be able to loop over my list of DataFrames and apply the same selection criteria to each DataFrame and only keep a subset of the rows.
My example DataFrames:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})
df2 = pd.DataFrame({"Year": np.arange(1979, 2015)})
My loop:
for df in [df1, df2]:
    df = df[(df['Year'] <= 2013)]
That code doesn't drop the last rows of the DataFrames, though. df1.tail() has all the rows that the original DataFrames had.
This works, though:
foo1 = df1[(df1['Year'] <= 2013)]
foo2 = df2[(df2['Year'] <= 2013)]
I have too many DataFrames to want to do this on a DataFrame-by-DataFrame basis, and would really like it to work within a loop.
Any help would be much appreciated! Thanks.
Upvotes: 2
Views: 622
Reputation:
When you assign a different object to a name, the previous object the name referred to does not change.
For example, let
a = [1, 2]
b = a
Now if I go ahead and point b to another object, a will stay the same:
b = [4, 5]
a
Out: [1, 2]
But instead of pointing b to another object, I can modify the object it points to:
a = [1, 2]
b = a
b.append(3)
a
Out: [1, 2, 3]
In your for loop, these happen:
df points to df1 (start of the loop)
df points to another object (which is df1[(df1['Year'] <= 2013)])
df points to df2 (second iteration)
df points to another object (which is df2[(df2['Year'] <= 2013)])
So you are not actually changing df1 or df2; you are just giving another target to df. If you print df at the end of the loop, you'll see that it will print df2[(df2['Year'] <= 2013)].
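As a minimal sketch of the rebinding described above, the following reuses the example DataFrames from the question and checks object identity inside the loop. The names rebound and original are hypothetical helpers added only for inspection.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})
df2 = pd.DataFrame({"Year": np.arange(1979, 2015)})

rebound = []  # hypothetical helper list, used only to inspect the loop variable
for df in [df1, df2]:
    original = df                  # remember which object df referred to
    df = df[df["Year"] <= 2013]    # rebinds the name df to a new DataFrame
    rebound.append(df is original)

print(rebound)              # [False, False]: df no longer refers to df1 or df2
print(len(df1), len(df2))   # 36 36: the originals are untouched
print(len(df))              # 35: only the last temporary result was filtered
```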
What you can do is to modify/mutate the object:
for df in [df1, df2]:
    df.drop(df[df['Year'] > 2013].index, inplace=True)
    # df = df.drop(df[df['Year'] > 2013].index) wouldn't work
Here, we are not pointing df to another object; instead, we are changing the object at the target. If you print out df1 or df2, you'll see that they have changed.
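To check this, here is a small self-contained sketch, assuming the same example DataFrames as in the question. It runs the in-place drop and then inspects df1 and df2 directly.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})
df2 = pd.DataFrame({"Year": np.arange(1979, 2015)})

for df in [df1, df2]:
    # Mutate the object that df refers to instead of rebinding the name.
    df.drop(df[df["Year"] > 2013].index, inplace=True)

print(len(df1), len(df2))   # 35 35: the 2014 row is gone from both
print(df1["Year"].max())    # 2013
```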
So your options are either to change the DataFrames in place (if the methods allow you to do so), or store the DataFrames in a collection and change the objects in the collection like jezrael did.
Upvotes: 1
Reputation: 863206
You need to assign the output to a list, because reassigning the loop variable does not modify the original DataFrames:
dfs = []
for df in [df1, df2]:
    dfs.append(df[(df['Year'] <= 2013)])
Or use a list comprehension:
dfs = [df[(df['Year'] <= 2013)] for df in [df1, df2]]
If you want a dictionary of DataFrames, you can use zip:
names = ['a','b']
dfs = dict(zip(names, [df[(df['Year'] <= 2013)] for df in [df1, df2]]))
print(dfs['a'])
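If the original variable names should end up holding the filtered data, one possible variation (not part of the answer above, just a sketch under the question's example setup) is to unpack a list comprehension back into the same names. The right-hand side is fully evaluated before the names are reassigned.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})
df2 = pd.DataFrame({"Year": np.arange(1979, 2015)})

# Build the filtered DataFrames first, then rebind df1 and df2 to them.
df1, df2 = [df[df["Year"] <= 2013] for df in [df1, df2]]

print(len(df1), len(df2))   # 35 35
```

With many DataFrames, keeping them in a list or dictionary, as shown in this answer, is likely easier to manage than unpacking into individual names.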
Upvotes: 1