kxp

Reputation: 21

How to drop rows in multiple DataFrames?

I've got multiple DataFrames, each with a column named 'Year' and each containing rows from 1979 through 2014. I'd like to be able to loop over my list of DataFrames and apply the same selection criteria to each DataFrame and only keep a subset of the rows.

My example DataFrames:

import numpy as np
import pandas as pd
df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})
df2 = pd.DataFrame({"Year": np.arange(1979, 2015)})

My loop:

for df in [df1, df2]:
    df = df[(df['Year'] <= 2013)]

That code doesn't drop the last rows of the DataFrames, though. df1.tail() has all the rows that the original DataFrames had.

This works, though:

foo1 = df1[(df1['Year'] <= 2013)]
foo2 = df2[(df2['Year'] <= 2013)]

I have too many DataFrames to do this one DataFrame at a time, and would really like it to work within a loop.

Any help would be much appreciated! Thanks.

Upvotes: 2

Views: 622

Answers (3)

user2285236

When you assign a different object to a name, the previous object the name referred to does not change.

For example, start with:

a = [1, 2]
b = a

Now if I go ahead and point b to another object, a will stay the same:

b = [4, 5]
a
Out: [1, 2]

But instead of pointing b to another object, I can modify the object it points to:

a = [1, 2]
b = a
b.append(3)  
a
Out: [1, 2, 3]

In your for loop, these happen:

  • The name df points to df1 (Start of the loop)
  • The name df points to another object (which is df1[(df1['Year'] <= 2013)])
  • The name df points to df2 (second iteration)
  • The name df points to another object (which is df2[(df2['Year'] <= 2013)])

So you are not actually changing df1 or df2, you are just giving another target to df. If you print df at the end of the loop you'll see that it will print df2[(df2['Year'] <= 2013)].
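
As a minimal sketch of the steps above, using the question's example DataFrames, the loop can be instrumented to show that only the name df is rebound. The names before and rebound are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})
df2 = pd.DataFrame({"Year": np.arange(1979, 2015)})

for df in [df1, df2]:
    before = df                      # the object df names at the start of the iteration
    df = df[df["Year"] <= 2013]      # rebinds df to a new, filtered DataFrame
    rebound = df is not before       # True: df now names a different object

print(len(df1), len(df2))  # both originals still have 36 rows
print(len(df))             # the last filtered result has 35 rows
```

The originals keep their 2014 row because nothing in the loop body ever touches the objects that df1 and df2 refer to.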

What you can do instead is mutate the object itself:

for df in [df1, df2]:
    df.drop((df[(df['Year'] > 2013)]).index, inplace=True)
    # df = df.drop((df[(df['Year'] > 2013)]).index) wouldn't work: it only rebinds the name df

Here, we are not pointing df to another object; instead, we are changing the object at the target. If you print out df1 or df2 you'll see that they have changed.

So your options are either to change the DataFrames in place (if the methods allow you to do so), or store the DataFrames in a collection and change the objects in the collection like jezrael did.
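
A minimal sketch of the second option, assuming the DataFrames are kept in a list (the name frames is illustrative): assigning back into the list by position replaces the stored object, which rebinding the loop variable alone does not do.

```python
import numpy as np
import pandas as pd

# Keep the DataFrames in a collection instead of separate names.
frames = [
    pd.DataFrame({"Year": np.arange(1979, 2015)}),
    pd.DataFrame({"Year": np.arange(1979, 2015)}),
]

for i, frame in enumerate(frames):
    # Writing to frames[i] changes what the list holds at that position.
    frames[i] = frame[frame["Year"] <= 2013]

print([len(frame) for frame in frames])  # [35, 35]
```

After the loop, the filtered DataFrames are reached through the list (frames[0], frames[1]) rather than through the original variable names.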

Upvotes: 1

jezrael

Reputation: 863206

You need to collect the output in a list, because reassigning the loop variable does not modify the original DataFrames:

dfs = []
for df in [df1, df2]:
    dfs.append(df[(df['Year'] <= 2013)])

Or use a list comprehension:

dfs = [df[(df['Year'] <= 2013)] for df in [df1, df2]]

If you want a dictionary of DataFrames, you can use zip:

names = ['a','b'] 
dfs = dict(zip(names, [df[(df['Year'] <= 2013)] for df in [df1, df2]]))
print(dfs['a'])

Upvotes: 1

Greg

Reputation: 602

df.drop([rows you want to drop], axis = 0)
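
A minimal sketch of how this one-liner might look for the question's data, assuming the rows to drop are those with Year greater than 2013. The names rows_to_drop and trimmed are illustrative. Note that drop returns a new DataFrame by default, so the result has to be assigned or collected.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Year": np.arange(1979, 2015)})

# Index labels of the rows to remove.
rows_to_drop = df1.index[df1["Year"] > 2013]

# drop returns a new DataFrame; df1 itself is left unchanged.
trimmed = df1.drop(rows_to_drop, axis=0)

print(len(df1), len(trimmed))  # 36 35
```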

Upvotes: 0
