Alissa L.

Reputation: 103

How to filter multiple dataframes in a loop?

I have a lot of dataframes and I would like to apply the same filter to all of them without having to copy-paste the filter condition every time.

This is my code so far:

df_list_2019 = [df_spain_2019,df_amsterdam_2019, df_venice_2019, df_sicily_2019]

for data in df_list_2019:
    data = data[['host_since','host_response_time','host_response_rate',
             'host_acceptance_rate','host_is_superhost','host_total_listings_count',
              'host_has_profile_pic','host_identity_verified',
             'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
             'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
             'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
             'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
              'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
              'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
              'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
             ]]

but it doesn't apply the filter to the data frame. How can I change the code to do that?

Thank you

Upvotes: 1

Views: 900

Answers (2)

Serge Ballesta

Reputation: 148890

As soon as you write var = new_value, you do not change the original object; you only make the variable refer to a new object.
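
To make that concrete, here is a minimal sketch with a made-up two-column frame (the names frames, a and b are only for illustration, not from the question). The assignment inside the loop rebinds the loop variable and leaves the frame in the list untouched:

```python
import pandas as pd

# Toy frame standing in for one of the real dataframes (hypothetical data).
frames = [pd.DataFrame({"a": [1, 2], "b": [3, 4]})]

for data in frames:
    # This creates a new DataFrame and binds the name `data` to it.
    # The object stored in `frames` is not modified.
    data = data[["a"]]

# The frame in the list still has both columns.
print(list(frames[0].columns))
```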

If you want to change the dataframes in df_list_2019 themselves, you have to use a method that accepts inplace=True. Here, you could use drop:

keep = set(['host_since','host_response_time','host_response_rate',
             'host_acceptance_rate','host_is_superhost','host_total_listings_count',
              'host_has_profile_pic','host_identity_verified',
             'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
             'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
             'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
             'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
              'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
              'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
              'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
             ])

for data in df_list_2019:
    data.drop(columns=[col for col in data.columns if col not in keep], inplace=True)
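
As a quick check that this really modifies the original objects, here is a small sketch on toy data. The two frames and the shortened keep set are assumptions for illustration; only host_since and price are taken from the column list above:

```python
import pandas as pd

# Hypothetical stand-ins for two of the real dataframes.
df_a = pd.DataFrame({"host_since": ["2015-01-01"], "price": [80], "id": [1]})
df_b = pd.DataFrame({"host_since": ["2016-05-03"], "price": [120], "id": [2]})
toy_list = [df_a, df_b]

# Shortened version of the keep set, for illustration only.
keep_small = {"host_since", "price"}

for data in toy_list:
    # drop(..., inplace=True) mutates the object that `data` refers to,
    # so the change is visible through df_a and df_b as well.
    data.drop(columns=[col for col in data.columns if col not in keep_small],
              inplace=True)

print(list(df_a.columns))
print(list(df_b.columns))
```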

But beware: pandas experts recommend preferring the df = df. ... idiom over df. ...(..., inplace=True), because it allows chaining operations. So you should ask yourself whether @timgeb's answer can be used instead. Either way, this one should work for your requirements.

Upvotes: 0

timgeb

Reputation: 78690

The filter (column selection) is actually applied to every DataFrame; you just throw the result away by rebinding the name data to the new object on each iteration.

You need to store the results somewhere, for example in a list.

cols = ['host_since','host_response_time', ...]
filtered = [df[cols] for df in df_list_2019]
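
If you want to look the results up by name rather than by position, a dict works the same way. This is only a sketch under assumptions: the toy frames, the short cols_small list and the dict keys are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the real dataframes.
df_spain_2019 = pd.DataFrame({"host_since": ["2015-01-01"], "price": [80], "id": [1]})
df_venice_2019 = pd.DataFrame({"host_since": ["2016-05-03"], "price": [120], "id": [2]})

# Shortened column list, for illustration only.
cols_small = ["host_since", "price"]

frames_by_name = {"spain": df_spain_2019, "venice": df_venice_2019}

# Build a new dict of filtered frames; the originals are left unchanged.
filtered_by_name = {name: df[cols_small] for name, df in frames_by_name.items()}

print(list(filtered_by_name["spain"].columns))
print(list(frames_by_name["spain"].columns))
```

If you really need the original variable names to point at the filtered frames, you can rebind them explicitly afterwards, e.g. df_spain_2019 = filtered_by_name["spain"].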

Upvotes: 1
