Reputation: 103
I have a lot of dataframes and I would like to apply the same filter to all of them without having to copy paste the filter condition every time.
This is my code so far:
df_list_2019 = [df_spain_2019,df_amsterdam_2019, df_venice_2019, df_sicily_2019]
for data in df_list_2019:
data = data[['host_since','host_response_time','host_response_rate',
'host_acceptance_rate','host_is_superhost','host_total_listings_count',
'host_has_profile_pic','host_identity_verified',
'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
]]
but it doesn't apply the filter to the data frame. How can I change the code to do that?
Thank you
Upvotes: 1
Views: 900
Reputation: 148890
As soon as you write var = new_value
, you do not change the original object but have the variable refering a new object.
If you want to change the dataframes from df_list_2019
, you have to use an inplace=True
method. Here, you could use drop
:
keep = set(['host_since','host_response_time','host_response_rate',
'host_acceptance_rate','host_is_superhost','host_total_listings_count',
'host_has_profile_pic','host_identity_verified',
'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
])
for data in df_list_2019:
data.drop(columns=[col for col in data.columns if col not in keep], inplace=True)
But beware, pandas experts recommend to prefere the df = df. ...
idiom to the df...(..., inplace=True)
because it allows chaining the operations. So you should ask yourself if @timgeb's answer cannot be used. Anyway this one should work for your requirements.
Upvotes: 0
Reputation: 78690
The filter (column selection) is actually applied to every DataFrame, you just throw the result away by overriding what the name data
points to.
You need to store the results somewhere, a list for example.
cols = ['host_since','host_response_time', ...]
filtered = [df[cols] for df in df_list_2019]
Upvotes: 1