Reputation: 327
I have a list of dataframes, df_list=[df1,df2,df3],
and a list of the column headers I am interested in, col_list=['Fire','Water','Wind','Hail'].
I want to loop through each dataframe in df_list and create a new dataframe containing only the columns in col_list. If one of the elements of col_list is not in a dataframe, I still want the new dataframe to be created, just without that column.
Here is what I tried:
for data_frame in df_list:
    try:
        data_frame=data_frame[['Fire','Water','Wind','Hail']]
    except:
        continue
However, this does not give the result I am looking for.
Upvotes: 2
Views: 58
Reputation: 375475
You should use a list comprehension:
[data_frame[['Fire','Water','Wind','Hail']] for data_frame in df_list]
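If some of the dataframes do not have all the columns, you can use reindex instead (note that reindex adds any missing column and fills it with NaN rather than leaving it out):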
[data_frame.reindex(columns=['Fire','Water','Wind','Hail']) for data_frame in df_list]
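To make the difference concrete, here is a minimal sketch comparing reindex with plain selection of the columns that actually exist. The sample frame and the names `reindexed`, `present` and `subset` are made up for illustration; only `col_list` comes from the question.

```python
import pandas as pd

# Hypothetical sample frame: it has no 'Hail' column and one unrelated column.
df = pd.DataFrame({"Fire": [1, 2], "Water": [3, 4], "Wind": [5, 6], "Other": [7, 8]})
col_list = ["Fire", "Water", "Wind", "Hail"]

# reindex returns all four requested columns; the missing 'Hail' is filled with NaN.
reindexed = df.reindex(columns=col_list)

# Selecting only the requested columns that exist leaves 'Hail' out entirely.
present = [col for col in col_list if col in df.columns]
subset = df[present]

print(reindexed.columns.tolist())
print(subset.columns.tolist())
```

If the goal is a frame without the missing column, as the question asks, the second form is probably the closer match.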
Inside the for loop, the line
data_frame=data_frame[['Fire','Water','Wind','Hail']]
rebinds the loop variable data_frame but does not update the corresponding item of df_list.
It behaves the same way as the following code:
In [11]: a = [1, 2, 3]
In [12]: for i in a:
    ...:     i = i + 1
    ...:
In [13]: a
Out[13]: [1, 2, 3]
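If you do want to keep an explicit for loop, one option is to assign back into the list by index. This is only a sketch: the two sample frames are invented for the example, and `keep` is a helper name that is not in the original post.

```python
import pandas as pd

# Hypothetical sample data; each frame has only some of the wanted columns.
df_list = [
    pd.DataFrame({"Fire": [1], "Water": [2], "Extra": [3]}),
    pd.DataFrame({"Wind": [4], "Hail": [5], "Extra": [6]}),
]
col_list = ["Fire", "Water", "Wind", "Hail"]

for i, data_frame in enumerate(df_list):
    # Keep only the requested columns that this frame actually has.
    keep = [col for col in col_list if col in data_frame.columns]
    # Assigning to df_list[i] replaces the list item; rebinding data_frame would not.
    df_list[i] = data_frame[keep]

print([df.columns.tolist() for df in df_list])
```

Because `keep` is built from the columns that are present, no KeyError is raised and no try/except is needed.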
Upvotes: 1
Reputation: 2137
You could use a list comprehension to get the subset of columns that are in col_list. However, when you're iterating, the data_frame variable only holds a reference to the object; reassigning it won't change the element in the list. You could keep another list with the "sub dataframes".
sub_df_list = []
for data_frame in df_list:
    sub_df_list.append(
        data_frame[[col for col in data_frame.columns if col in col_list]]
    )
Edit:
As pointed out in another answer, you could do this as a single list comprehension... which is a bit hard on the eyes:
sub_df_list = [
    data_frame[[col for col in data_frame.columns if col in col_list]]
    for data_frame in df_list
]
Edit 2:
Pandas columns are an Index object. These have set operations, such as intersection. The easiest way to do what you're after is:
sub_df_list = [
    data_frame[data_frame.columns.intersection(col_list)] for data_frame in df_list
]
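For completeness, here is a self-contained sketch of the intersection approach that you can run as-is. The two sample frames are assumptions made up for the example; `col_list` is the one from the question.

```python
import pandas as pd

# Hypothetical sample data; neither frame has all four requested columns.
df_list = [
    pd.DataFrame({"Fire": [1, 2], "Water": [3, 4], "Extra": [5, 6]}),
    pd.DataFrame({"Wind": [7, 8], "Hail": [9, 10], "Fire": [11, 12]}),
]
col_list = ["Fire", "Water", "Wind", "Hail"]

# Index.intersection returns the labels common to the frame's columns and col_list.
sub_df_list = [
    data_frame[data_frame.columns.intersection(col_list)] for data_frame in df_list
]

for sub_df in sub_df_list:
    print(sorted(sub_df.columns))
```

If a particular column order matters to you, the list-comprehension version makes the order explicit, so it may be the safer choice there.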
Upvotes: 1