Reputation: 1
I have multiple DataFrame
and I need to perform various operations on them. I want to put them in one list
to avoid listing them all the time as in the example bellow:
for df in (df1, df2,df3,df4,df5,df6,df7):
df.columns=['COUNTRY','2018','2019']
df.replace({':':''}, regex=True, inplace=True)
df.replace({' ':''}, regex=True, inplace=True)
df["2018"] = pd.to_numeric(df["2018"], downcast="float")
df["2019"] = pd.to_numeric(df["2019"], downcast="float")
I tried to make a list of them (DataFrames=[df1,df2,df3,df4,df5,df6,df7]
) but it's working neither in the loop or as an argument of a function.
for df in (DataFrame):
df.columns=['COUNTRY','2018','2019']
df.replace({':':''}, regex=True, inplace=True)
df.replace({' ':''}, regex=True, inplace=True)
df["2018"] = pd.to_numeric(df["2018"], downcast="float")
df["2019"] = pd.to_numeric(df["2019"], downcast="float")
Upvotes: 0
Views: 699
Reputation: 449
Using nunvie's answer as a base, here is another option for you:
import pandas as pd
data = {
'COUNTRY': ['country1', 'country2', 'country3'],
'2018': ['12.0', '27', '35'],
'2019': ['2:3', '3:9.6', '4:0.3'],
'2020': ['35', '42', '56']
}
df_list = [pd.DataFrame(data) for i in range(5)]
def data_prep(df: pd.DataFrame):
df = df.loc[:, ['COUNTRY', '2018', '2019']]
df.replace({':': ''}, regex=True, inplace=True)
df.replace({' ': ''}, regex=True, inplace=True)
df['2018'] = pd.to_numeric(df['2018'], downcast="float")
df['2019'] = pd.to_numeric(df['2019'], downcast="float")
return df
new_df_list = map(data_prep, df_list)
The improvements (in my opinion) are as follows. First, it is more concise to use list comprehension for the test setup (that's not directly related to the answer). Second, pd.to_numeric
doesn't have inplace
(at least in pandas 1.2.3). It returns the series you passed if the parsing succeeded. Thus, you need to explicitly say df['my_col'] = pd.to_numeric(df['my_col'])
.
And third, I've used map
to apply the data_prep
function to each DataFrame
in the list. This makes data_prep
responsible for only one data frame and also saves you from writing loops. The benefit is leaner and more readable code, if you like the functional flavour of it, of course.
Upvotes: 0
Reputation: 31
you can place the dataframes on a list and add the columns like this:
import pandas as pd
from pandas import DataFrame
data = {'COUNTRY': ['country1', 'country2', 'country3'],
'2018': [12.0, 27, 35],
'2019': [23, 39.6, 40.3],
'2020': [35, 42, 56]}
df_list = [DataFrame(data), DataFrame(data), DataFrame(data),
DataFrame(data), DataFrame(data), DataFrame(data),
DataFrame(data)]
def change_dataframes(data_frames=None):
for df in data_frames:
df = df.loc[:, ['COUNTRY', '2018', '2019']]
df.replace({':': ''}, regex=True, inplace=True)
df.replace({' ': ''}, regex=True, inplace=True)
pd.to_numeric(df['2018'], downcast="float")
pd.to_numeric(df['2019'], downcast="float")
return data_frames
Upvotes: 1