Reputation: 168
I have all these dataframes:
demographic_data1 = pd.read_csv('Demographic Data Part 1',index_col=0,dtype={'Year':object})
demographic_data2 = pd.read_csv('Demographic Data Part 2',index_col=0,dtype={'Year':object})
employment_data1 = pd.read_csv('Employment Data Part 1',index_col=0,dtype={'Year':object})
employment_data2 = pd.read_csv('Employment Data Part 2',index_col=0,dtype={'Year':object})
employment_data3 = pd.read_csv('Employment Data Part 3',index_col=0,dtype={'Year':object})
employment_data4 = pd.read_csv('Employment Data Part 4',index_col=0,dtype={'Year':object})
employment_data5 = pd.read_csv('Employment Data Part 5',index_col=0,dtype={'Year':object})
employment_data6 = pd.read_csv('Employment Data Part 6',index_col=0,dtype={'Year':object})
employment_data7 = pd.read_csv('Employment Data Part 7',index_col=0,dtype={'Year':object})
employment_data8 = pd.read_csv('Employment Data Part 8',index_col=0,dtype={'Year':object})
employment_data9 = pd.read_csv('Employment Data Part 9',index_col=0,dtype={'Year':object})
employment_data10 = pd.read_csv('Employment Data Part 10',index_col=0,dtype={'Year':object})
employment_data11 = pd.read_csv('Employment Data Part 11',index_col=0,dtype={'Year':object})
employment_data12 = pd.read_csv('Employment Data Part 12',index_col=0,dtype={'Year':object})
employment_data13 = pd.read_csv('Employment Data Part 13',index_col=0,dtype={'Year':object})
health_insurance_data = pd.read_csv('Health Insurance Data Part 1',index_col=0,dtype={'Year':object})
orig_data_updated = pd.read_csv('ML Original Data Updated 2018',index_col=0,dtype={'Year':object})
I have to do this if I wanted to join two of them:
new_df1 = orig_data_updated.merge(demographic_data1.drop_duplicates(subset=['Location+Type']), how='left')
Then to keep joining more I do this:
new_df2 = new_df1.merge(demographic_data2.drop_duplicates(subset=['Location+Type']), how='left')
How do I do this in just one go?
Upvotes: 0
Views: 61
Reputation: 469
Updated Answer:
Pandas has a method for combining a list (among other sequences/maps) of dataframes: pd.concat
(https://pandas.pydata.org/docs/reference/api/pandas.concat.html)
combined_df = pd.concat(df_list)
------------------- Previous Answer Below -------------------
You could append each dataframe to a list and loop over the list merging a new dataframe each time.
file_list = [
"Demographic Data Part 1",
"Demographic Data Part 2",
...
]
df_list = []
for file_name in file_list:
df = pd.read_csv(file_name,index_col=0,dtype={'Year':object})
df_list.append(df)
combined_df = pd.DataFrame()
for df in df_list:
combined_df = combined_df.merge(df.drop_duplicates(subset=['Location+Type']), how='left')
Upvotes: 2