How to merge multiple csv and create a dataframe?

Question

I'd like to do the following steps: 1. merge all csv in the same directory 2. create as Dataframe 3. assign columns and drop a column then set one of column ('Type') as index 4. for all file,s I would like to melt column D to end column as rows

file_list = glob.glob("*.csv")

for file in file_list:
    merged_file = pd.read_csv(file)
    print(merged_file)

merged_file = pd.DataFrame()
df.columns = ['Type', 'Country', 'Source']
df = df.drop('Source', axis=1)
df = df.set_index('Type',drop=False).stack().reset_index()

agg_df = pd.melt(df, id_vars = [df, 'Source', 'Country', 'Type'])

df = df.sort_values('Type').reset_index(drop=True)
print(df)

The clear expected output (aligned to the left):

Mineral name - Type - Country - Prod_t_2021
Mineral name - Type - Country - Prod_t_2022
Mineral name - Type - Country - Reserves_t
Mineral name - Type - Country - Reserves_notes

Mineral name could be extracted from Type as string

The source is World.zip from URL: https://www.sciencebase.gov/catalog/item/63b5f411d34e92aad3caa57f

Timeless · Accepted Answer

IIUC, use this :

df = (
        pd.concat([pd.read_csv(f) for f in file_list], ignore_index=True)
            .melt(id_vars = ["Source", "Country", "Type"])
            .set_index("Type")
            .sort_index()
      )

Output :

                              Source          Country         variable                   value
Type                                                                                          
 Mine production: Palladium  MCS2023    United States  Prod_t_est_2022                     NaN
 Mine production: Palladium  MCS2023     South Africa  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023           Russia  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023           Canada  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023    United States  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023    United States      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023           Canada      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023           Russia      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023     South Africa      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023  Other countries   Reserves_notes  Included with platinum

How to merge multiple csv and create a dataframe?

The clear expected output (aligned to the left):

Answers (1)

Related Questions