Reputation:
I'd like to do the following steps: 1. merge all csv in the same directory 2. create as Dataframe 3. assign columns and drop a column then set one of column ('Type') as index 4. for all file,s I would like to melt column D to end column as rows
file_list = glob.glob("*.csv")
for file in file_list:
merged_file = pd.read_csv(file)
print(merged_file)
merged_file = pd.DataFrame()
df.columns = ['Type', 'Country', 'Source']
df = df.drop('Source', axis=1)
df = df.set_index('Type',drop=False).stack().reset_index()
agg_df = pd.melt(df, id_vars = [df, 'Source', 'Country', 'Type'])
df = df.sort_values('Type').reset_index(drop=True)
print(df)
Mineral name - Type - Country - Prod_t_2021
Mineral name - Type - Country - Prod_t_2022
Mineral name - Type - Country - Reserves_t
Mineral name - Type - Country - Reserves_notes
Mineral name could be extracted from Type as string
The source is World.zip from URL: https://www.sciencebase.gov/catalog/item/63b5f411d34e92aad3caa57f
Upvotes: 0
Views: 63
Reputation: 37737
IIUC, use this :
df = (
pd.concat([pd.read_csv(f) for f in file_list], ignore_index=True)
.melt(id_vars = ["Source", "Country", "Type"])
.set_index("Type")
.sort_index()
)
Output :
Source Country variable value
Type
Mine production: Palladium MCS2023 United States Prod_t_est_2022 NaN
Mine production: Palladium MCS2023 South Africa Reserves_ore_kt NaN
Mine production: Palladium MCS2023 Russia Reserves_ore_kt NaN
Mine production: Palladium MCS2023 Canada Reserves_ore_kt NaN
Mine production: Palladium MCS2023 United States Reserves_ore_kt NaN
Mine production: Palladium MCS2023 United States Reserves_kt NaN
Mine production: Palladium MCS2023 Canada Reserves_kt NaN
Mine production: Palladium MCS2023 Russia Reserves_kt NaN
Mine production: Palladium MCS2023 South Africa Reserves_kt NaN
Mine production: Palladium MCS2023 Other countries Reserves_notes Included with platinum
Upvotes: 0