Reputation:
I'm new to Python and trying to understand data manipulation.
I have a folder with several files, some of which are CSVs. I want to merge all of the CSVs (approximately 400 of them) into a single CSV, with all the data stacked.
For example, if the first CSV contains:
transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351
and the second contains:
transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351
I want the final df to look like:
transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351
I tried-
import glob
import pandas as pd
# Folder containing the .csv files to merge
file_path = "C:\\Users\\Desktop"
# This pattern \\* selects all files in a directory
pattern = file_path + "\\*"
files = glob.glob(pattern)
# Import first file to initiate the dataframe
df = pd.read_csv(files[0],encoding = "utf-8", delimiter = ",")
# Append all the files as dataframes to the first one
for file in files[1:len(file_list)]:
    df_csv = pd.read_csv(file,encoding = "utf-8", delimiter = ",")
    df = df.append(df_csv)
But it did not work. How can I solve this issue?
Upvotes: 0
Views: 2649
Reputation: 1
I took @JayPatel's solution and turned it into an automated .py script. It also handles the case where you add more CSV files after a previous merge has already been written. Nothing fancy, but it does the job.
import os
import glob
import pandas as pd
stop = 0
while stop != 1:
    if glob.glob('merged.csv') == []:
        all_files = glob.glob(os.path.join('*.csv'))
        df_from_each_file = (pd.read_csv(csvfiles) for csvfiles in all_files)
        df_merged = pd.concat(df_from_each_file, ignore_index=True)
        df_merged.to_csv("merged.csv")
        print('The merged.csv file was created successfully')
        stop = 1
    else:
        print('You need to delete a previous merged.csv file first')
        delete = input('Do you want to delete it? (Y/N): ')
        if delete == 'Y':
            os.remove('merged.csv')
        else:
            stop = 1
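One possible variation on the script above, offered as a rough sketch rather than a drop-in replacement: instead of asking whether to delete the old merged.csv, leave it out of the inputs and overwrite it on each run. The function name merge_csvs, its parameters, and the throwaway demo files are made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csvs(folder, output_name="merged.csv"):
    """Stack every CSV in `folder`, skipping an earlier merge output."""
    output_path = os.path.abspath(os.path.join(folder, output_name))
    # Sort for a stable row order and leave the previous output out of the inputs.
    csv_paths = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != output_path
    )
    if not csv_paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    merged = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
    # index=False keeps the row index out of the written file.
    merged.to_csv(output_path, index=False)
    return merged


# Demo on two small throwaway files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"transcript": ["thank you"], "confidence": [0.85]}).to_csv(
        os.path.join(tmp, "a.csv"), index=False)
    pd.DataFrame({"transcript": ["welcome"], "confidence": [0.95]}).to_csv(
        os.path.join(tmp, "b.csv"), index=False)
    first = merge_csvs(tmp)
    second = merge_csvs(tmp)  # merged.csv exists now, but is skipped as an input
    on_disk = pd.read_csv(os.path.join(tmp, "merged.csv"))
```

Running it twice gives the same rows both times, because the earlier output is never read back in as an input.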
Upvotes: 0
Reputation: 543
NOTE: Rather than reading the CSV files straight from the Desktop, I suggest saving them in one dedicated directory. That will also help if you want to analyze that particular dataset in the future.
Requirement before the solution: all the CSV files you want to merge must be in the same directory.
# Import the required libraries
# 'os' provides a portable way of using operating-system-dependent functionality, such as building file paths
import os
# 'glob' finds all pathnames matching a specified pattern, such as '*.csv', which is used here to find all CSV files
import glob
# 'pandas' is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool
import pandas as pd
# Declare the 'path' variable pointing at the directory that holds the CSV files
path = "C:/Users/Desktop"
# Collect the matching paths in 'all_files': 'os.path.join' builds the pattern '<path>/*.csv' and 'glob' expands it
all_files = glob.glob(os.path.join(path, "*.csv"))
# Build a generator that reads each CSV file into a DataFrame
df_from_each_file = (pd.read_csv(csvfiles) for csvfiles in all_files)
# If the files use a different separator, pass it via 'sep', e.g. pd.read_csv(csvfiles, sep=';'), in the line above
# Concatenate all the DataFrames using 'pd.concat()'
df_merged = pd.concat(df_from_each_file, ignore_index=True)
# Write the merged data to 'merged.csv'
df_merged.to_csv("merged.csv")
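One detail about the last step that may matter if the output should match the layout in the question: by default to_csv also writes the row index as an extra, unnamed first column. A small sketch of the difference, using made-up rows and in-memory buffers instead of real files:

```python
import io

import pandas as pd

# Hypothetical stand-in for the merged DataFrame.
df_merged = pd.DataFrame(
    {"transcript": ["thank you", "welcome"], "confidence": [0.85, 0.95]}
)

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
df_merged.to_csv(with_index)

# index=False writes only the data columns.
without_index = io.StringIO()
df_merged.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(header_with)     # ,transcript,confidence
print(header_without)  # transcript,confidence
```

If the extra column is unwanted, df_merged.to_csv("merged.csv", index=False) should leave it out.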
Upvotes: 1
Reputation: 120469
This should help:
import pandas as pd
import glob
import os.path
file_path = "C:/Users/Desktop"
data = []
for csvfile in glob.glob(os.path.join(file_path, "*.csv")):
    df = pd.read_csv(csvfile, encoding="utf-8", delimiter=",")
    data.append(df)
data = pd.concat(data, ignore_index=True)
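As for why the loop in the question likely fails: file_list is never defined (the list is called files), so the for line raises a NameError. In addition, the pattern ending in \\* matches every file in the folder rather than only the CSVs, and DataFrame.append was removed in pandas 2.0, which is why pd.concat is used here. A small sketch of the pattern difference, using a temporary folder and made-up file names:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create two CSV files and one unrelated file (hypothetical names).
    for name in ("a.csv", "b.csv", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("placeholder\n")

    # "*" matches every entry in the folder.
    everything = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*"))
    )
    # "*.csv" matches only names ending in .csv.
    only_csv = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.csv"))
    )

print(everything)  # ['a.csv', 'b.csv', 'notes.txt']
print(only_csv)    # ['a.csv', 'b.csv']
```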
Upvotes: 1