
How to merge all csv files into one file and have the data stacked under the original headers?

I'm new to Python and trying to understand data manipulation.

I have a folder with several files, some of which are CSVs. I want to merge all of the CSVs (approximately 400 of them) into one single CSV, with all the data stacked under the original headers.

For example, if the first CSV contains this dataframe:

transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351

and the second contains this one:

transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351

I want the final dataframe to look like this:

transcript  confidence  from  to    speaker  Negative  Neutral  Positive  compound
thank you   0.85        1.39  1.65  0        0         0.754    0.246     0.7351
welcome     0.95        1.39  1.65  0        0         0.754    0.201     0.8351

I tried:

import glob
import pandas as pd

# Folder containing the .csv files to merge
file_path = "C:\\Users\\Desktop"

# This pattern \\* selects all files in a directory
pattern = file_path + "\\*"
files = glob.glob(pattern)

# Import first file to initiate the dataframe
df = pd.read_csv(files[0],encoding = "utf-8", delimiter = ",")

# Append all the files as dataframes to the first one
for file in files[1:len(file_list)]:
    df_csv = pd.read_csv(file,encoding = "utf-8", delimiter = ",")
    df = df.append(df_csv)

But it did not work. How can I solve this issue?

Upvotes: 0

Answers (3)

NiKiuS

Reputation: 1

I used @JayPatel's solution and turned it into an automated .py script. It handles the case where you add more CSV files later and an older merged file already exists. Nothing fancy, but it does the job.

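    import os
    import glob
    import pandas as pd

    OUTPUT = 'merged.csv'

    stop = False
    while not stop:
        if not os.path.exists(OUTPUT):
            # Collect every CSV in the current working directory
            all_files = glob.glob('*.csv')
            df_from_each_file = (pd.read_csv(csvfile) for csvfile in all_files)
            # Stack all files under the shared header and renumber the rows
            df_merged = pd.concat(df_from_each_file, ignore_index=True)
            # index=False avoids writing the row index as an extra unnamed column
            df_merged.to_csv(OUTPUT, index=False)
            print('The merged.csv file was created successfully')
            stop = True
        else:
            # An old merge would otherwise be picked up by the *.csv glob
            print('You need to delete a previous merged.csv file first')
            delete = input('Do you want to delete it? (Y/N): ')
            if delete.strip().upper() == 'Y':
                os.remove(OUTPUT)
            else:
                stop = True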
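If you would rather not be prompted, one alternative (a sketch, not part of the answer above) is to leave any existing `merged.csv` out of the inputs and simply overwrite it on each run. `merge_csvs` is a hypothetical helper name, and like the script above it assumes the inputs and the output share one folder.

```python
import glob
import os

import pandas as pd


def merge_csvs(folder=".", output="merged.csv"):
    """Stack every CSV in `folder`, except `output` itself, into `output`."""
    out_path = os.path.join(folder, output)
    # sorted() makes the stacking order deterministic; glob's order is arbitrary
    inputs = [
        f for f in sorted(glob.glob(os.path.join(folder, "*.csv")))
        if os.path.abspath(f) != os.path.abspath(out_path)
    ]
    if not inputs:
        raise FileNotFoundError(f"no input CSV files found in {folder!r}")
    merged = pd.concat((pd.read_csv(f) for f in inputs), ignore_index=True)
    # Overwrite any previous merge; index=False writes only the original headers
    merged.to_csv(out_path, index=False)
    return merged
```

Call it as `merge_csvs()` for the current folder or `merge_csvs("C:/Users/Desktop")` for a specific one. Because the output file is filtered out of the inputs, running it twice gives the same result instead of doubling the rows.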

Upvotes: 0

Jay Patel

Reputation: 543

Note: instead of fetching all the CSV files from the Desktop, I suggest saving them in one dedicated directory. That will also help if you want to analyze that particular dataset in the future.

Prerequisite: all the CSV files you want to merge should be in the same directory.

# Import the required libraries

# 'os' provides a portable way of using operating-system-dependent functionality, such as building file paths
import os

# 'glob' finds all pathnames matching a specified pattern, such as '*.csv', which is used here to find all CSV files
import glob

# 'pandas' is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool
import pandas as pd

# Declare the 'path' variable: the directory that contains the CSV files
path = "C:/Users/Desktop"

# Join 'path' with the pattern '*.csv' and let 'glob' collect all CSV files in that directory
all_files = glob.glob(os.path.join(path, "*.csv"))

# Lazily read each fetched CSV file into its own DataFrame
df_from_each_file = (pd.read_csv(csvfiles) for csvfiles in all_files)
# If your files use a different separator, pass it explicitly, e.g. pd.read_csv(csvfiles, sep=";")

# Concatenate all the DataFrames with 'pd.concat()'; ignore_index=True renumbers the rows 0..n-1
df_merged = pd.concat(df_from_each_file, ignore_index=True)

# Write the merged data to 'merged.csv'; index=False keeps the row index out of the file
df_merged.to_csv("merged.csv", index=False)
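One detail worth knowing about `pd.concat()`: it matches columns by name, not by position. If one of the ~400 files has an extra or misspelled column, the merged result gains that column and the rows from the other files get `NaN` in it; the data does not shift under the wrong header. A small sketch with made-up frames (illustrative data, not from the question):

```python
import pandas as pd

# Hypothetical frames: the second one has an extra 'speaker' column
a = pd.DataFrame({"transcript": ["thank you"], "confidence": [0.85]})
b = pd.DataFrame({"transcript": ["welcome"], "confidence": [0.95], "speaker": [0]})

merged = pd.concat([a, b], ignore_index=True)
# The columns are the union of both headers; the missing 'speaker' value becomes NaN
print(merged)

# A quick check that every frame shares the same header before merging
same_header = all(list(df.columns) == list(a.columns) for df in [a, b])
print(same_header)  # False
```

Running a header check like this over all the files before merging is a cheap way to catch a stray column early.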

Upvotes: 1

Corralien

Reputation: 120469

This should help:

import pandas as pd
import glob
import os.path

file_path = "C:/Users/Desktop"

data = []
for csvfile in glob.glob(os.path.join(file_path, "*.csv")):
    df = pd.read_csv(csvfile, encoding="utf-8", delimiter=",")
    data.append(df)

data = pd.concat(data, ignore_index=True)
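Compared with the attempt in the question, this loop builds a list and calls `pd.concat` once. That avoids `DataFrame.append` (deprecated, then removed in pandas 2.0) and the undefined `file_list`, and the `*.csv` pattern skips non-CSV files in the folder. To try it without touching the Desktop, here is a self-contained sketch that writes the two example rows from the question to a temporary folder and runs the same steps. The file names `first.csv` and `second.csv` are made up. `sorted()` is an addition: `glob` returns files in arbitrary order, so sorting makes the stacking order predictable.

```python
import glob
import os.path
import tempfile

import pandas as pd

header = "transcript,confidence,from,to,speaker,Negative,Neutral,Positive,compound\n"
rows = {
    "first.csv": "thank you,0.85,1.39,1.65,0,0,0.754,0.246,0.7351\n",
    "second.csv": "welcome,0.95,1.39,1.65,0,0,0.754,0.201,0.8351\n",
}

with tempfile.TemporaryDirectory() as file_path:
    # Write the two example files from the question
    for name, row in rows.items():
        with open(os.path.join(file_path, name), "w", encoding="utf-8") as fh:
            fh.write(header + row)

    # Same steps as the answer, with sorted() for a deterministic file order
    data = []
    for csvfile in sorted(glob.glob(os.path.join(file_path, "*.csv"))):
        df = pd.read_csv(csvfile, encoding="utf-8", delimiter=",")
        data.append(df)

    data = pd.concat(data, ignore_index=True)

print(data)
```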

Upvotes: 1
