Ben Price

Reputation: 677

Append multiple pandas data frames to a single CSV, but only include the header on the first append

I need to create a .csv file and append subsets of multiple dataframes into it.

All the dataframes are structured identically; however, I need to create the output data set with headers and then append all the subsequent data frames without headers.

I know I could just create the output file using the headers from the first data frame and then do an append loop with no headers from there, but I'd really like to learn how to do this in a more efficient way.

import glob

import numpy as np
import pandas as pd

path = '/Desktop/NYC TAXI/Green/*.csv'
allFiles = glob.glob(path)

for file in allFiles:
    df = pd.read_csv(file, skiprows=[1,2], usecols=np.arange(20))
    # Split the rows by stadium
    metsdf = df.loc[df['Stadium_Code'] == 2]
    yankdf = df.loc[df['Stadium_Code'] == 1]
    # Append each subset to its output file (no header is written here)
    with open('greenyankeetaxi.csv','a') as yankeetaxi:
        yankdf.to_csv(yankeetaxi, header=False)
    with open('greenmetstaxi.csv','a') as metstaxi:
        metsdf.to_csv(metstaxi, header=False)
    print(file + " done")

Upvotes: 2

Views: 12056

Answers (2)

Ikram Ul Haq

Reputation: 96

An efficient way to append multiple subsets of a dataframe to one large file with only a single header is the following:

    import os

    for df in dataframes:
        if not os.path.isfile(filename):
            # first write: create the file and include the header
            df.to_csv(filename, header=True, index=False)
        else:  # else it exists so append without writing the header
            df.to_csv(filename, mode='a', header=False, index=False)

The code above writes the file with a header the first time. On every later iteration the file already exists, so the dataframe is appended without a header.

You can use this pattern in any scenario where you need to append multiple dataframes to the same file without repeating the header.
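
As a minimal sketch, the same check can be folded into a small helper. The name append_csv is hypothetical (it is not part of pandas), and the sketch assumes the output file does not already exist from an earlier run:

```python
import os

import pandas as pd


def append_csv(df, filename):
    """Append df to filename, writing the header only if the file is new.

    append_csv is a hypothetical helper name, not a pandas function.
    """
    # True only when the file has not been created yet
    write_header = not os.path.isfile(filename)
    # mode="a" creates the file on the first call and appends afterwards
    df.to_csv(filename, mode="a", header=write_header, index=False)
```

In the question's loop this would be called as append_csv(yankdf, 'greenyankeetaxi.csv') and append_csv(metsdf, 'greenmetstaxi.csv'). If an output file is left over from a previous run, the new rows are added to it without a header, so delete it first when starting fresh.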

Upvotes: 3

Leb

Reputation: 15953

To do it efficiently, you can use one of the methods described in the pandas "Merge, join, and concatenate" documentation so that you end up with two complete dataframes (yankdf and metsdf), then write them to csv using to_csv as you have been doing.


Current data

Here we have two dataframes, one from each file:

First dataframe df

   a  b  c
0  1  2  3
1  4  5  6

Second dataframe df2

   a   b   c
0  7   6   8
1  9  10  11

Using append

df = df.append(df2)  # DataFrame.append was removed in pandas 2.0; use pd.concat([df, df2]) there

The above line results in a single df, which can be written to file:

   a   b   c
0  1   2   3
1  4   5   6
0  7   6   8
1  9  10  11
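
For pandas versions without DataFrame.append, a short sketch of the same step with pd.concat, rebuilding the two sample frames shown above:

```python
import pandas as pd

# The two sample frames from this answer
df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]})
df2 = pd.DataFrame({"a": [7, 9], "b": [6, 10], "c": [8, 11]})

# pd.concat keeps each frame's original index (0, 1, 0, 1), as in the output above
combined = pd.concat([df, df2])

# Pass ignore_index=True to renumber the rows 0..3 instead
renumbered = pd.concat([df, df2], ignore_index=True)
```

Either result can then be written with to_csv; with index=False the row labels are left out of the file altogether.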

In short:

  • Loop through files in directory
  • Add data to dataframe using append instead of re-assigning every time
  • Write a single dataframe to file
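
The steps above can be sketched roughly as follows. The function name combine_subsets is hypothetical, and the sketch collects the subsets in a list and concatenates once at the end rather than growing a dataframe inside the loop. It also leaves out the skiprows and usecols arguments from the question for brevity:

```python
import glob

import pandas as pd


def combine_subsets(pattern, column, value):
    """Read every CSV matching pattern and stack the rows where column == value.

    combine_subsets is a hypothetical helper name used for illustration.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path)
        # list.append here, not DataFrame.append
        frames.append(df.loc[df[column] == value])
    if not frames:
        # No matching files: return an empty frame instead of failing in concat
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

For the question's data this might be used as combine_subsets('/Desktop/NYC TAXI/Green/*.csv', 'Stadium_Code', 1).to_csv('greenyankeetaxi.csv', index=False), which writes the header exactly once. All subsets are held in memory until the final write, so this assumes the combined data fits in RAM.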

Upvotes: 2
