Reputation: 677
I need to create a .csv file and append subsets of multiple dataframes into it.
All the dataframes are structured identically, however I need to create the output data set with headers, and then append all the subsequent data frames without headers.
I know I could just create the output file using the headers from the first data frame and then do an append loop with no headers from there, but I'd really like to learn how to do this in a more efficient way.
import glob
import numpy as np
import pandas as pd

path = '/Desktop/NYC TAXI/Green/*.csv'
allFiles = glob.glob(path)
for file in allFiles:
    df = pd.read_csv(file, skiprows=[1,2], usecols=np.arange(20))
    metsdf = df.loc[df['Stadium_Code'] == 2]
    yankdf = df.loc[df['Stadium_Code'] == 1]
    with open('greenyankeetaxi.csv','a') as yankeetaxi:
        yankdf.to_csv(yankeetaxi,header=False)
    with open('greenmetstaxi.csv','a') as metstaxi:
        metsdf.to_csv(metstaxi,header=False)
    print(file + " done")
Upvotes: 2
Views: 12056
Reputation: 96
An efficient way to append multiple subsets of a dataframe to one large file with only a single header is as follows:
import os

for df in dataframes:
    if not os.path.isfile(filename):
        # first write: the file does not exist yet, so include the header
        df.to_csv(filename, header=True, index=False)
    else:  # else it exists so append without writing the header
        df.to_csv(filename, mode='a', header=False, index=False)
In the above code, the file is written with a header the first time. On every later pass the file already exists, so the dataframe is appended without a header.
You can use this pattern in any scenario where you need to append multiple dataframes to the same file without repeating the header.
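The same check can be folded into a single to_csv call by computing the header flag from the file's existence. This is only a minimal sketch: the helper name append_to_csv, the sample data, and the file name combined.csv are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, filename):
    """Append df to filename, writing the header only if the file is new.

    Hypothetical helper, not part of pandas.
    """
    write_header = not os.path.isfile(filename)
    # mode="a" creates the file when it does not exist yet
    df.to_csv(filename, mode="a", header=write_header, index=False)


# Small demonstration with made-up data in a temporary directory.
frames = [
    pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]}),
    pd.DataFrame({"a": [7, 9], "b": [6, 10], "c": [8, 11]}),
]
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "combined.csv")
    for frame in frames:
        append_to_csv(frame, out_path)
    # Read the result back to confirm there is exactly one header row.
    result = pd.read_csv(out_path)
```

One thing to keep in mind with this approach: because it relies on the file's existence, an output file left over from a previous run will be appended to rather than replaced, so you may want to delete it before starting.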
Upvotes: 3
Reputation: 15953
To do it efficiently, you can use one of the methods from the pandas "Merge, join, and concatenate" documentation so that you end up with two complete dataframes (yankdf and metsdf), then write them to CSV using to_csv as you have been doing.
Current data
Here we have two dataframes, one from each file:
First dataframe df
   a   b   c
0  1   2   3
1  4   5   6
Second dataframe df2
   a   b   c
0  7   6   8
1  9  10  11
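Using append
df = df.append(df2)
The line above produces a single df that can be written to file. (DataFrame.append was removed in pandas 2.0; on current versions, df = pd.concat([df, df2]) gives the same result.)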
   a   b   c
0  1   2   3
1  4   5   6
0  7   6   8
1  9  10  11
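Applied to the loop in the question, the idea could look roughly like the sketch below: collect each subset in a list, concatenate once at the end, and write each output a single time so the header appears only once. The small dataframes and the fare column are invented stand-ins for the files read with pd.read_csv, and pd.concat is used here in place of append.

```python
import pandas as pd

# Made-up stand-ins for the per-file dataframes from the question's loop;
# the "fare" column is hypothetical.
monthly_frames = [
    pd.DataFrame({"Stadium_Code": [1, 2, 1], "fare": [10.0, 12.5, 8.0]}),
    pd.DataFrame({"Stadium_Code": [2, 2, 1], "fare": [7.5, 9.0, 11.0]}),
]

yankee_parts = []
mets_parts = []
for df in monthly_frames:
    # Keep each subset; nothing is written to disk inside the loop.
    yankee_parts.append(df.loc[df["Stadium_Code"] == 1])
    mets_parts.append(df.loc[df["Stadium_Code"] == 2])

# One concatenation per output instead of growing a dataframe step by step.
yankdf = pd.concat(yankee_parts, ignore_index=True)
metsdf = pd.concat(mets_parts, ignore_index=True)

# With no path argument, to_csv returns the CSV text; pass a file name
# such as 'greenyankeetaxi.csv' to write to disk instead.
yankee_csv = yankdf.to_csv(index=False)
mets_csv = metsdf.to_csv(index=False)
```

This keeps every subset in memory until the end, so for inputs too large to hold at once, appending file by file may be the better fit.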
In short: append instead of re-assigning every time.
Upvotes: 2