Reputation: 5412
I'm trying to append a pandas DataFrame (single column) to an existing CSV as a new column, much like this post, but it's not working! Instead, my column is added at the bottom of the CSV and repeated over and over (the CSV ends up with far more rows than the column has). Here's my code:
with open(outputPath, "a") as resultsFile:
    print len(scores)
    scores.to_csv(resultsFile, header=False)
    print resultsFile
Terminal output:
4032
<open file '/Users/alavin/nta/NAB/results/numenta/artificialWithAnomaly/numenta_art_load_balancer_spikes.csv', mode 'a' at 0x1088686f0>
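A minimal sketch of what happens, using a hypothetical file `results.csv` in a temporary directory and made-up values: a file opened in append mode (`"a"`) can only be written at its end, so `to_csv` writes the scores as extra lines (index, value) below the existing rows instead of as a new column. If the block runs more than once (for example inside a loop, which is only a guess about the surrounding code), each run adds another copy.

```python
import os
import tempfile

import pandas as pd

# Hypothetical existing CSV with two columns and three rows.
path = os.path.join(tempfile.mkdtemp(), "results.csv")
pd.DataFrame({"value": [1, 2, 3], "label": [0, 0, 1]}).to_csv(path, index=False)

# Hypothetical single-column "scores" Series to be added.
scores = pd.Series([0.25, 0.5, 0.75], name="score")

# Append mode can only write at the end of the file, so to_csv emits
# the scores as extra *lines* (index,value), not as a new column.
with open(path, "a") as resultsFile:
    scores.to_csv(resultsFile, header=False)

with open(path) as f:
    print(f.read())
```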
Thank you in advance!
Upvotes: 8
Views: 26990
Reputation: 871
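I find the read-modify-write solution problematic if many columns are to be added iteratively to a large CSV file, because the whole file is read and rewritten on every append.
An alternative is to let the CSV file store the transposed DataFrame, i.e. the column headers act as the row index and vice versa. Each new column then becomes a new row, which can simply be appended with mode='a'.
The upside is that you don't waste computation on re-reading and rewriting data that hasn't changed.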
I compared the operation times of the regular append mode (mode='a', i.e. appending rows to the transposed file) and the append-column approach, for a series of length 5000 appended 100 times.
The downside is that you have to transpose the DataFrame to get the "intended" DataFrame when reading the CSV for other purposes.
Code for the timing plot:
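import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

col = []  # seconds per iteration, "append column" approach
row = []  # seconds per iteration, "append row" (transposed) approach
N = 100   # number of appends
L = 5000  # length of each series

# Append row approach: each new column is written as one row in append mode
for i in range(N):
    t1 = dt.datetime.now()
    data = pd.DataFrame({f'col_{i}': np.random.rand(L)}).T
    data.to_csv('test_csv_data1.txt', mode='a', header=False, sep="\t")
    t2 = dt.datetime.now()
    row.append((t2 - t1).total_seconds())

# Append col approach: read the whole file, add a column, rewrite the file.
# Start from a file that holds only a 0..L-1 index.
pd.DataFrame(index=range(L)).to_csv('test_csv_data2.txt', header=True, sep="\t")
for i in range(N):
    t1 = dt.datetime.now()
    # index_col=0 stops the saved index from being read back as an extra column
    data = pd.read_csv('test_csv_data2.txt', sep='\t', header=0, index_col=0)
    data[f'col_{i}'] = np.random.rand(L)
    data.to_csv('test_csv_data2.txt', header=True, sep="\t")
    t2 = dt.datetime.now()
    col.append((t2 - t1).total_seconds())

t = pd.DataFrame({'N appendices': range(N), 'append row': row, 'append col': col})
t = t.set_index('N appendices')
ax = t.plot()
ax.set_ylabel('seconds')
plt.show()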
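To make the transposed layout concrete, here is a small sketch under assumptions: the helper name `append_column_as_row`, the file `transposed.tsv` and the tiny length-5 columns are all hypothetical. Each logical column is appended as one tab-separated row whose first field is the column name; reading back uses that first field as the index and transposes.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file in which each stored *row* is one logical column.
path = os.path.join(tempfile.mkdtemp(), "transposed.tsv")


def append_column_as_row(path, name, values):
    """Append one logical column as a single tab-separated row.

    A one-column DataFrame is transposed into a one-row frame whose
    index label (the column name) becomes the first field of the line.
    """
    row = pd.DataFrame({name: values}).T
    row.to_csv(path, mode="a", header=False, sep="\t")


# Append three logical columns of length 5 without rereading the file.
for i in range(3):
    append_column_as_row(path, f"col_{i}", np.arange(5) * (i + 1))

# Reading back: the first field of each line is the column name, so use it
# as the index, transpose, and tidy up the resulting index/axis name.
df = (
    pd.read_csv(path, sep="\t", header=None, index_col=0)
    .T.reset_index(drop=True)
    .rename_axis(None, axis=1)
)
print(df)
```

Appending never touches the existing lines, which is why the cost per append stays flat; the price is the `.T` on every read.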
Upvotes: 3
Reputation: 20563
As @aus_lacy has already suggested, you just need to read the CSV file into a DataFrame first, concatenate the two DataFrames column-wise, and write the result back to the CSV file.
Suppose your existing DataFrame is called df:
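import pandas as pd

df_csv = pd.read_csv(outputPath)  # add your own read settings here
# provided that their lengths match; .values assigns by position, ignoring the index
df_csv['to new column'] = df['from single column'].values
df_csv.to_csv(outputPath, index=False)  # again, your own write settings here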
That's it.
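A self-contained sketch of the read, concatenate, write approach, assuming a small CSV with hypothetical columns `timestamp` and `value` and a hypothetical single-column DataFrame `scores`. It uses `pd.concat(..., axis=1)` for the concatenation step; because `concat` aligns on the index, both indexes are reset so the join is positional.

```python
import os
import tempfile

import pandas as pd

# Hypothetical existing CSV standing in for outputPath.
outputPath = os.path.join(tempfile.mkdtemp(), "results.csv")
pd.DataFrame({"timestamp": [1, 2, 3], "value": [10, 20, 30]}).to_csv(
    outputPath, index=False
)

# Hypothetical single-column DataFrame holding the new data.
scores = pd.DataFrame({"score": [0.25, 0.5, 0.75]})

# 1. Read the whole existing file.
df_csv = pd.read_csv(outputPath)

# 2. Concatenate column-wise; reset both indexes to 0..n-1 and insist on
#    equal lengths so rows are matched by position.
if len(df_csv) != len(scores):
    raise ValueError("the CSV and the new column have different lengths")
df_csv = pd.concat(
    [df_csv.reset_index(drop=True), scores.reset_index(drop=True)], axis=1
)

# 3. Overwrite the file (to_csv's default mode is "w") without an index column.
df_csv.to_csv(outputPath, index=False)

with open(outputPath) as f:
    print(f.read())
```

Note that the whole file is read and rewritten each time, so this is simplest for one-off additions rather than many repeated appends.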
Upvotes: 13