Reputation: 1191
I have a DataFrame with an index of datetime objects. I am ultimately going to write this DataFrame to an HDF5 file using HDFStore.append. I am adding a lot of rows that need to be written to this HDF5 file. If I use HDFStore.append for every row, it takes way too long. If I collect everything in a DataFrame first, I run out of memory. So I need to chunk and write to HDF5 intermittently.
from datetime import datetime
from pandas import DataFrame

df = DataFrame([['Bob', 'Mary']], columns=['Boy', 'Girl'], index=[datetime.today()])
Now I would like to add another row to this WITH THE SAME INDEX
row = ['John', 'Sue']
Using .loc or .ix replaces the existing row
df.loc[datetime.today()] = row
Using append works, but for my purposes is WAY TOO SLOW
new_df = DataFrame([row], columns=df.columns, index=[datetime.today()])
df = df.append(new_df)
Is there a better way to do this ?
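A minimal sketch of the buffer-and-flush pattern described above (the store file name and key are illustrative, not from the question): collect rows in a plain list, build one DataFrame per chunk, and append each chunk to the store in a single call.

```python
from datetime import datetime
import pandas as pd

# Buffer rows in plain lists; duplicate datetime index values are fine here.
rows, idx = [], []
for name_pair in [['Bob', 'Mary'], ['John', 'Sue'], ['Tom', 'Ann']]:
    rows.append(name_pair)
    idx.append(datetime.today())

# One DataFrame per chunk -- rows with identical index values are preserved,
# unlike assignment via .loc, which overwrites.
chunk = pd.DataFrame(rows, columns=['Boy', 'Girl'], index=idx)

# Each full chunk would then be flushed with a single HDFStore.append call, e.g.:
# with pd.HDFStore('data.h5') as store:
#     store.append('names', chunk)
```

After flushing, clear the two lists and keep accumulating; this keeps memory bounded while avoiding a per-row append.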
Upvotes: 1
Views: 1290
Reputation: 16144
Creating a list of lists and making a DataFrame from it will be faster than append. Since you are already creating data frames of small chunks, why not create each chunk in one go:
In [1303]: pd.DataFrame([[0,1], [1,2], [2,3]], index=[datetime.today()] * 3)
Out[1303]:
                            0  1
2015-05-07 09:02:30.327473  0  1
2015-05-07 09:02:30.327473  1  2
2015-05-07 09:02:30.327473  2  3
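To see why this helps, here is a hedged comparison sketch: each `append` in a loop copies the whole frame, whereas building from the full list of rows (or concatenating all per-row frames once) copies only once and gives the same result.

```python
from datetime import datetime
import pandas as pd

ts = datetime.today()
rows = [[0, 1], [1, 2], [2, 3]]

# Fast pattern: build the whole chunk from a list of lists in one go.
fast = pd.DataFrame(rows, index=[ts] * len(rows))

# Equivalent result via a single concat of per-row frames -- still far
# cheaper than calling .append inside the loop, since data is copied once.
parts = [pd.DataFrame([r], index=[ts]) for r in rows]
via_concat = pd.concat(parts)

assert fast.equals(via_concat)
```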
Upvotes: 1