Reputation: 67
I have to iteratively add rows to a pandas DataFrame and find this quite hard to achieve. Also performance-wise I'm not sure if this is the best approach.
So from time to time, I get data from a server and this new dataset from the server will be a new row in my pandas DataFrame.
import pandas as pd
import datetime
df = pd.DataFrame([], columns=['Timestamp', 'Value'])
# as this df will grow over time, is this a costly copy (df = df.append) or does pandas does some optimization there, or is there a better way to achieve this?
# ignore_index, as I want the index to automatically increment
df = df.append({'Timestamp': datetime.datetime.now()}, ignore_index=True)
print(df)
After one day the DataFrame will be deleted, but during this time, probably 100k times a new row with data will be added.
The goal is still to achieve this in a very efficient way, runtime-wise (memory doesn't matter too much as enough RAM is present).
Upvotes: 0
Views: 983
Reputation: 326
I tried this to compare the speed of 'append' compared to 'loc' :
import timeit
code = """
import pandas as pd
df = pd.DataFrame({'A': range(0, 6), 'B' : range(0,6)})
df= df.append({'A' : 3, 'B' : 4}, ignore_index = True)
"""
code2 = """
import pandas as pd
df = pd.DataFrame({'A': range(0, 6), 'B' : range(0,6)})
df.loc[df.index.max()+1, :] = [3, 4]
"""
elapsed_time1 = timeit.timeit(code, number = 1000)/1000
elapsed_time2 = timeit.timeit(code2, number = 1000)/1000
print('With "append" :',elapsed_time1)
print('With "loc" :' , elapsed_time2)
On my machine, I obtained these results :
With "append" : 0.001502693824000744
With "loc" : 0.0010836279180002747
Using "loc" seems to be faster.
Upvotes: 1