William Anderson

Reputation: 67

How to iteratively add rows to an initially empty pandas DataFrame?

I have to iteratively add rows to a pandas DataFrame and find this quite hard to achieve. I'm also not sure whether my approach is the best one performance-wise.

From time to time I receive data from a server, and each new dataset from the server should become a new row in my pandas DataFrame.

import pandas as pd
import datetime

df = pd.DataFrame([], columns=['Timestamp', 'Value'])

# As this df will grow over time, is each df = df.append(...) a costly full copy,
# or does pandas optimize this? Is there a better way to achieve this?
# ignore_index=True, as I want the index to increment automatically.
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.)
df = pd.concat([df, pd.DataFrame([{'Timestamp': datetime.datetime.now()}])], ignore_index=True)
print(df)
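A commonly used alternative, sketched here as an assumption rather than something from the original post, is to collect each incoming record in a plain Python list (appending to a list is amortized O(1)) and build the DataFrame in one step only when it is needed. The `on_new_data` callback is hypothetical and stands in for whatever code receives data from the server.

```python
import datetime

import pandas as pd

# Incoming records are buffered as plain dicts; list.append is amortized O(1).
rows = []


def on_new_data(value):
    """Hypothetical callback, called whenever the server delivers a new dataset."""
    rows.append({'Timestamp': datetime.datetime.now(), 'Value': value})


# Simulate five deliveries from the server.
for v in range(5):
    on_new_data(v)

# Build the DataFrame once; the default RangeIndex numbers the rows 0..n-1.
df = pd.DataFrame(rows, columns=['Timestamp', 'Value'])
print(df)
```

If the DataFrame has to be inspected during the day, it can be rebuilt from `rows` on demand, which costs one copy per inspection rather than one copy per new row.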

The DataFrame is deleted after one day, but during that time roughly 100k new rows will probably be added.

The goal is to do this as efficiently as possible in terms of runtime (memory matters less, since there is enough RAM).
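Since memory is not the constraint, another option (again a sketch, not from the original post) is to preallocate NumPy arrays sized for the expected ~100k rows, fill them slot by slot, and wrap only the filled part in a DataFrame when needed. The `RowBuffer` class and its method names are hypothetical, and the default capacity of 100,000 is taken from the question's estimate, not from any pandas limit.

```python
import datetime

import numpy as np
import pandas as pd


class RowBuffer:
    """Hypothetical fixed-capacity buffer: one preallocated NumPy array per column."""

    def __init__(self, capacity=100_000):
        self.timestamps = np.empty(capacity, dtype='datetime64[us]')
        self.values = np.full(capacity, np.nan)
        self.n_rows = 0

    def add_row(self, value):
        # Writing into the next free slot is O(1); existing rows are not copied.
        if self.n_rows == len(self.values):
            raise IndexError('preallocated capacity exceeded')
        self.timestamps[self.n_rows] = np.datetime64(datetime.datetime.now(), 'us')
        self.values[self.n_rows] = value
        self.n_rows += 1

    def to_dataframe(self):
        # Only the filled prefix is used; pandas may copy these slices.
        n = self.n_rows
        return pd.DataFrame({'Timestamp': self.timestamps[:n], 'Value': self.values[:n]})


buf = RowBuffer()
for v in (1.5, 2.5, 3.5):
    buf.add_row(v)
print(buf.to_dataframe())
```

The trade-off is a fixed upper bound: if more rows than expected arrive, the arrays would have to be enlarged (for example by allocating larger ones and copying), which this sketch does not do.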

Upvotes: 0

Views: 983

Answers (1)

jlesueur

Reputation: 326

I used this to compare the speed of 'append' with 'loc':

import timeit

code = """
import pandas as pd
df = pd.DataFrame({'A': range(6), 'B': range(6)})
# DataFrame.append only exists in pandas < 2.0 (deprecated in 1.4, removed in 2.0)
df = df.append({'A': 3, 'B': 4}, ignore_index=True)
"""

code2 = """
import pandas as pd
df = pd.DataFrame({'A': range(6), 'B': range(6)})
# Setting with enlargement: assign to the next integer label
df.loc[df.index.max() + 1, :] = [3, 4]
"""

# Average time per run over 1000 runs (each run also builds the 6-row frame)
elapsed_time1 = timeit.timeit(code, number=1000) / 1000
elapsed_time2 = timeit.timeit(code2, number=1000) / 1000
print('With "append":', elapsed_time1)
print('With "loc":', elapsed_time2)

On my machine, I got these results (seconds per run):

With "append" : 0.001502693824000744
With "loc" : 0.0010836279180002747

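Using "loc" seems to be faster here: about 1.1 ms per run versus 1.5 ms for "append". Note that each timed snippet also creates a small 6-row DataFrame, so this measures a single row added to a tiny frame, not 100k rows added to a growing one.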
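A sketch closer to the question's scenario grows a frame from empty, one row at a time, and compares three approaches. It assumes pandas 2.x, so `pd.concat` stands in for the removed `DataFrame.append`; `N` is an arbitrary small value and the function names are illustrative. The `loc` variant uses `len(df)` as the next label, which gives 0 on an empty frame with a default RangeIndex, whereas `df.index.max()` is NaN for an empty index.

```python
import time

import pandas as pd

N = 1_000  # arbitrary number of rows to add, kept small so the sketch runs quickly


def grow_with_concat(n):
    # pd.concat builds a new frame on every iteration, copying all existing rows.
    df = pd.DataFrame({'A': [], 'B': []})
    for i in range(n):
        df = pd.concat([df, pd.DataFrame({'A': [i], 'B': [i]})], ignore_index=True)
    return df


def grow_with_loc(n):
    # Setting with enlargement; len(df) is the next label of a default RangeIndex.
    df = pd.DataFrame({'A': [], 'B': []})
    for i in range(n):
        df.loc[len(df)] = [i, i]
    return df


def grow_with_list(n):
    # Collect plain dicts, then build the frame in a single step.
    rows = [{'A': i, 'B': i} for i in range(n)]
    return pd.DataFrame(rows)


for fn in (grow_with_concat, grow_with_loc, grow_with_list):
    start = time.perf_counter()
    fn(N)
    print(f'{fn.__name__}: {time.perf_counter() - start:.4f} s for {N} rows')
```

Timings depend on the machine and pandas version, so no numbers are claimed here; the list-based version is the only one that does not rebuild or enlarge the frame once per row.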

Upvotes: 1
