Reputation: 484
I'm looking for an efficient way to repeatedly update rows in a DataFrame. By this I mean changing row values and its index label. I specifically need help with the latter. The best I could find is How to change Pandas dataframe index value? However, this updates the entire index, while I care about a single index label.
Directly assigning to index[n] is not supported:
>>> df.index[1] = 'new_label'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1374, in __setitem__
raise TypeError("Index does not support mutable operations")
Would it work to modify (assign to) the numpy array underlying the index?
>>> df.index._values[1] = 'new_label'
The updates don't break index sortedness.
More context:
I have a DataFrame indexed by timestamps (DatetimeIndex) where I need to efficiently append new rows in real time (many times per second). I preallocate a large fixed-size DataFrame with NaT/NaN s and I append rows by writing into the next empty row.
Upvotes: 2
Views: 1223
Reputation: 862851
As Steven G pointed, this update is very unefficient.
Better is create list of DataFrames
and then use concat
:
df1 = pd.DataFrame({'a': [1,2,3]}, index=pd.date_range('2015-01-01', periods=3))
print (df1)
a
2015-01-01 1
2015-01-02 2
2015-01-03 3
df2 = pd.DataFrame({'a': [7,8,9]}, index=pd.date_range('2016-01-01', periods=3))
print (df2)
a
2016-01-01 7
2016-01-02 8
2016-01-03 9
dfs = [df1,df2]
df = pd.concat(dfs)
print (df)
a
2015-01-01 1
2015-01-02 2
2015-01-03 3
2016-01-01 7
2016-01-02 8
2016-01-03 9
Then you can concat
another DataFrame
:
df3 = pd.DataFrame({'a': [3,2,5]}, index=pd.date_range('2017-01-01', periods=3))
print (df3)
a
2017-01-01 3
2017-01-02 2
2017-01-03 5
df = pd.concat([df, df3])
print (df)
a
2015-01-01 1
2015-01-02 2
2015-01-03 3
2016-01-01 7
2016-01-02 8
2016-01-03 9
2017-01-01 3
2017-01-02 2
2017-01-03 5
Or use DataFrame.append
:
df4 = pd.DataFrame({'a': [3,2,4]}, index=pd.date_range('2018-01-01', periods=3))
print (df4)
a
2018-01-01 3
2018-01-02 2
2018-01-03 4
df = df.append(df4)
print (df)
a
2015-01-01 1
2015-01-02 2
2015-01-03 3
2016-01-01 7
2016-01-02 8
2016-01-03 9
2017-01-01 3
2017-01-02 2
2017-01-03 5
2018-01-01 3
2018-01-02 2
2018-01-03 4
Solution for replacing value in index:
You can use very similar as df.index._values[1] = 'new_label'
only remove _
and another solution is with Index.set_value
:
df = pd.DataFrame({'a': [1,2,3]}, index=pd.date_range('2016-01-01', periods=3))
print (df)
a
2016-01-01 1
2016-01-02 2
2016-01-03 3
df.index.values[0] = pd.Timestamp(2016,11,23,1,0,0)
df.index.set_value(df.index, df.index[1], pd.Timestamp(2016,11,22,1,0,0))
print (df)
a
2016-11-23 01:00:00 1
2016-11-22 01:00:00 2
2016-01-03 00:00:00 3
Upvotes: 1
Reputation: 17132
updating a dataframe
multiple time per second is not very efficient. you should append a list and then transform the list to a dataframe afterward.
such has:
log=list()
for i in range(1,10): # here is your loop every minutes where log is updated
log.append([date, value1, value2])
df = pd.DataFrame(log) # now you create the dataframe after being done updating the list.
Upvotes: 1