Reputation: 3331
I am trying to append a dictionary to a DataFrame object, but I get the following error:
AttributeError: 'DataFrame' object has no attribute 'append'
As far as I know, DataFrame does have the method "append".
Code snippet:
df = pd.DataFrame(df).append(new_row, ignore_index=True)
I was expecting the dictionary new_row to be added as a new row.
How can I fix it?
Upvotes: 313
Views: 698364
Reputation: 23331
If you are enlarging a dataframe in a loop using DataFrame.append, concat, or loc, consider rewriting your code to enlarge a Python list and construct a dataframe once. Sometimes you may not even need pd.concat; a DataFrame constructor on a list of dicts may be all you need.
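As a minimal sketch of that last point (with made-up data), rows collected as dicts can be handed straight to the constructor, no append or concat required:

```python
import pandas as pd

# Hypothetical rows collected during a loop
rows = [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}]

# One constructor call builds the whole frame
df = pd.DataFrame(rows)
print(df.shape)  # (2, 2)
```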
A pretty common example of appending new rows to a dataframe is scraping data from a webpage and storing it in a dataframe. In that case, instead of appending to a dataframe, literally just replace the dataframe with a list and call pd.DataFrame() or pd.concat once at the end. So instead of:
df = pd.DataFrame()      # <--- initial dataframe (doesn't have to be empty)
for url in ticker_list:
    data = pd.read_csv(url)
    df = df.append(data, ignore_index=True)  # <--- enlarge dataframe
use:
lst = []                     # <--- initial list (doesn't have to be empty;
for url in ticker_list:      #      could store the initial df)
    data = pd.read_csv(url)
    lst.append(data)         # <--- enlarge list
df = pd.concat(lst)          # <--- concatenate the frames
The data-reading logic could be response data from an API, data scraped from a webpage, whatever; the code refactoring is really minimal. In the above example, we assumed that lst is a list of dataframes, but if it were a list of dicts/lists etc., then we could use df = pd.DataFrame(lst) in the last line of code instead.
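A self-contained version of the pattern above can be sketched as follows; io.StringIO payloads stand in for the URLs purely so the example runs without a network (an assumption, not part of the original):

```python
import io
import pandas as pd

# Stand-ins for the URLs in the example above (hypothetical CSV payloads)
fake_csvs = ["A,B\n1,x\n2,y\n", "A,B\n3,z\n"]

lst = []
for csv_text in fake_csvs:
    data = pd.read_csv(io.StringIO(csv_text))
    lst.append(data)                      # enlarge the list, not a dataframe

df = pd.concat(lst, ignore_index=True)    # one concat at the end
```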
That said, if a single row is to be appended to a dataframe, loc could also do the job.
df.loc[len(df)] = new_row
With the loc call, the dataframe is enlarged with index label len(df), which makes sense only if the index is a RangeIndex; a RangeIndex is created by default if an explicit index is not passed to the dataframe constructor.
A working example:
df = pd.DataFrame({'A': range(3), 'B': list('abc')})
df.loc[len(df)] = [4, 'd']
df.loc[len(df)] = {'A': 5, 'B': 'e'}
df.loc[len(df)] = pd.Series({'A': 6, 'B': 'f'})
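To see why the RangeIndex caveat matters, here is a small counter-example (hypothetical data): with a non-default index, df.loc[len(df)] can hit an existing label and silently overwrite that row instead of appending.

```python
import pandas as pd

# Non-RangeIndex: labels 0 and 2, so len(df) == 2 is an EXISTING label
df = pd.DataFrame({'A': [10, 20]}, index=[0, 2])
df.loc[len(df)] = [30]   # overwrites the row labeled 2, does not append
print(len(df))           # still 2 rows
```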
As pointed out by @mozway, enlarging a pandas dataframe has O(n^2) complexity because in each iteration, the entire dataframe has to be read and copied. The following perfplot shows the runtime difference relative to concatenation done once.1 As you can see, both ways to enlarge a dataframe are much, much slower than enlarging a list and constructing a dataframe once (e.g. for a dataframe with 10k rows, concat in a loop is about 800 times slower and loc in a loop is about 1600 times slower).
1 The code used to produce the perfplot:
import pandas as pd
import perfplot

def concat_loop(lst):
    df = pd.DataFrame(columns=['A', 'B'])
    for dic in lst:
        df = pd.concat([df, pd.DataFrame([dic])], ignore_index=True)
    return df.infer_objects()

def concat_once(lst):
    df = pd.DataFrame(columns=['A', 'B'])
    df = pd.concat([df, pd.DataFrame(lst)], ignore_index=True)
    return df.infer_objects()

def loc_loop(lst):
    df = pd.DataFrame(columns=['A', 'B'])
    for dic in lst:
        df.loc[len(df)] = dic
    return df

perfplot.plot(
    setup=lambda n: [{'A': i, 'B': 'a'*(i%5+1)} for i in range(n)],
    kernels=[concat_loop, concat_once, loc_loop],
    labels=['concat in a loop', 'concat once', 'loc in a loop'],
    n_range=[2**k for k in range(16)],
    xlabel='Length of dataframe',
    title='Enlarging a dataframe in a loop',
    relative_to=1,
    equality_check=pd.DataFrame.equals);
Upvotes: 54
Reputation: 927
Disclaimer: this answer seems to attract popularity, but the proposed approach should not be used. append was not changed to _append; _append is a private internal method, and append was removed from the pandas API. The claim "The append method in pandas looks similar to list.append in Python. That's why the append method in pandas is now modified to _append." is utterly incorrect. The leading _ means only one thing: the method is private and is not intended to be used outside of pandas' internal code.
In the new version of Pandas, the append method is changed to _append. You can simply use _append instead of append, i.e., df._append(df2).
df = df1._append(df2, ignore_index=True)
Why is it changed?
The append method in pandas looks similar to list.append in Python. That's why the append method in pandas is now modified to _append.
Upvotes: 78
Reputation: 262194
As of pandas 2.0, append (previously deprecated) was removed.
You need to use concat instead (for most applications):
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
As noted by @cottontail, it's also possible to use loc, although this only works if the new index is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex):
df.loc[len(df)] = new_row # only use with a RangeIndex!
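Applying both fixes to the question's scenario might look like this (the column names and values are made up for the sake of a runnable example):

```python
import pandas as pd

# Hypothetical starting frame and the row to add
df = pd.DataFrame({'name': ['a'], 'value': [1]})
new_row = {'name': 'b', 'value': 2}

# concat-based replacement for the removed append
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# loc-based alternative -- safe here because the index is a RangeIndex
df.loc[len(df)] = {'name': 'c', 'value': 3}
```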
We frequently see new users of pandas try to code like they would in pure Python: they use iterrows to access items in a loop (see here why you shouldn't), or append in a way that is similar to Python's list.append.
However, as noted in pandas' issue #35407, pandas' append and list.append are really not the same thing. list.append is in place, while pandas' append creates a new DataFrame:
I think that we should deprecate Series.append and DataFrame.append. They're making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.
These are also apparently popular methods. DataFrame.append is around the 10th most visited page in our API docs.
Unless I'm mistaken, users are always better off building up a list of values and passing them to the constructor, or building up a list of NDFrames followed by a single concat.
As a consequence, while list.append is amortized O(1) at each step of the loop, pandas' append is O(n), making it inefficient when repeated insertion is performed.
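The quoted point about in-place vs. copying can be sketched directly (made-up data; concat stands in for the removed append, which behaved the same way in this respect):

```python
import pandas as pd

# list.append mutates the list in place: same object before and after
lst = [1, 2]
lst_id = id(lst)
lst.append(3)
assert id(lst) == lst_id

# concat builds a brand-new DataFrame: the index and values are copied
df = pd.DataFrame({'A': [1, 2]})
df_id = id(df)
df = pd.concat([df, pd.DataFrame({'A': [3]})], ignore_index=True)
assert id(df) != df_id
```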
Using append or concat repeatedly is not a good idea (this has quadratic behavior, as it creates a new DataFrame for each step).
In such a case, the new items should be collected in a list, converted to a DataFrame at the end of the loop, and, if needed, concatenated to the original DataFrame.
lst = []
for new_row in items_generation_logic:
    lst.append(new_row)

# create extension
df_extended = pd.DataFrame(lst, columns=['A', 'B', 'C'])
# or columns=df.columns if identical columns

# concatenate to original
out = pd.concat([df, df_extended])
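A concrete, runnable instance of this pattern could look as follows; the starting frame and range(1, 4) are hypothetical stand-ins for the original df and items_generation_logic:

```python
import pandas as pd

# Hypothetical original dataframe
df = pd.DataFrame({'A': [0], 'B': ['x'], 'C': [1.0]})

# Collect new rows in a plain list (stand-in for items_generation_logic)
lst = []
for i in range(1, 4):
    lst.append({'A': i, 'B': 'x', 'C': float(i)})

# One constructor + one concat at the end
df_extended = pd.DataFrame(lst, columns=df.columns)
out = pd.concat([df, df_extended], ignore_index=True)
```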
Upvotes: 456