Petr Petrov

Reputation: 4432

How do I append one pandas DataFrame to another?

I have a problem appending to a DataFrame. I am trying to run this code:

df_all = pd.read_csv('data.csv', error_bad_lines=False, chunksize=1000000)
urls = pd.read_excel('url_june.xlsx')
substr = urls.url.values.tolist()
df_res = pd.DataFrame()
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        df_res.append(res)

When I try to save df_res, I get an empty DataFrame. df_all looks like

ID,"url","used_at","active_seconds"
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:31,30
f85ce4b2f8787d48edc8612b2ccaca83,"4pda.ru/forum/index.php?showtopic=634566&view=getnewpost",2015-10-01 00:01:49,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"shop.mts.ru/smartfony/mts/smartfon-smart-sprint-4g-sim-lock-white.html?utm_source=admitad&utm_medium=cpa&utm_content=300&utm_campaign=gde_cpa&uid=3",2015-10-01 00:03:19,34
078d388438ebf1d4142808f58fb66c87,"market.yandex.ru/product/12675734/spec?hid=91491&track=char",2015-10-01 00:03:48,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:26,9

and urls looks like

url
shoppingcart.aliexpress.com/order/confirm_order
ozon.ru/?context=order_done&number=
lk.wildberries.ru/basket/orderconfirmed
lamoda.ru/checkout/onepage/success/quick
mvideo.ru/confirmation?_requestid=
eldorado.ru/personal/order.php?step=confirm

When I print res inside the loop, it is not empty. But when I print df_res inside the loop after the append, it is an empty DataFrame. I can't find my error. How can I fix it?

Upvotes: 70

Views: 252863

Answers (3)

Ami Tavory

Reputation: 76297

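The documentation for pd.DataFrame.append says:

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

(emphasis mine, on "returning a new object"). append does not modify df_res in place, so its return value has to be assigned back.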

Try

df_res = df_res.append(res)

Incidentally, note that pandas isn't that efficient for creating a DataFrame by successive concatenations. You might try this, instead:

all_res = []
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        all_res.append(res)

df_res = pd.concat(all_res)

This first creates a list of all the parts, then creates a DataFrame from all of them once at the end.
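
A self-contained sketch of the same pattern, with a small in-memory frame standing in for the chunked CSV reader. The sample rows are made up, and passing regex=False is an assumption on my part: the patterns in urls contain characters such as ? and . that str.contains would otherwise treat as regular-expression syntax.

```python
import pandas as pd

# Stand-in for one chunk of df_all (sample rows, not the real data).
df = pd.DataFrame({
    "ID": ["a1", "a2", "a3"],
    "url": [
        "shoppingcart.aliexpress.com/order/confirm_order",
        "avito.ru/yoshkar-ola/telefony/mts",
        "ozon.ru/?context=order_done&number=123",
    ],
})
substr = [
    "shoppingcart.aliexpress.com/order/confirm_order",
    "ozon.ru/?context=order_done&number=",
]

all_res = []
for chunk in [df]:  # with the real file this would be the chunked reader
    for pattern in substr:
        # regex=False matches the pattern as a literal substring
        all_res.append(chunk[chunk["url"].str.contains(pattern, regex=False)])

# One concat at the end instead of growing a DataFrame inside the loop
df_res = pd.concat(all_res, ignore_index=True)
print(df_res)
```

If no pattern matches in any chunk, the list still holds (empty) frames, so pd.concat returns an empty DataFrame with the original columns.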

Upvotes: 97

cs95

Reputation: 402263

Why am I getting "AttributeError: 'DataFrame' object has no attribute 'append'"?

pandas >= 2.0: append has been removed; use pd.concat instead [1]

Starting from pandas 2.0, append has been removed from the API. It was previously deprecated in version 1.4. See the docs on Deprecations as well as the GitHub issue that originally proposed its deprecation.

The rationale for its removal was to discourage iteratively growing DataFrames in a loop (which is what people typically use append for). This is because append makes a new copy at each stage, resulting in quadratic complexity in memory.
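
To make the quadratic growth concrete, here is a rough cost model in plain Python. It only counts rows copied and assumes every append rebuilds the whole accumulated result; the function names are made up for this illustration and it is not a benchmark of pandas itself.

```python
def rows_copied_iteratively(n_chunks, rows_per_chunk):
    """Rows copied when the result is rebuilt after every chunk."""
    total = 0
    size = 0
    for _ in range(n_chunks):
        size += rows_per_chunk
        total += size  # each step copies the whole accumulated result
    return total


def rows_copied_once(n_chunks, rows_per_chunk):
    """Rows copied when all chunks are combined in a single step."""
    return n_chunks * rows_per_chunk


iterative = rows_copied_iteratively(100, 1000)
once = rows_copied_once(100, 1000)
print(iterative, once)
```

Under this model, 100 chunks of 1000 rows cost 5,050,000 row copies when appended one by one, against 100,000 for a single combine.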

1. This assumes you're appending one DataFrame to another. If you're appending a row to a DataFrame, the solution is slightly different; see below.


The idiomatic way to append DataFrames is to collect all your smaller DataFrames into a list, and then make one single call to pd.concat. Here's a(n oversimplified) example

df_list = []
for df in some_function_that_yields_dfs():
    df_list.append(df)

final_df = pd.concat(df_list)
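
A runnable variant of the snippet above, in case it helps. The generator yield_dfs is a hypothetical stand-in for some_function_that_yields_dfs, and ignore_index=True is an optional extra that renumbers the rows instead of keeping each piece's own index.

```python
import pandas as pd


def yield_dfs():
    # Hypothetical stand-in for some_function_that_yields_dfs()
    for start in (0, 2, 4):
        yield pd.DataFrame({"n": [start, start + 1]})


# Collect the pieces first, then combine them with a single concat call
df_list = list(yield_dfs())
final_df = pd.concat(df_list, ignore_index=True)
print(final_df)
```

Without ignore_index=True the result would keep the index labels 0, 1, 0, 1, 0, 1 from the three pieces.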

Note that if you are trying to append one row at a time rather than one DataFrame at a time, the solution is even simpler.

data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])

df = pd.DataFrame(data, columns=['a', 'b', 'c'])
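
The same idea as a self-contained sketch. yield_rows is a hypothetical stand-in for some_function_that_yields_data, and the values are made up.

```python
import pandas as pd


def yield_rows():
    # Hypothetical stand-in for some_function_that_yields_data()
    yield 1, "x", 0.5
    yield 2, "y", 1.5


data = []
for a, b, c in yield_rows():
    data.append([a, b, c])  # cheap list append, no DataFrame copy

# Build the DataFrame once, after all rows are collected
df = pd.DataFrame(data, columns=["a", "b", "c"])
print(df)
```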

More information in Creating an empty Pandas DataFrame, and then filling it?

Upvotes: 39

Siddharth Raj

Reputation: 131

If we want to append based on the index:

df_res = pd.DataFrame(data = None, columns= df.columns)

all_res = []

d1 = df.iloc[index-10:index]     # the 10 rows before position index (.ix was removed in pandas 1.0, so .iloc is used here)

all_res.append(d1)

df_res = pd.concat(all_res)
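
Since .ix is no longer available in pandas 1.0 and later, here is a small sketch of the same idea with .iloc, which selects by position. The frame, the value of index and the clamping of the start position are assumptions added for the example.

```python
import pandas as pd

df = pd.DataFrame({"value": range(20)})
index = 15  # hypothetical position of interest

# Clamp at 0 so a small index does not turn into a negative slice start
start = max(index - 10, 0)
d1 = df.iloc[start:index]  # the 10 rows before position `index`

all_res = [d1]
df_res = pd.concat(all_res)
print(df_res)
```

With a non-default index, label-based selection with .loc may be what is wanted instead; note that .loc slices include both endpoints.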

Upvotes: 6
