Reputation: 4432
I have a problem appending to a DataFrame. I am trying to run this code:
df_all = pd.read_csv('data.csv', error_bad_lines=False, chunksize=1000000)
urls = pd.read_excel('url_june.xlsx')
substr = urls.url.values.tolist()
df_res = pd.DataFrame()
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        df_res.append(res)
When I then try to save df_res, I get an empty DataFrame.
df_all looks like this:
ID,"url","used_at","active_seconds"
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:31,30
f85ce4b2f8787d48edc8612b2ccaca83,"4pda.ru/forum/index.php?showtopic=634566&view=getnewpost",2015-10-01 00:01:49,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"shop.mts.ru/smartfony/mts/smartfon-smart-sprint-4g-sim-lock-white.html?utm_source=admitad&utm_medium=cpa&utm_content=300&utm_campaign=gde_cpa&uid=3",2015-10-01 00:03:19,34
078d388438ebf1d4142808f58fb66c87,"market.yandex.ru/product/12675734/spec?hid=91491&track=char",2015-10-01 00:03:48,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:26,9
and urls looks like this:
url
shoppingcart.aliexpress.com/order/confirm_order
ozon.ru/?context=order_done&number=
lk.wildberries.ru/basket/orderconfirmed
lamoda.ru/checkout/onepage/success/quick
mvideo.ru/confirmation?_requestid=
eldorado.ru/personal/order.php?step=confirm
When I print res inside the loop, it is not empty. But when I print df_res inside the loop after the append, it is an empty DataFrame.
I can't find my error. How can I fix it?
Upvotes: 70
Views: 252863
Reputation: 76297
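If you look at the documentation for pd.DataFrame.append, it says:
"Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns."
The key phrase is "returning a new object": append does not modify df_res in place, so the result has to be assigned back.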
Try
df_res = df_res.append(res)
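To see the "returning a new object" behavior concretely, here is a minimal sketch with made-up data. It uses pd.concat rather than append so that it also runs on pandas 2.0 and later, where append is no longer available; the rule is the same in both cases: the result must be assigned to a name, or it is lost.

```python
import pandas as pd

# Made-up frames, for illustration only
df_res = pd.DataFrame({"url": ["a.com"]})
res = pd.DataFrame({"url": ["b.com"]})

# The combined frame is returned and immediately discarded;
# df_res itself is not modified.
pd.concat([df_res, res])
assert len(df_res) == 1

# Assigning the result back keeps the new rows.
df_res = pd.concat([df_res, res], ignore_index=True)
assert len(df_res) == 2
```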
Incidentally, note that pandas isn't that efficient for creating a DataFrame by successive concatenations. You might try this, instead:
all_res = []
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        all_res.append(res)
df_res = pd.concat(all_res)
This first creates a list of all the parts, then creates a DataFrame from all of them once at the end.
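For reference, here is a self-contained sketch of the same pattern that runs without the original files. The CSV text, the IDs and the chunksize of 2 are invented for the example. It also passes regex=False to str.contains, which is not in the code above: by default the pattern is treated as a regular expression, and a URL such as ozon.ru/?context=order_done&number= contains a "?" that would then not be matched literally. Whether literal matching is what you want is an assumption on my part.

```python
import io

import pandas as pd

# Invented sample in the same shape as the question's data
csv_text = '''ID,"url","used_at","active_seconds"
a1,"mobiguru.ru/phones/apple",2015-10-01 00:00:25,1
a2,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
a3,"ozon.ru/?context=order_done&number=123",2015-10-01 00:05:00,7
a4,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
'''

substr = [
    "shoppingcart.aliexpress.com/order/confirm_order",
    "ozon.ru/?context=order_done&number=",
]

all_res = []
# chunksize=2 stands in for the large chunks used with the real file
for df in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    for s in substr:
        # regex=False: treat s as a plain substring, not a regular expression
        all_res.append(df[df["url"].str.contains(s, regex=False)])

# One concatenation at the end instead of growing a frame in the loop
df_res = pd.concat(all_res, ignore_index=True)
print(df_res)
```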
Upvotes: 97
Reputation: 402263
Why am I getting "AttributeError: 'DataFrame' object has no attribute 'append'"?
append has been removed, use pd.concat instead (see note 1)
Starting from pandas 2.0, append has been removed from the API. It was previously deprecated in version 1.4. See the docs on Deprecations as well as this github issue that originally proposed its deprecation.
The rationale for its removal was to discourage iteratively growing DataFrames in a loop (which is what people typically use append for). This is because append makes a new copy at each stage, resulting in quadratic complexity in memory.
1. This assumes you're appending one DataFrame to another. If you're appending a row to a DataFrame, the solution is slightly different - see below.
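To get a feel for the quadratic cost mentioned above, here is a back-of-the-envelope model in plain Python. It only counts rows and assumes that every append copies all rows accumulated so far plus the new chunk; the function names and the numbers are made up for illustration, and real pandas overhead will differ.

```python
def rows_copied_growing(n_chunks, rows_per_chunk):
    """Rows copied when the frame is rebuilt after every chunk."""
    total = 0
    size = 0
    for _ in range(n_chunks):
        size += rows_per_chunk
        total += size  # the whole accumulated frame is copied each time
    return total


def rows_copied_single_concat(n_chunks, rows_per_chunk):
    """Rows copied when all chunks are concatenated once at the end."""
    return n_chunks * rows_per_chunk


growing = rows_copied_growing(100, 1000)       # 5,050,000 rows copied
single = rows_copied_single_concat(100, 1000)  # 100,000 rows copied
print(growing, single, growing / single)
```

With 100 chunks the loop version copies about 50 times as many rows under this model, and the gap keeps widening as the number of chunks grows.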
The idiomatic way to append DataFrames is to collect all your smaller DataFrames into a list, and then make one single call to pd.concat. Here's a(n oversimplified) example:
df_list = []
for df in some_function_that_yields_dfs():
    df_list.append(df)
final_df = pd.concat(df_list)
Note that if you are trying to append one row at a time rather than one DataFrame at a time, the solution is even simpler.
data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])
df = pd.DataFrame(data, columns=['a', 'b', 'c'])
More information in Creating an empty Pandas DataFrame, and then filling it?
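A runnable version of the row-by-row case might look like the sketch below. The generator body and its values are placeholders invented for the example. The last step shows one way to add a single extra row to an existing frame with pd.concat, which is an addition of mine rather than part of the snippet above.

```python
import pandas as pd


def some_function_that_yields_data():
    # Placeholder generator; the values are invented for illustration
    yield 1, "x", 0.5
    yield 2, "y", 1.5


# Collect plain Python rows first, then build the frame once
data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])
df = pd.DataFrame(data, columns=["a", "b", "c"])

# Adding one more row later: wrap it in a one-row frame and concatenate
new_row = pd.DataFrame([[3, "z", 2.5]], columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```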
Upvotes: 39
Reputation: 131
df_res = pd.DataFrame(data=None, columns=df.columns)
all_res = []
# Take the 10 rows before the label `index`.
# .ix was removed in pandas 1.0; .loc is label-based and includes both ends.
d1 = df.loc[index-10:index-1]
all_res.append(d1)
df_res = pd.concat(all_res)
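The snippet above leaves df and index undefined, so here is one possible self-contained reading of it: collect the 10 rows before each position of interest and concatenate once. The sample frame and the indices list are invented, and the sketch uses positional slicing with iloc, which matches the label-based version only when the frame has the default 0, 1, 2, ... index.

```python
import pandas as pd

# Invented data with the default integer index
df = pd.DataFrame({"value": range(30)})
indices = [12, 25]  # hypothetical positions of interest

all_res = []
for index in indices:
    start = max(index - 10, 0)            # do not run past the start of the frame
    all_res.append(df.iloc[start:index])  # up to 10 rows before position `index`

df_res = pd.concat(all_res)
print(df_res)
```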
Upvotes: 6