user07345
user07345

Reputation: 35

Python: updating dataframe values inside for loop

I have a dataframe (lets call it goodsdf) which consists of 240K rows and a bunch of columns:

id name description ... availability price
1001 Item A Frying pan ... 3 20.1
2031 Item B Firewood ... 0 5
3412 Item C Olive oil ... 10 12.5

Now in the next step, I'm constantly reading a stream of updated items. Those updates include among others new prices for items, which are pulled every 90 seconds. The stream I'm receiving includes also some additional 100K items which are not of interest for my store.

What I'm looking into doing is to update the dataframe with new prices. To do so I use the following (partially pseudo) code:

for entity in feed.entity:
    if entity.HasField('product_update'):
        if entity.product_update.id == goodsdf['id']:                  #pseudo
            if goodsdf['availability'] != 0:                           #pseudo
                set goodsdf['price'] == entity.product_update.price    #pseudo

From what I have read, there are several different ways for accessing values in dataframes, e.g. by using isin(), str.contains() and a couple of others. However, many of them return True and False values only. Another way I tried to solve this is by reading new prices and specific item IDs into separate dataframes, which are later merged into my original goodsdf. This in turn showed to create penalties for time and computer resources.

I'm not quite sure I fully understand the concept using nested if statements in combination with updating values in dataframes.

Upvotes: 0

Views: 66

Answers (1)

RJ Adriaansen
RJ Adriaansen

Reputation: 9619

One approach could be to extract the id and price first to create a flattened list of dicts, load this as a new dataframe, merge it with the other based on id, then replace the prices in the original df when they meet the conditions with pd.where(). I'm not sure if it's efficient enough for your use case, but at least it avoids looping through the data:

import pandas as pd

feed = {'entity1':{'product_update':{'id':1001, 'price':999}}, 'entity2':{'product_update':{'id':2031, 'price':999}}, 'entity3':{'superfluous':'test'}}
extracted_feed_data = [v for val in feed.values() if (v := val.get('product_update'))]
data = [ { "id": 1001, "name": "Item A", "description": "Frying pan", "availability": 3, "price": 20.1 }, { "id": 2031, "name": "Item B", "description": "Firewood", "availability": 0, "price": 5 }, { "id": 3412, "name": "Item C", "description": "Olive oil", "availability": 10, "price": 12.5 } ]

df_update = pd.DataFrame(extracted_feed_data)
df = pd.DataFrame(data)

merged = df.merge(df_update, on='id', how='left')
df['price'] = merged['price_y'].where((df['availability'] != 0) & (merged['price_y'].notnull()), df['price'])

Output df:

|    |   id | name   | description   |   availability |   price |
|---:|-----:|:-------|:--------------|---------------:|--------:|
|  0 | 1001 | Item A | Frying pan    |              3 |   999   |
|  1 | 2031 | Item B | Firewood      |              0 |     5   |
|  2 | 3412 | Item C | Olive oil     |             10 |    12.5 |

Upvotes: 1

Related Questions