kucb
kucb

Reputation: 15

Pandas updating certain values in large database but not matching dataframe size

I have a large database that is containing a certain number per customer per type of item. Each day there will be a lot of updates to a certain type of item of a certain customer. The database will look as following:

import pandas as pd

df = pd.DataFrame({'customer' :  ['customer1', 'customer2'], 'item1': [12, 13], 'item2' : [3, 28],'item3': [2, 1]})
df2 = pd.DataFrame({'customer' :  ['customer1', 'customer2'], 'item?': ['item1', 'item1'], 'quantity' : [2, 5]})
    customer  item1  item2  item3
0  customer1     12      3      2
1  customer2     13     28      1

    customer  item?  quantity
0  customer1  item1         2
1  customer2  item1         5

The dataframe needs to be updated by df2, where the customer is a string and the item is also a string.

I am expecting the following dataframe:

    customer  item1  item2  item3
0  customer1     14      3      2
1  customer2     18     28      1

So essentially it is df1+df1, but this could sometimes be minus as well.

I have tried the following:

customerlist = df1['customer'].tolist()
for i in customerlist:
    df1.loc[df1.customer == customerlist[i]]

But I am already running into problems. Does someone have a function or whatever that works?

Upvotes: 0

Views: 48

Answers (1)

sophocles
sophocles

Reputation: 13831

We can use 'customer' as the index in both dataframes to make sure we align them correctly. Then all we do is add a reshaped version of df2 onto df based on alignment on index (both rows and columns), and when the item or the customer is a mismatch we use the values from df:

df.set_index('customer').add(
    pd.pivot_table(
        df2,index='customer',columns='item?',values='quantity')
    ).fillna(df.set_index('customer')).astype(int)

prints:

           item1  item2  item3
customer                      
customer1     14      3      2
customer2     18     28      1

I am not sure what you mean by "sometimes be minus as well". If you mean that you want to subtract the two dataframes then use sub instead of add. If you mean there might be a minus in the values, that shouldn't affect the code because + and - will be a -

Upvotes: 2

Related Questions