Reputation: 359

Subtracting values of attributes within one Pandas Dataframe from another dataframe

This problem involves three separate dataframes:
df1 represents the 'Total' market for products 1, 2, 3, with columns 'Value1' and 'Value2'.
df2 represents 'Customer1' for products 1, 2, 3, with columns 'Value1' and 'Value2'.
df3 represents 'Customer2' for products 1, 2, 3, with columns 'Value1' and 'Value2'.

df2 & df3 are essentially subsets of df1.

I would like to create another dataframe, df4, that subtracts df2 and df3 from df1. In df4, the 'Market' column should read 'remaining customers'.

This is what I have done so far:

import pandas as pd


d1 = {'Market': ['Total', 'Total','Total'], 'Product Code': [1, 2, 3], 
'Value1':[10, 20, 30], 'Value2':[5, 15, 25]}
df1 = pd.DataFrame(data=d1)
df1



d2 = {'Market': ['Customer1', 'Customer1','Customer1'], 'Product Code': [1, 
2, 3], 'Value1':[3, 14, 10], 'Value2':[2, 4, 6]}
df2 = pd.DataFrame(data=d2)
df2


d3 = {'Market': ['Customer2', 'Customer2','Customer2'], 'Product Code': [1, 
2, 3], 'Value1':[3, 3, 4], 'Value2':[2, 6, 10]}
df3 = pd.DataFrame(data=d3)
df3

This produces the following result:

  Market  Product Code  Value1  Value2
0  Total             1      10       5
1  Total             2      20      15
2  Total             3      30      25
  Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6
  Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10

To create df4, I tried the following code, but it raises 'TypeError: unsupported operand type(s) for -: 'str' and 'str''. Can anyone help?

df4 = df1-(df2+df3)

print(df4)

Upvotes: 3

Answers (4)

BENY

Reputation: 323366

Maybe we can use select_dtypes

(df1.select_dtypes(exclude = 'object')
     -df2.select_dtypes(exclude = 'object')
       -df3.select_dtypes(exclude = 'object')).\
            drop(columns='Product Code').\
              combine_first(df1).\
               assign(Market='remaining customers')
Out[133]: 
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9
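
As a minimal sketch of the idea, assuming the frames from the question: the TypeError comes from the string column 'Market', and select_dtypes keeps only the numeric columns. Note that 'Product Code' is numeric too, which is why it has to be dropped before subtracting. The sketch uses include='number' rather than exclude='object' (a swapped-in variant, not the code above), and the names value_cols and remaining are introduced here for illustration only.

```python
import pandas as pd

# Frames re-created from the question.
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Keep only numeric columns; the string column 'Market' is left out.
numeric = df1.select_dtypes(include='number')

# 'Product Code' is numeric as well, so exclude it from the arithmetic.
value_cols = numeric.columns.drop('Product Code')

# Subtract both customers from the total, value columns only.
remaining = df1[value_cols] - df2[value_cols] - df3[value_cols]
print(remaining)
```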

Upvotes: 2

piRSquared

Reputation: 294506

Not exactly what OP asked for but it seemed to me that this might be a better way to manage the data.

df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])

formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df

Market       Customer1        Customer2         Total        RemainingCustomers       
                Value1 Value2    Value1 Value2 Value1 Value2             Value1 Value2
Product Code                                                                          
1                    3      2         3      2     10      5                  4      1
2                   14      4         3      6     20     15                  3      5
3                   10      6         4     10     30     25                 16      9

And

df['RemainingCustomers']

              Value1  Value2
Product Code                
1                  4       1
2                  3       5
3                 16       9

If we insist on the requested output, either of the following works:

df.stack(0).reset_index().query(
    'Market == "RemainingCustomers"').reindex(columns=df1.columns)

                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9

df.stack(0).xs(
    'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9

Upvotes: 3

jpp

Reputation: 164783

Here is one way:

cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
               .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
               .sort_index(axis=1)

#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9

Explanation

df1[cols].subtract(df2[cols].add(df3[cols])) performs calculation on specified columns only.
assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds extra columns required for results dataframe.
sort_index(axis=1) reorders columns for desired output.
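
A possible variant of the steps above, sketched under the assumption that all three frames list the same products in the same row order: the product codes are taken from df1 instead of being typed in by hand, and the column order is copied from df1 with reindex rather than relying on alphabetical sorting. The frames are re-created from the question so the snippet runs on its own.

```python
import pandas as pd

# Frames re-created from the question.
df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']
df4 = (
    # Arithmetic on the value columns only.
    df1[cols].subtract(df2[cols].add(df3[cols]))
    # Reuse df1's product codes (assumes identical row order in all frames).
    .assign(**{'Market': 'RemainingCustomers',
               'Product Code': df1['Product Code']})
    # Put the columns in the same order as df1.
    .reindex(columns=df1.columns)
)
print(df4)
```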

Upvotes: 2

cs95

Reputation: 402962

Drop Market, set Product Code as the index, and perform index aligned arithmetic on the product codes. After that, just reset the index and insert Market into the result.

df1, df2, df3 = [
      df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]

df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')

               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
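
To illustrate why index-aligned arithmetic is attractive here, the following sketch uses the question's numbers but deliberately shuffles the row order of the two customer frames (that shuffling is an assumption made for the example, as are the names total, cust1, cust2, and remaining). Because 'Product Code' is the index, rows are matched by product rather than by position.

```python
import pandas as pd

# Same values as in the question, but the customer rows are shuffled.
total = pd.DataFrame({'Product Code': [1, 2, 3],
                      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
cust1 = pd.DataFrame({'Product Code': [3, 1, 2],
                      'Value1': [10, 3, 14], 'Value2': [6, 2, 4]})
cust2 = pd.DataFrame({'Product Code': [2, 3, 1],
                      'Value1': [3, 4, 3], 'Value2': [6, 10, 2]})

# Index every frame by product so arithmetic aligns on product code.
total, cust1, cust2 = [
    frame.set_index('Product Code') for frame in (total, cust1, cust2)
]

remaining = (total - (cust1 + cust2)).reset_index()
remaining.insert(0, 'Market', 'RemainingCustomers')
print(remaining)
```

A plain positional subtraction on these shuffled frames would pair up the wrong products, so setting the index first is the safer choice whenever row order is not guaranteed.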

Upvotes: 3
