Reputation: 359
This problem involves three separate dataframes, each holding 'Value1' and 'Value2' for products 1, 2 and 3: df1 holds the 'Total' figures, df2 the figures for 'Customer1', and df3 the figures for 'Customer2'.
df2 and df3 are essentially subsets of df1.
I would like to create another dataframe, df4, by subtracting df2 and df3 from df1, with its rows labelled 'remaining customers' in the 'Market' column.
This is what I have done so far:
import pandas as pd

d1 = {'Market': ['Total', 'Total', 'Total'], 'Product Code': [1, 2, 3],
      'Value1': [10, 20, 30], 'Value2': [5, 15, 25]}
df1 = pd.DataFrame(data=d1)
print(df1)

d2 = {'Market': ['Customer1', 'Customer1', 'Customer1'], 'Product Code': [1, 2, 3],
      'Value1': [3, 14, 10], 'Value2': [2, 4, 6]}
df2 = pd.DataFrame(data=d2)
print(df2)

d3 = {'Market': ['Customer2', 'Customer2', 'Customer2'], 'Product Code': [1, 2, 3],
      'Value1': [3, 3, 4], 'Value2': [2, 6, 10]}
df3 = pd.DataFrame(data=d3)
print(df3)
This produces the following output:
   Market  Product Code  Value1  Value2
0   Total             1      10       5
1   Total             2      20      15
2   Total             3      30      25

      Market  Product Code  Value1  Value2
0  Customer1             1       3       2
1  Customer1             2      14       4
2  Customer1             3      10       6

      Market  Product Code  Value1  Value2
0  Customer2             1       3       2
1  Customer2             2       3       6
2  Customer2             3       4      10
To create df4, I tried the following code, but it raises "TypeError: unsupported operand type(s) for -: 'str' and 'str'". Can anyone help?
df4 = df1-(df2+df3)
print(df4)
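The error comes from the Market column rather than the numbers. The sketch below isolates that column as plain string Series (a simplification of the question's frames, for illustration only): "+" on strings concatenates element-wise, so df2 + df3 succeeds, but "-" is undefined for strings, so the subtraction from df1 fails.

```python
import pandas as pd

# The Market columns of df1, df2 and df3 from the question (strings only)
total = pd.Series(['Total', 'Total', 'Total'])
customer1 = pd.Series(['Customer1', 'Customer1', 'Customer1'])
customer2 = pd.Series(['Customer2', 'Customer2', 'Customer2'])

# "+" on string columns is element-wise concatenation, so this works
combined = customer1 + customer2
print(combined.tolist())
# ['Customer1Customer2', 'Customer1Customer2', 'Customer1Customer2']

# "-" has no meaning for strings, so this is where df1 - (df2 + df3) fails
raised = False
try:
    total - combined
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

This is why every answer below keeps Market out of the arithmetic and adds it back afterwards.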
Reputation: 323366
Maybe we can use select_dtypes to do the arithmetic on the numeric columns only, and then restore the remaining columns from df1:
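df4 = (df1.select_dtypes(include='number')
       - df2.select_dtypes(include='number')
       - df3.select_dtypes(include='number'))
df4 = (df4.drop(columns='Product Code')   # the product codes must not be subtracted
          .combine_first(df1)             # bring Product Code (and Market) back from df1
          .assign(Market='remaining customers'))
print(df4)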
                Market  Product Code  Value1  Value2
0  remaining customers           1.0       4       1
1  remaining customers           2.0       3       5
2  remaining customers           3.0      16       9
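To see why the answer drops Product Code before restoring columns from df1, here is a small sketch that rebuilds the question's frames and uses select_dtypes(include='number') to keep only the numeric columns. Product Code is numeric too, so it gets "subtracted" along with the values, which produces meaningless codes.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

# Keep only numeric columns, so the string column Market never reaches "-"
num1, num2, num3 = (d.select_dtypes(include='number') for d in (df1, df2, df3))
print(list(num1.columns))  # ['Product Code', 'Value1', 'Value2']

diff = num1 - num2 - num3
# Product Code is numeric as well, so it is subtracted into meaningless values
print(diff['Product Code'].tolist())  # [-1, -2, -3]
print(diff[['Value1', 'Value2']])
```

The Value1/Value2 columns of diff already hold the wanted numbers; dropping Product Code and calling combine_first(df1) simply puts the original codes and Market back.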
Reputation: 294506
This is not exactly what the OP asked for, but it may be a better way to manage the data:
df = pd.concat([df1, df2, df3]).set_index(['Product Code', 'Market'])
formula = 'RemainingCustomers = Total - Customer1 - Customer2'
df = df.unstack().stack(0).eval(formula).unstack()
df
Market        Customer1       Customer2       Total           RemainingCustomers
              Value1  Value2  Value1  Value2  Value1  Value2  Value1  Value2
Product Code
1                  3       2       3       2      10       5       4       1
2                 14       4       3       6      20      15       3       5
3                 10       6       4      10      30      25      16       9
And the new block on its own:
df['RemainingCustomers']
              Value1  Value2
Product Code
1                  4       1
2                  3       5
3                 16       9
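The eval step works because, after unstack().stack(0), each Market is its own column, so the formula can refer to Total, Customer1 and Customer2 by name. A minimal sketch of just that step, on a wide frame written out by hand to mirror the intermediate shape (one row per product code and value column):

```python
import pandas as pd

# Hand-built stand-in for df.unstack().stack(0): one column per Market
index = pd.MultiIndex.from_product([[1, 2, 3], ['Value1', 'Value2']],
                                   names=['Product Code', None])
wide = pd.DataFrame({'Customer1': [3, 2, 14, 4, 10, 6],
                     'Customer2': [3, 2, 3, 6, 4, 10],
                     'Total': [10, 5, 20, 15, 30, 25]}, index=index)

# An assignment inside eval returns a new frame with the extra column
result = wide.eval('RemainingCustomers = Total - Customer1 - Customer2')
print(result['RemainingCustomers'].tolist())  # [4, 1, 3, 5, 16, 9]
```

Because eval returns a copy by default, wide itself is left unchanged; the final unstack() in the answer then turns the value-column level back into columns.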
If we insist on the exact output format requested:
df.stack(0).reset_index().query(
'Market == "RemainingCustomers"').reindex(columns=df1.columns)
                Market  Product Code  Value1  Value2
2   RemainingCustomers             1       4       1
6   RemainingCustomers             2       3       5
10  RemainingCustomers             3      16       9
Or, using xs:
df.stack(0).xs(
'RemainingCustomers', level=1, drop_level=False
).reset_index().reindex(columns=df1.columns)
               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
Reputation: 164783
Here is one way:
cols = ['Value1', 'Value2']
df4 = df1[cols].subtract(df2[cols].add(df3[cols]))\
.assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]})\
.sort_index(axis=1)
#                Market  Product Code  Value1  Value2
# 0  RemainingCustomers             1       4       1
# 1  RemainingCustomers             2       3       5
# 2  RemainingCustomers             3      16       9
Explanation
- df1[cols].subtract(df2[cols].add(df3[cols])) performs the calculation on the specified columns only.
- .assign(**{'Market': 'RemainingCustomers', 'Product Code': [1, 2, 3]}) adds the extra columns required for the result dataframe.
- .sort_index(axis=1) reorders the columns into the desired output order.
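The **{...} unpacking is needed because 'Product Code' contains a space, so it cannot be written as a plain keyword argument to assign. A variant sketch that takes the codes from df1 instead of hard-coding [1, 2, 3] (an adjustment, not the answerer's code); like the original, it assumes all three frames list the products in the same row order, since subtract and add align on the default row index.

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 3, 4], 'Value2': [2, 6, 10]})

cols = ['Value1', 'Value2']
df4 = (df1[cols]
       .subtract(df2[cols].add(df3[cols]))
       # Product Code comes from df1; assumes the same row order in all frames
       .assign(**{'Market': 'RemainingCustomers',
                  'Product Code': df1['Product Code']})
       # Alphabetical column order: Market, Product Code, Value1, Value2
       .sort_index(axis=1))
print(df4)
```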
Reputation: 402962
Drop Market, set Product Code as the index, and perform index-aligned arithmetic on the product codes. After that, reset the index and insert Market into the result.
# Keep only the numeric values, keyed by Product Code
df1, df2, df3 = [
    df.drop(columns='Market').set_index('Product Code') for df in [df1, df2, df3]
]
df4 = (df1 - (df2 + df3)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')
               Market  Product Code  Value1  Value2
0  RemainingCustomers             1       4       1
1  RemainingCustomers             2       3       5
2  RemainingCustomers             3      16       9
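Because the arithmetic aligns on Product Code rather than on row position, it should still give the right answer when one frame lists the products in a different order. A sketch checking that, with df3's rows deliberately reversed (the reversal is only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'Market': ['Total'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [10, 20, 30], 'Value2': [5, 15, 25]})
df2 = pd.DataFrame({'Market': ['Customer1'] * 3, 'Product Code': [1, 2, 3],
                    'Value1': [3, 14, 10], 'Value2': [2, 4, 6]})
# Same data as the question's df3, but with the rows in reverse order
df3 = pd.DataFrame({'Market': ['Customer2'] * 3, 'Product Code': [3, 2, 1],
                    'Value1': [4, 3, 3], 'Value2': [10, 6, 2]})

a, b, c = [df.drop(columns='Market').set_index('Product Code')
           for df in (df1, df2, df3)]

# Rows are matched by their Product Code label, not by their position
df4 = (a - (b + c)).reset_index()
df4.insert(0, 'Market', 'RemainingCustomers')
print(df4)
```

A purely positional approach, such as subtracting the raw value arrays, would pair product 1 in df1 with product 3 in this df3 and silently give wrong numbers.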