Reputation: 1456
This is a typical and/or question in set analysis where I am stuck for a long time.
I want to sum the amount for those ids, where:
type_of_entry is both 'Revenue and Expense' or 'Revenue and Labor'
revenue type is 'CAF'
Expected ids are shown in bold colors
Eg...id 1 exists for both revenue and expense. Similarly, id 2 and 3 exist for both revenue and labor.
Result - > amount = 55 (5+40+10)
I have tried the below set analysis but is not working:
I would appreciate any help on this.
Regards
Sagnik
Upvotes: 1
Views: 2607
Reputation: 96
Thank's for your example, it's easier to understand.
I suggest you get the values of the concatenated id's first. For example:
Concat(<type_of_entry={'Expense','Labor'}>id, ',')
Now you can get the sum of this ids, so:
Sum(<id={"$(=Concat(<type_of_entry={'Expense','Labor'}>id, ','))"},revenue_type={'CAF'}> amount)
And this should work, I didn't verify this works with any dataset but it should!
Have a nice day!
Upvotes: 1
Reputation: 1456
Script -
The p() function extracts the possible values based on your filter, in this case, it was Expense and Labor, and * operator does the and operation. In short, you can have all the desired ids , and then apply the revenue_type filter.
Similarly, there is an e() function that extracts the excluded values.
This answer is not mine, Mr. Sunny Talwar helped me to get the solution for this question. It worked.
Upvotes: 2
Reputation: 8028
Are you accept the answer which is Python solution?
import pandas as pd
from collections import defaultdict
df = pd.DataFrame([
['Expense', 1, 10, '-'],
['Labor', 2, 20, '-'],
['Labor', 3, 50, '-'],
['Revenue', 1, 5, 'CAF'],
['Revenue', 2, 30, 'NORM'],
['Revenue', 2, 40, 'CAF'],
['Revenue', 3, 10, 'CAF'],
['Revenue', 4, 20, 'NORM'],
['Revenue', 5, 30, 'CAF']
], columns=['type_of_entry', 'id', 'amount', 'revenue_type'])
series_caf = df[df['revenue_type'].eq('CAF')]
filter_id_list = series_caf['id'].to_list() # 1, 2, 3, 5
result_amount = 0
dict_ok = defaultdict(list)
for cur_id in filter_id_list:
is_revenue = len(df[(df.id == cur_id) & (df.type_of_entry == 'Revenue')]) > 0
is_expense = len(df[(df.id == cur_id) & (df.type_of_entry == 'Expense')]) > 0
is_labor = len(df[(df.id == cur_id) & (df.type_of_entry == 'Labor')]) > 0
is_ok = (is_revenue and is_expense) or (is_revenue and is_labor)
if is_ok:
cur_amount = series_caf[series_caf.id == cur_id].amount.values[0]
result_amount += cur_amount
dict_ok['id'].append(cur_id)
dict_ok['amount'].append(cur_amount)
dict_ok['ok_reason (REL)'].append(is_revenue*100+is_expense*10+is_labor)
df_result_info = pd.DataFrame.from_dict(dict_ok)
print(df_result_info)
print(result_amount)
output
id amount ok_reason (REL)
0 1 5 110
1 2 40 101
2 3 10 101
55
Upvotes: 1