Reputation: 1364
The first three columns exist. I am trying to create a formula for the fourth (HH_ANALYSIS_FLAG).
ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_ANALYSIS_FLAG
1001 1 1 0
1002 2 0 0
1003 3 1 0
1004 3 0 0
1005 3 0 0
1006 2 0 0
1007 4 0 0
1008 1 1 0
I have 50,000 accounts. They are flagged as being under analysis with the ACCOUNT_ANALYSIS_FLAG column (0,1). All accounts belong to a household. Multiple accounts can belong to the same household. I need the HH_ANALYSIS_FLAG column to evaluate to 1 if any account in the same household is under analysis, and to 0 otherwise. So with the above data and a working formula, my spreadsheet would look like so:
ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_ANALYSIS_FLAG
1001 1 1 1
1002 2 0 0
1003 3 1 1
1004 3 0 1
1005 3 0 1
1006 2 0 0
1007 4 0 0
1008 1 1 1
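To pin down the rule independently of any spreadsheet formula, here is a minimal Python sketch of the intended logic using the eight sample rows above. The names `accounts` and `household_flags` are made up for illustration; this only spells out the expected result, it is not a replacement for the formula.

```python
# Sample rows from the question:
# (ACCOUNT_NUMBER, HOUSEHOLD_NUMBER, ACCOUNT_ANALYSIS_FLAG)
accounts = [
    (1001, 1, 1),
    (1002, 2, 0),
    (1003, 3, 1),
    (1004, 3, 0),
    (1005, 3, 0),
    (1006, 2, 0),
    (1007, 4, 0),
    (1008, 1, 1),
]


def household_flags(rows):
    """Return one 0/1 flag per row: 1 if any account in the row's household is flagged."""
    # Collect every household that has at least one account under analysis.
    flagged_households = {hh for _, hh, flag in rows if flag == 1}
    # A row gets 1 when its household is in that set, otherwise 0.
    return [1 if hh in flagged_households else 0 for _, hh, _ in rows]


hh_analysis_flag = household_flags(accounts)
print(hh_analysis_flag)  # [1, 0, 1, 1, 1, 0, 0, 1]
```

The printed list matches the HH_ANALYSIS_FLAG column in the second table.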
Upvotes: 1
Views: 1025
Reputation: 37
Kenneth! Try this one:
=IF(VLOOKUP(B2,$B$2:$C$9,2,0)=1,1,0)
Assuming your table starts from A1, which means Account_Number is in cell A1, and your target column "HH_ANALYSIS_FLAG" is in column D.
Hope it's helpful
Upvotes: 0
Reputation: 5802
The following formula should do the trick. In fact, it will give you the total number of accounts being analysed per household.
A B C D
1 ACC_NUM HH_NUM ACC_ANALYSIS_FLAG HH_ANALYSIS_FLAG
2 1001 1 1 =SUMIF(B$2:B$50001, B2, C$2:C$50001)
3 1002 2 0 =SUMIF(B$2:B$50001, B3, C$2:C$50001)
4 1003 3 1 =SUMIF(B$2:B$50001, B4, C$2:C$50001)
For each row, this selects the set of rows that share the value in the HH_NUM column (based on the row containing the formula) and sums the values in the corresponding ACC_ANALYSIS_FLAG column. This gives you the total number of accounts under analysis for the given household. Compare the result to 0 if you only need to use it as a boolean value.
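As a rough illustration of what the per-row SUMIF computes, here is a small Python sketch. The helper `sumif` is a hypothetical stand-in that only handles an exact-match criterion, not Excel's full criteria syntax, and the lists hold the question's sample data.

```python
def sumif(criteria_range, criterion, sum_range):
    """Sum the values in sum_range whose paired criteria_range entry equals criterion."""
    return sum(v for k, v in zip(criteria_range, sum_range) if k == criterion)


# Columns B and C from the question's sample data.
hh_num = [1, 2, 3, 3, 3, 2, 4, 1]
acc_flag = [1, 0, 1, 0, 0, 0, 0, 1]

# One SUMIF per row, each scanning the whole column.
counts = [sumif(hh_num, hh, acc_flag) for hh in hh_num]
# Comparing against 0 turns the count into the 0/1 flag.
flags = [1 if c > 0 else 0 for c in counts]

print(counts)  # [2, 0, 1, 1, 1, 0, 0, 2]
print(flags)   # [1, 0, 1, 1, 1, 0, 0, 1]
```

Note that every row scans the full column, so the work grows with the square of the row count. With 50,000 rows that is on the order of 2.5 billion comparisons.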
EDIT:
Apparently the performance of this isn't up to snuff. However, assuming the rows for each household are adjacent (i.e. the sheet is sorted by household number), it should be possible to speed things up significantly by changing to something like the following.
2 1001 1 1 =SUMIF(B2:B5, B2, C2:C5)
3 1002 2 0 =SUMIF(B2:B6, B3, C2:C6)
4 1003 2 0 =SUMIF(B2:B7, B4, C2:C7)
5 1004 2 0 =SUMIF(B2:B8, B5, C2:C8)
6 1005 2 0 =SUMIF(B3:B9, B6, C3:C9)
7 1006 2 0 =SUMIF(B4:B10, B7, C4:C10)
8 1007 2 0 =SUMIF(B5:B11, B8, C5:C11)
9 1008 2 0 =SUMIF(B6:B12, B9, C6:C12)
10 1009 2 0 =SUMIF(B7:B13, B10, C7:C13)
This assumes that there are at most 4 accounts per household, and thus limits the range of the SUMIF to the current cell +/- 3 rows.
To avoid referencing invalid cells, the first and last rows have to be treated as special cases. If you need a single formula for all of these cells, I think it should be possible to use OFFSET in combination with MAX, MIN and ROW to generate the appropriate ranges with just a little arithmetic.
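The clamping arithmetic is easier to see outside a formula, so here is a hedged Python sketch of the same windowing idea. The function name `windowed_sumif` and the `half_width` parameter are made up for this example, and the data is the question's sample sorted by household, which this approach assumes.

```python
def windowed_sumif(hh_num, acc_flag, half_width=3):
    """For each row, sum flags of same-household rows within +/- half_width rows.

    Assumes rows of one household are adjacent and that no household
    has more than half_width + 1 accounts.
    """
    n = len(hh_num)
    result = []
    for i in range(n):
        start = max(0, i - half_width)    # clamp at the first data row
        end = min(n - 1, i + half_width)  # clamp at the last data row
        total = sum(
            acc_flag[j]
            for j in range(start, end + 1)
            if hh_num[j] == hh_num[i]
        )
        result.append(total)
    return result


# The question's sample data, sorted by household number (an assumption of this approach).
hh_sorted = [1, 1, 2, 2, 3, 3, 3, 4]
flag_sorted = [1, 1, 0, 0, 1, 0, 0, 0]

windowed = windowed_sumif(hh_sorted, flag_sorted)
print(windowed)  # [2, 2, 0, 0, 1, 1, 1, 0]
```

The `max` and `min` calls play the role that MAX and MIN would play around ROW in the spreadsheet version: they keep the window from running past the first or last data row.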
Upvotes: 4
Reputation: 1
Presuming your HOUSEHOLD_NUMBER column is column B and your ACCOUNT_ANALYSIS_FLAG column is column C, then in row 2:
=IF(SUMIF(B:B,B2,C:C)>0,1,0)
should do it.
Upvotes: 0
Reputation: 6711
Insert another column D (you can hide it later), which is equal to the household number if the account is being analyzed, and zero if it is not. The formula for D2 can be =B2*C2. Fill column D with this formula.
Then for your HH_ANALYSIS_FLAG column, you can count the number of values in column D which match the household in column B. The formula would be something like =IF(COUNTIF(D:D,"="&B2)>0,1,0).
I'm not sure whether this approach is fast enough for the 50,000 accounts, though.
A B C D E
1 ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_UNDER_ANALYSIS HH_ANALYSIS_FLAG
2 1001 1 1 1 (=B2*C2) =IF(COUNTIF(D:D,"="&B2)>0,1,0)
3 1002 2 0 0 (=B3*C3) =IF(COUNTIF(D:D,"="&B3)>0,1,0)
4 1003 3 1 3 (=B4*C4) =IF(COUNTIF(D:D,"="&B4)>0,1,0)
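For anyone who wants to sanity-check the helper-column idea outside the spreadsheet, here is a small Python sketch on the question's sample data. The list names are illustrative only, and `list.count` stands in for COUNTIF.

```python
# Columns B and C from the question's sample data.
hh_num = [1, 2, 3, 3, 3, 2, 4, 1]
acc_flag = [1, 0, 1, 0, 0, 0, 0, 1]

# Column D: household number if the account is under analysis, else 0 (=B*C).
helper = [hh * flag for hh, flag in zip(hh_num, acc_flag)]

# Column E: 1 if the row's household number appears anywhere in column D.
hh_flag = [1 if helper.count(hh) > 0 else 0 for hh in hh_num]

print(helper)   # [1, 0, 3, 0, 0, 0, 0, 1]
print(hh_flag)  # [1, 0, 1, 1, 1, 0, 0, 1]
```

One thing the sketch makes visible: the trick relies on no household being numbered 0, since a household 0 would match every unflagged row in column D.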
Upvotes: 0