Reputation: 91
So I have this data in excel right now
A B C
2015-1 Test 1 23
2015-2 Test 1 12
2015-3 Test 1 43
2015-4 Test 1 32
2015-5 Test 1 3
2015-6 Test 1 90
2015-1 Test 2 200
2015-2 Test 2 123
2015-3 Test 2 21
2015-4 Test 2 40
2015-5 Test 2 17
2015-6 Test 2 138
2015-1 Test 3 160
2015-2 Test 3 55
2015-3 Test 3 30
2015-4 Test 3 74
2015-5 Test 3 67
2015-6 Test 3 89
Right now, I have it so that the user can look at the a specific time period, not necessarily all of the dates, of data, (for example, from 2015-1 to 2015-4). So when the user selects the date that they want, I want to take the percentile of the data(column C) at that date across all of the different test scenarios in column B. Right now there is only 3, but there will be up to 100 different test cases.
I know its possible to do =Percentile((test1_data,test2_data,test3_data),1), but I'm going to have to do the percentile across over 100 difference test cases, and the way I have it set up now seems highly inefficient. Is there a way to do this without having to enter in all of the 100 different arrays by hand?
Upvotes: 2
Views: 4261
Reputation: 341
Based on your table, something along the lines of the following formula should work. (It is an array formula and you should use CTL+SHIFT+ENTER as you enter the formula into the cell to activate the function.)
{=PERCENTILE(
IF(NUMBERVALUE(LEFT($A$1:$A$18,4))<=EndYear,
IF(NUMBERVALUE(LEFT($A$1:$A$18,4))>=BegYear,
IF(NUMBERVALUE(RIGHT($A$1:$A$18,1))<=EndMonth,
IF(NUMBERVALUE(RIGHT($A$1:$A$18,1))>=BegMonth,
$C$1:$C$18)))),1)}
EndYear
is a reference to the cell that has the LAST year you want includedBegYear
is a reference to the cell that has the FIRST year you want includedEndMonth
is a reference to the cell that has the LAST month (or whatever the second unit is) you want includedBegMonth
is a reference to the cell that has the FIRST month (or whatever the second unit is) you want includedJust expand the references $A$1:$A$18
and $C$1:$C$18
to include however many test cases you want.
FORMULA EXPLANATION
The first two if statements focus on the year. They take the LEFT()
four digits as a string. NUMBERVALUE()
then turns strings into values. You can then use the if statement to logically evaluate whether the test dates fall into the desired range of dates.
The second two if statements do precisely the same thing on the last single-digit (month?)
The embedded if statements, will return an array of the associated value from column C if all the statements are true and FALSE if one of the statements is not true.
PERCENTILE()
will take the array, ignore the items that returned as FALSE, and provide you with the k-th percentile of the range of values in which all four if statements are true.
*As a note, I don't know the significance of your second digit. If it ever goes above 9, you might need to adjust for your data. In that case you could either replace all the 2015-9 entries with 2015-09 and change the second argument of the RIGHT()
function to 2, or you could do something like MID($A$1:$A$18,6,2)
or the last digit could just be replaced by however many characters you have after the year argument.
Upvotes: 1