Reputation: 7
I have this data set in excel column; I want to find where most of the data was clouded e.g. around 100 values of this set fell between 2018.5 to 2020.5 But it is manual work. What can be the formula for searching these value. I tried using Average & Std. Deviation but they have some flaws & not always accurate. And the range must be as narrow as possible, say if I have total number of values in the set equal to 150 then more than 60%-70% i.e 90-105 of these values should come under that determined range.
2025
2023.75
2023
2020.15
2020.3
2018.35
2017.5
2017.95
2016
2019.45
2020
2022.4
2022.25
2018.2
2018.05
2019.3
2022
2021.45
2021.45
2019.8
2020
2019.25
2019.35
2019.85
2019.4
2018.25
2019.55
2020
2020
2020.05
2020
2020
2021
2019.8
2019.8
2020
2020.05
2019.9
2021
2019.1
2019.1
2015.5
2015.5
2018.5
2017.25
2018.85
2018.85
2018.85
2017.05
2019
2020.05
2020
2020
2020
2019.7
2019.65
2019.65
2020
2019
2018.05
2020
2019.4
2020
2020.05
2017.4
2020
2020
2020
2019.95
2019.95
2019.95
2018
2019.85
2019.85
2019.7
2020
2020
2020
2020
2020
2019.75
2019.6
2019.75
2019.75
2019.95
2019.95
2019.95
2020
2020
2020
2020
2019.35
2019.35
2019.95
2019.95
2020
2019.95
2020
2020
2020
2021.7
2020
2021
2020
2020
2020
2020
2019.95
2020
2019.85
2019.8
2019
2019.8
2019.8
2019.8
2020
2019.9
2019.9
2020
2020
2019.8
2019.3
2019.85
2020
2019.1
2018
2017.5
2015.05
2009
2008
2011.5
2011
2011
2014.65
2011
2010
2009.05
2005.65
2005
1998.1
2000
1999
2001.7
2000.95
2002
2005
2006.7
2004.4
2004.85
2006.8
2007.45
2006
2010
2012.5
2012.2
2008.8
2007.4
2007.55
2005.35
2005.4
2006.9
2008.05
2003.85
2002.6
2004.85
2002.35
2006
2006
Upvotes: 0
Views: 88
Reputation: 7979
What you desire isn't possible with the worksheet functions as they are. you would need to create a table (having one dimension as lower cut and one dimension with upper cut to delete the lowest/highest till you got your range... but you would need to do so every time the size changes (and the formulas would be.... don't think about it... (still assuming you need the whole range)
You can do it by formula: (all Values in A:A)
C1: 0.7 (at least x % values need to be included / 1 = 100%)
C2: =MAX(IF(MIN(SMALL(A:A,ROUNDUP(COUNTIF(A:A,"<>")*C1,0)+ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))-1)-SMALL(A:A,ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))))=(SMALL(A:A,ROUNDUP(COUNTIF(A:A,"<>")*C1,0)+ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))-1)-SMALL(A:A,ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1)))),SMALL(A:A,ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))),0))
C3: =MAX(IF(MIN(SMALL(A:A,ROUNDUP(COUNTIF(A:A,"<>")*C1,0)+ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))-1)-SMALL(A:A,ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))))=(SMALL(A:A,ROUNDUP(COUNTIF(A:A,"<>")*C1,0)+ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))-1)-SMALL(A:A,ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1)))),SMALL(A:A,ROUNDUP(COUNTIF(A:A,"<>")*C1,0)+ROW(INDIRECT("1:"&ROUNDDOWN(COUNTIF(A:A,"<>")*(1-C1),0)+1))-1)))
Formulas in
C2/C3
are array formulas and must be confirmed with Ctrl+Shift+Enter
C2
will show the lower value and C3
the upper one. You can get the number of values used with =COUNTIF(A:A,"<>")
and the count of numbers in the calculation with =ROUNDUP(COUNTIF(A:A,"<>")*C1,0)
.
But there is still the way to do so with a UDF like this:
Open up the VBA-window (hit Alt+F11). Then click at "Insert" and then "Module". Copy/paste the code below into your Module.
Option Explicit
Public Function getValRange(rng As Range, amountVal As Double) As String
Set rng = Intersect(rng, rng.Parent.UsedRange)
Dim cellsVal As Variant
Dim cellsCount As Long
For Each cellsVal In rng.Value
If IsNumeric(cellsVal) And Len(cellsVal) > 0 Then cellsCount = cellsCount + 1
Next
Dim cellsNow As Long
cellsNow = cellsCount
cellsVal = rng.Value
amountVal = amountVal / 100
Dim AverageVal As Double
Dim maxVal As Double
Dim minVal As Double
Dim valToDel As Double
Dim x As Long, y As Long
While (cellsNow / cellsCount) > amountVal
AverageVal = Application.WorksheetFunction.Average(cellsVal)
maxVal = Application.WorksheetFunction.Max(cellsVal)
minVal = Application.WorksheetFunction.Min(cellsVal)
If (maxVal - AverageVal) > (AverageVal - minVal) Then
valToDel = maxVal
Else
valToDel = minVal
End If
For x = 1 To UBound(cellsVal)
For y = 1 To UBound(cellsVal, 2)
If cellsVal(x, y) = valToDel Then
cellsVal(x, y) = ""
Exit For
End If
Next y
Next x
cellsNow = cellsNow - 1
Wend
maxVal = Application.WorksheetFunction.Max(cellsVal)
minVal = Application.WorksheetFunction.Min(cellsVal)
getValRange = minVal & " - " & maxVal
End Function
Then simply put in the cell something like this:
=getValRange(A1:A150,70)
to get the lowest and the highest values as string like 2018.2 - 2021
. (calculated for your numbers)
It simply checks for the value having the highest difference to the average and then excludes it. Repeating this till the number of cells is less or equal to the set percentage. However, going for a percentage lower then 50% it may fail. For cases like this you would need something like "brute force" (which may have long calculation-times)
To get the best solution for a set amount of values, you can use this:
Public Function getValRange(rng As Range, amountVal As Double) As String
If rng.Cells.Count < 2 Or amountVal <= 0 Or amountVal >= 100 Then Exit Function
Set rng = Intersect(rng, rng.Parent.UsedRange)
Dim cellsVal As Variant
Dim allVal() As Variant
ReDim allVal(1 To rng.Cells.Count)
Dim i As Long
'get all values in in 1d-array
For Each cellsVal In rng.Value
If IsNumeric(cellsVal) And Len(cellsVal) > 0 Then
i = i + 1
allVal(i) = cellsVal
End If
Next
ReDim Preserve allVal(1 To i)
'sort 1d-array
For i = 2 To UBound(allVal)
If allVal(i) < allVal(i - 1) Then
cellsVal = allVal(i)
allVal(i) = allVal(i - 1)
allVal(i - 1) = cellsVal
i = 1
End If
Next
'get value count to be used
amountVal = Application.RoundUp((UBound(allVal) / 100) * amountVal, 0)
Dim clsrange(1) As Variant
'get the closest range
clsrange(0) = allVal(amountVal) - allVal(1)
clsrange(1) = allVal(1) & " - " & allVal(amountVal)
For i = 2 To UBound(allVal) - amountVal
If (allVal(amountVal + i - 1) - allVal(i)) < clsrange(0) Then
clsrange(0) = allVal(amountVal + i - 1) - allVal(i)
clsrange(1) = allVal(i) & " - " & allVal(amountVal + i - 1)
End If
Next
getValRange = clsrange(1)
End Function
A completely different approach would be using the sort option. Lets assume the columns B:E
are empty. Copy column A
to B
and then sort column B
.
Get the amound of cells you want to check for (lets say 50 for now). In C1
you put =B50-B1
in D1
=B1
and in D1
comes =B50
. And then copy it down till it reaches the range with empty cells (having 150 cells, will end at C101
, autofill simply C1:E1
downs till column E
starts to get empty). Now for column C
use conditional formatting with the formula =C1=MIN(C:C)
... then check for the formatted cell, it will tell you the smallest range having the min and max in column D
and E
.
Upvotes: 1