Reputation: 291
I am using the density function in base R to generate a KDE for a given 1-D data vector. The argument n to density gives the probability density estimate at n equally spaced points. Is there a way to get this estimate at a custom list of points?
I would like the density estimate at every 0.01 percentile point, so that the evaluation points are closer together where the density is high and farther apart where it is low, essentially aligning the resolution of my PDF estimate with how reliable the estimate is likely to be at that point. This collection of (x, y) pairs will be stored and used later for scoring after model development.
Those familiar with Python will recognize that this functionality is available in scipy.stats.gaussian_kde.evaluate(..).
Upvotes: 1
Views: 263
Reputation: 37889
I think I finally realised what you meant. Apparently, you can do it with the sm package:
library(sm)
a <- rnorm(200)
sm.density(a, eval.points = a)$estimate #the eval.points argument is the key argument you are looking for
Output:
> sm.density(a, eval.points = a)$estimate
[1] 0.12772710 0.02405005 0.21971466 0.34392609 0.39495931 0.41543305 0.21263921 0.41537832 0.32914302 0.25565207
[11] 0.35121705 0.27957087 0.19930803 0.41556843 0.26412647 0.32067182 0.36109746 0.33580489 0.01896655 0.41557119
[21] 0.10733984 0.30202465 0.39557093 0.10097724 0.13841591 0.34892004 0.38626383 0.07735814 0.04421804 0.39630396
[31] 0.38700142 0.10177375 0.19136592 0.23634829 0.24060493 0.37283049 0.38447048 0.09277430 0.38300854 0.38747915
[41] 0.03857675 0.32614202 0.41553740 0.41109807 0.31061776 0.39805191 0.20964930 0.37428245 0.38470874 0.23212350
[51] 0.37653126 0.06947437 0.39515910 0.40319273 0.41271155 0.24758345 0.40112930 0.41331974 0.29566411 0.39992320
[61] 0.36686191 0.38990556 0.36492636 0.41281621 0.39267835 0.18448714 0.11787245 0.37712505 0.38775265 0.25030009
[71] 0.41481836 0.10236957 0.39425025 0.03873721 0.08168519 0.29775494 0.34794457 0.16554033 0.36764219 0.41370926
[81] 0.39960951 0.41306470 0.11107980 0.27943190 0.41510756 0.35634826 0.36718828 0.38085515 0.15645417 0.25692344
[91] 0.11179099 0.22799955 0.39206820 0.41408224 0.29348350 0.15890729 0.22721980 0.38384978 0.31640118 0.03881538
[101] 0.41171143 0.41045637 0.38914218 0.40399988 0.38556505 0.27724666 0.15457874 0.36044473 0.21351522 0.37943612
[111] 0.41361048 0.40028703 0.34229100 0.40435532 0.07341782 0.34523757 0.36937555 0.26855928 0.26296213 0.40373905
[121] 0.36823187 0.19218498 0.06875183 0.38383405 0.39380643 0.09261450 0.35676087 0.41512915 0.11002953 0.22801342
[131] 0.12433048 0.13365228 0.35556910 0.37120609 0.33465014 0.41476827 0.30158998 0.41148426 0.40998579 0.29686716
[141] 0.01547056 0.41461764 0.09698607 0.32942869 0.41462633 0.29495019 0.26229083 0.41170128 0.37282610 0.40987606
[151] 0.39528089 0.33079101 0.33618617 0.41054245 0.34696030 0.32505169 0.40190879 0.23373421 0.41092030 0.21069149
[161] 0.41554976 0.37161607 0.09587529 0.23982159 0.40924851 0.28586226 0.04599452 0.41419171 0.34564851 0.37681629
[171] 0.36324057 0.17955626 0.11764356 0.29102065 0.17518755 0.01631140 0.37341812 0.23681565 0.30461539 0.31454744
[181] 0.41112586 0.22881959 0.14398296 0.41454269 0.38818158 0.36846550 0.10876282 0.25267048 0.39286846 0.29270928
[191] 0.14545077 0.34880880 0.40217248 0.32896962 0.41555177 0.33089562 0.41273214 0.08808706 0.39433817 0.06765712
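So, if you want the estimate at the 100 quantiles (percentiles) of a, fit the density on a and pass the percentiles as eval.points:
a_quant <- quantile(a, 1:100/100)
sm.density(a, eval.points = a_quant)$estimate
Note that the first argument should stay the full sample a. The output below was produced with a_quant passed as the data argument as well, so its values come from a KDE fitted to the 100 quantile values rather than to the original 200 observations.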
Output:
> sm.density(a_quant, eval.points = a_quant)$estimate
[1] 0.01582788 0.06846862 0.08129320 0.10110568 0.11177916 0.11901171 0.12623830 0.14776146 0.15867243 0.16385618
[11] 0.18902197 0.21554781 0.23069361 0.23686379 0.24233046 0.25667392 0.26330739 0.28597006 0.29260817 0.29466226
[21] 0.29658942 0.30270500 0.31182163 0.31755901 0.32583914 0.32794022 0.33748016 0.34198509 0.34471986 0.34754376
[31] 0.35899038 0.36480382 0.36772103 0.37260060 0.37961999 0.38321539 0.38542390 0.38803056 0.39019521 0.39104517
[41] 0.39212982 0.39676851 0.39976587 0.40359531 0.40668593 0.40756583 0.40814699 0.40892928 0.40930386 0.40967425
[51] 0.40999673 0.41007634 0.41024728 0.41040236 0.41045483 0.41036411 0.41021185 0.40643714 0.40443569 0.40383346
[61] 0.40081718 0.39546541 0.39325763 0.39005222 0.38777023 0.38381332 0.37944776 0.37801648 0.37636178 0.37169064
[71] 0.36890828 0.36603411 0.36374106 0.36101178 0.35648378 0.35354726 0.34896147 0.33874447 0.33061414 0.32493625
[81] 0.29767821 0.28032682 0.27688973 0.26318047 0.25014308 0.23976716 0.23292234 0.21841618 0.21472566 0.19722509
[91] 0.16976853 0.13176471 0.11215461 0.10400802 0.09752039 0.08865138 0.07374142 0.04617218 0.04121152 0.02798761
> length(sm.density(a_quant, eval.points = a_quant)$estimate)
[1] 100
And this way you can get what you need.
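Since the question asks about base R's density, here is a minimal sketch of how the same thing could be done without sm. It assumes a Gaussian kernel and the default bandwidth rule bw.nrd0; the names pts, kde_exact, kde_interp and kde_fun are only illustrative. The first approach averages normal densities centred on each observation, which is the definition of a Gaussian KDE. The second interpolates the grid that density returns, so it is an approximation.

```r
set.seed(1)                      # only to make the sketch reproducible
a <- rnorm(200)                  # example data, as in the answer above

# Evaluation points: the 1st to 100th percentiles of the data
pts <- quantile(a, probs = 1:100 / 100)

# Exact Gaussian KDE: the mean of normal densities centred on each observation.
# bw.nrd0() is the default bandwidth rule used by density().
bw <- bw.nrd0(a)
kde_exact <- sapply(pts, function(p) mean(dnorm(p, mean = a, sd = bw)))

# Alternative: linearly interpolate the grid returned by density().
d <- density(a, bw = bw, n = 2048)
kde_interp <- approx(d$x, d$y, xout = pts)$y

# A reusable function for scoring new points later.
# It returns NA for points outside the range of the density() grid.
kde_fun <- approxfun(d$x, d$y)
```

The (pts, kde_exact) pairs can be stored as they are, or kde_fun can be kept and called on new values at scoring time, for example kde_fun(c(-1, 0, 1)).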
Sorry, I didn't realise what you were asking for at the beginning.
Hope this helps!
Upvotes: 1