AndyApps

Reputation: 57

R programming: calculating the midpoints of intervals

I'm trying to figure out a formula that averages the minimum and maximum value of each interval, i.e. adds the two bounds and divides by two.

x <- sample(10:40, 100, replace = TRUE)
factorx <- factor(cut(x, breaks = nclass.Sturges(x)))
xout <- as.data.frame(table(factorx))
xout <- transform(xout, cumFreq = cumsum(Freq), relative = prop.table(Freq))

Running the code above in R, I get the following:

xout
      factorx Freq cumFreq relative
1 (9.97,13.8]   14      14     0.14
2 (13.8,17.5]   13      27     0.13
3 (17.5,21.2]   16      43     0.16
4   (21.2,25]    5      48     0.05
5   (25,28.8]   11      59     0.11
6 (28.8,32.5]    8      67     0.08
7 (32.5,36.2]   16      83     0.16
8   (36.2,40]   17     100     0.17

What I want to know is whether there is a way to calculate the midpoint of each interval. For example, for the first interval it would be:

(13.8 + 9.97)/2

I believe this is called the class midpoint in statistics.

Upvotes: 1

Views: 78

Answers (2)

Metrics

Reputation: 15458

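# One possible solution is to split the interval labels on "(", "," and "]" (xout is your data frame).
# The output below comes from a fresh random sample, so its numbers differ from those in the question.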

x1 <- strsplit(as.character(xout$factorx), ",|\\(|]")
x2 <- do.call(rbind, x1)
# Column 1 of x2 is empty; columns 2 and 3 hold the lower and upper bounds
xout$higher <- as.numeric(x2[, 3])
xout$lower <- as.numeric(x2[, 2])
xout$aver <- rowMeans(xout[, c("lower", "higher")])

> head(xout,3)
      factorx Freq cumFreq relative higher lower   aver
1 (9.97,13.7]   15      15     0.15   13.7  9.97 11.835
2 (13.7,17.5]   14      29     0.14   17.5 13.70 15.600
3 (17.5,21.2]   12      41     0.12   21.2 17.50 19.350
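
To see why the bounds end up in the second and third columns, it may help to run the split on a single label. This is only a small sketch; the names parts and bounds are made up for illustration, and the label is the first one from the question's table.

```r
# Split one interval label on ",", "(" or "]"
parts <- strsplit("(9.97,13.8]", ",|\\(|]")[[1]]
parts
# [1] ""     "9.97" "13.8"

# The leading "(" leaves an empty first piece, so the bounds sit in positions 2 and 3
bounds <- as.numeric(parts[2:3])
mean(bounds)
# [1] 11.885
```

The same empty first piece shows up as column 1 of x2 after do.call(rbind, x1), which is why that column is skipped.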

Upvotes: 1

Thomas

Reputation: 44525

Here's a one-liner that is probably close to what you want:

> sapply(strsplit(levels(xout$factorx), ","), function(x) sum(as.numeric(gsub("[[:space:]]", "", chartr(old = "(]", new = "  ", x))))/2)
[1] 11.885 15.650 19.350 23.100 26.900 30.650 34.350 38.100
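
If parsing the labels feels fragile (they are rounded for display), another option might be to build the break points yourself and take the midpoints from them directly. The following is a rough sketch under a few assumptions: the seed is there only for reproducibility, brks, k and midpoint are names chosen for this example, and the intervals will differ slightly from the question's because cut() with a single number of breaks widens the data range by 0.1% on each side.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
x <- sample(10:40, 100, replace = TRUE)

# Build the break points explicitly instead of letting cut() choose them
k    <- nclass.Sturges(x)
brks <- seq(min(x), max(x), length.out = k + 1)

# include.lowest = TRUE keeps min(x) inside the first interval
factorx <- cut(x, breaks = brks, include.lowest = TRUE)

xout <- as.data.frame(table(factorx))
xout <- transform(xout,
                  cumFreq  = cumsum(Freq),
                  relative = prop.table(Freq),
                  # midpoint of each class: mean of consecutive break points
                  midpoint = (head(brks, -1) + tail(brks, -1)) / 2)
xout
```

Because the midpoints come from the numeric breaks rather than the printed labels, they are not affected by the rounding in the factorx column.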

Upvotes: 2
