Alex Reynolds

Reputation: 96967

Counting values within levels

I have a set of levels in R that I generate with cut, for example for fractional values between 0 and 1 broken into 0.1-wide bins:

> frac <- cut(c(0, 1), breaks=10)
> levels(frac)
[1] "(-0.001,0.1]" "(0.1,0.2]"    "(0.2,0.3]"    "(0.3,0.4]"    "(0.4,0.5]"
[6] "(0.5,0.6]"    "(0.6,0.7]"    "(0.7,0.8]"    "(0.8,0.9]"    "(0.9,1]"

Given a vector v of continuous values in [0.0, 1.0], how do I count how many elements of v fall within each level in levels(frac)?

Because I may change the number of breaks and/or the interval from which the levels are made, I'm looking for a way to do this with standard R commands. The result should be a two-column data frame: one column holding the levels as factors, and a second holding the fraction or percentage of the elements of v that fall in each level.

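Note: the following does not work, because it only tabulates the two values (0 and 1) that were used to build frac, not the elements of v: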

> table(frac)
frac
(-0.001,0.1]    (0.1,0.2]    (0.2,0.3]    (0.3,0.4]    (0.4,0.5]    (0.5,0.6]
           1            0            0            0            0            0
   (0.6,0.7]    (0.7,0.8]    (0.8,0.9]      (0.9,1]
           0            0            0            1

If I use cut on v directly, I do not get the same levels for different vectors. With breaks=10, cut derives the intervals from each vector's range (its minimum and maximum), which differs between arbitrary vectors, so even with the same number of breaks the level intervals are not the same.
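A small sketch of the behaviour described above, using two made-up vectors a and b (hypothetical example data, not from the question): with breaks = 10, cut computes the intervals from each vector's own range, while explicit break points give identical levels for both.

```r
# Hypothetical example vectors with different ranges
a <- c(0.05, 0.42, 0.97)
b <- c(0.30, 0.55, 0.61)

# breaks = 10: the intervals are derived from each vector's min and max
levels_a <- levels(cut(a, breaks = 10))
levels_b <- levels(cut(b, breaks = 10))
print(identical(levels_a, levels_b))  # FALSE

# Explicit break points: both vectors get the same ten levels
fixed <- seq(0, 1, by = 0.1)
fixed_a <- levels(cut(a, breaks = fixed, include.lowest = TRUE))
fixed_b <- levels(cut(b, breaks = fixed, include.lowest = TRUE))
print(identical(fixed_a, fixed_b))    # TRUE
```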

My goal is to take different vectors and bin them to the same set of levels. Hopefully this helps clarify my question. Thanks for any assistance.

Upvotes: 2

Views: 170

Answers (5)

d.b

Reputation: 32548

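# Break points 0, 0.1, ..., 1
frac = seq(0, 1, 0.1)
set.seed(42); v = rnorm(10, 0.5, 0.2)
# Count elements of v in each right-closed interval (frac[i], frac[i+1]];
# values outside (0, 1] are not counted in any bin
sapply(1:(length(frac)-1), function(i) sum(frac[i]<v & frac[i+1]>=v))
#[1] 0 0 0 1 3 2 1 1 1 1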
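A sketch, assuming the same frac and v as in this answer, that turns those counts into the two-column form the question asks for (plus a count column). The labels are taken from cut with the same explicit breaks, which uses the same right-closed intervals, so the counts should line up with table(cut(v, breaks = frac)).

```r
frac <- seq(0, 1, 0.1)
set.seed(42); v <- rnorm(10, 0.5, 0.2)

# Counts per right-closed interval (frac[i], frac[i+1]], as in the answer
counts <- sapply(1:(length(frac) - 1),
                 function(i) sum(frac[i] < v & frac[i + 1] >= v))

# cut() with explicit breaks produces matching "(a,b]" labels
labels <- levels(cut(v, breaks = frac))

# Fractions are relative to length(v); values outside (0, 1] would make them sum to < 1
result <- data.frame(level = factor(labels, levels = labels),
                     count = counts,
                     fraction = counts / length(v))
print(result)
```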

Upvotes: 1

zx8754

Reputation: 56189

Prepend the extremes c(0, 1) to v so that cut always spans [0, 1] and produces the same levels, then drop those two values from the result:

library(dplyr)

#dummy data
set.seed(1)
v <- round(runif(7), 2)

#result
data.frame(v,
           vFrac = cut(c(0, 1, v), breaks = 10)[-c(1, 2)]) %>% 
  group_by(vFrac) %>% 
  mutate(vFreq = n())

# Source: local data frame [7 x 3]
# Groups: vFrac [6]
# 
#        v        vFrac vFreq
#    <dbl>       <fctr> <int>
# 1   0.27    (0.2,0.3]     1
# 2   0.37    (0.3,0.4]     1
# 3   0.57    (0.5,0.6]     1
# 4   0.91      (0.9,1]     2
# 5   0.20    (0.1,0.2]     1
# 6   0.90    (0.8,0.9]     1
# 7   0.94      (0.9,1]     2

Upvotes: 2

staove7

Reputation: 580

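Use hist with fixed break points; plot=FALSE returns the counts without drawing anything (hist stops with an error if some values of v lie outside the breaks):

frac = seq(0,1,by=0.1)

# Labels such as "0 - 0.1", "0.1 - 0.2", ...
ranges = paste(head(frac,-1), frac[-1], sep=" - ")
freq   = hist(v, breaks=frac, include.lowest=TRUE, plot=FALSE)

data.frame(range = ranges, frequency = freq$counts)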
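To get the fractional or percentage column the question asks for, the hist counts can simply be divided by their total. A sketch with made-up data (the seed and runif(50) are hypothetical stand-ins for the question's v):

```r
set.seed(123)            # hypothetical example data
v <- runif(50)
frac <- seq(0, 1, by = 0.1)

# right = TRUE and include.lowest = TRUE are hist's defaults:
# bins are right-closed, and the first bin also includes 0
h <- hist(v, breaks = frac, plot = FALSE)

result <- data.frame(range = paste(head(frac, -1), frac[-1], sep = " - "),
                     count = h$counts,
                     percent = 100 * h$counts / sum(h$counts))
print(result)
```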

Upvotes: 1

user3640617

Reputation: 1576

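Use findInterval instead of cut. Note that findInterval uses left-closed intervals [a, b), so x below is the upper bound of the bin each value falls in:

v <- data.frame(v = runif(100, 0, 1))

library(plyr)
# findInterval returns i such that breaks[i] <= v < breaks[i+1]; times 0.1 labels each bin by its upper bound
v$x <- findInterval(v$v, seq(0, 1, by = 0.1)) * 0.1
# Count the rows in each bin
ddply(v, .(x), summarize, n = length(x))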
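A base-R sketch of the same findInterval idea without plyr, assuming breaks of width 0.1 on [0, 1] and made-up data. Here rightmost.closed = TRUE puts a value of exactly 1 into the last bin rather than a bin of its own, and tabulate with nbins keeps empty bins as zeros.

```r
set.seed(123)            # hypothetical example data
v <- runif(100)
breaks <- seq(0, 1, by = 0.1)

# Bin index i such that breaks[i] <= v < breaks[i + 1] (last bin closed at 1)
idx <- findInterval(v, breaks, rightmost.closed = TRUE)

# Count per bin, keeping empty bins as 0
counts <- tabulate(idx, nbins = length(breaks) - 1)

result <- data.frame(lower = head(breaks, -1),
                     upper = breaks[-1],
                     count = counts,
                     fraction = counts / length(v))
print(result)
```

These are left-closed bins, so a value of exactly 0.3 counts toward [0.3, 0.4), unlike cut's default right-closed intervals.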

Upvotes: 1

Konrad Rudolph

Reputation: 545686

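Define frac with explicit break points, so that the intervals are the same regardless of the range of the data, and then use the table function:

x = runif(100) # For example.
# Explicit breaks fix the intervals; add include.lowest = TRUE if x can contain exactly 0
frac = cut(x, breaks = seq(0, 1, 0.1))
table(frac)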

Result:

frac
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8]
       14         9         8        10         8        12         7         7
(0.8,0.9]   (0.9,1]
       16         9
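One caveat: cut with explicit breaks returns NA for a value of exactly 0 unless include.lowest = TRUE. A sketch that covers both endpoints and produces the two-column data frame (levels plus proportion) from the question; the data are made up:

```r
set.seed(123)                  # hypothetical example data
x <- c(0, runif(98), 1)        # include both endpoints on purpose

# include.lowest = TRUE turns the first level into "[0,0.1]", so 0 is not NA
frac <- cut(x, breaks = seq(0, 1, 0.1), include.lowest = TRUE)

# as.data.frame() on a table gives a factor column plus a value column
result <- as.data.frame(prop.table(table(frac)), responseName = "proportion")
print(result)
```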

Upvotes: 2
