I have a continuous variable, frequency, that ranges from 0 to 6.115053. I need to split it into 6 levels, because my analysis will be more readable this way.
I have tried:
frequency.new <- hist(all$frequency, 6, plot = FALSE)
all$frequency <- as.factor(frequency.new)
but I get an error that I don't understand:
Error in sort.list(y) :
'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Can anybody help me?
Thanks a lot!
Katerina
You should look at the cut() function in base R. You should also note the last line of my answer before venturing further.
> set.seed(42)
> cut(runif(50), 6)
[1] (0.825,0.99] (0.825,0.99] (0.167,0.332] (0.825,0.99]
[5] (0.496,0.661] (0.496,0.661] (0.661,0.825] (0.00296,0.167]
[9] (0.496,0.661] (0.661,0.825] (0.332,0.496] (0.661,0.825]
[13] (0.825,0.99] (0.167,0.332] (0.332,0.496] (0.825,0.99]
[17] (0.825,0.99] (0.00296,0.167] (0.332,0.496] (0.496,0.661]
[21] (0.825,0.99] (0.00296,0.167] (0.825,0.99] (0.825,0.99]
[25] (0.00296,0.167] (0.496,0.661] (0.332,0.496] (0.825,0.99]
[29] (0.332,0.496] (0.825,0.99] (0.661,0.825] (0.661,0.825]
[33] (0.332,0.496] (0.661,0.825] (0.00296,0.167] (0.825,0.99]
[37] (0.00296,0.167] (0.167,0.332] (0.825,0.99] (0.496,0.661]
[41] (0.332,0.496] (0.332,0.496] (0.00296,0.167] (0.825,0.99]
[45] (0.332,0.496] (0.825,0.99] (0.825,0.99] (0.496,0.661]
[49] (0.825,0.99] (0.496,0.661]
6 Levels: (0.00296,0.167] (0.167,0.332] (0.332,0.496] ... (0.825,0.99]
cut() returns a factor indicating which of the groups (in this case 6) each observation falls into. This simply splits the range of the data into 6 intervals of equal width. Read ?cut for details of how the ends of the intervals are handled (see its right and include.lowest arguments).
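Applied to the question's setting, a minimal sketch might look like the following. The data frame `all` here is a hypothetical stand-in filled with simulated values between 0 and 6.115053 (the real data are not shown), and `frequency.f` and `frequency.g` are illustrative column names; storing the factor in a new column keeps the numeric variable intact. The second part shows why the interval ends matter: with explicit breaks that start exactly at the minimum, `include.lowest = TRUE` is needed, otherwise the value 0 becomes `NA`.

```r
# Hypothetical stand-in for the question's data frame `all`
# (simulated values; the real data are not shown)
set.seed(1)
all <- data.frame(frequency = c(0, runif(98, min = 0, max = 6.115053), 6.115053))

# 6 equal-width intervals; when `breaks` is a single number, cut()
# widens the range slightly so the minimum and maximum are both included
all$frequency.f <- cut(all$frequency, breaks = 6)
levels(all$frequency.f)
table(all$frequency.f)

# Same idea with explicit break points: 7 breaks give 6 intervals
brks <- seq(min(all$frequency), max(all$frequency), length.out = 7)

# Intervals are (a, b] by default, so the minimum (0) is not covered
# and turns into NA
sum(is.na(cut(all$frequency, breaks = brks)))

# include.lowest = TRUE closes the first interval as [a, b];
# the short labels "F1".."F6" are purely illustrative
all$frequency.g <- cut(all$frequency, breaks = brks, include.lowest = TRUE,
                       labels = paste0("F", 1:6))
table(all$frequency.g)
```

Short labels may read better in tables, at the cost of hiding the interval limits, which are still available from `brks`.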
Your code fails because the object returned by hist() is a list containing far more than your data broken into groups:
> foo <- hist(runif(50), breaks = 6, plot = FALSE)
> str(foo)
List of 7
$ breaks : num [1:6] 0 0.2 0.4 0.6 0.8 1
$ counts : int [1:5] 12 13 7 13 5
$ intensities: num [1:5] 1.2 1.3 0.7 1.3 0.5
$ density : num [1:5] 1.2 1.3 0.7 1.3 0.5
$ mids : num [1:5] 0.1 0.3 0.5 0.7 0.9
$ xname : chr "runif(50)"
$ equidist : logi TRUE
- attr(*, "class")= chr "histogram"
so you can't just convert it to a factor; R doesn't know how to do that. Notice also that hist() doesn't return the data broken down into the 6 groups; it provides other information useful for drawing a histogram (and here it actually produced only 5 bins). Also note that hist() uses pretty breaks, unlike cut(). If you want these pretty breaks, we can reproduce what hist() does as follows (pretty() treats n only as a suggestion, so you may not get exactly 6 groups):
> set.seed(42)
> x <- runif(50)
> brks <- pretty(range(x), n = 6, min.n = 1)
> cut(x, breaks = brks)
[1] (0.8,1] (0.8,1] (0.2,0.4] (0.8,1] (0.6,0.8] (0.4,0.6] (0.6,0.8]
[8] (0,0.2] (0.6,0.8] (0.6,0.8] (0.4,0.6] (0.6,0.8] (0.8,1] (0.2,0.4]
[15] (0.4,0.6] (0.8,1] (0.8,1] (0,0.2] (0.4,0.6] (0.4,0.6] (0.8,1]
[22] (0,0.2] (0.8,1] (0.8,1] (0,0.2] (0.4,0.6] (0.2,0.4] (0.8,1]
[29] (0.4,0.6] (0.8,1] (0.6,0.8] (0.8,1] (0.2,0.4] (0.6,0.8] (0,0.2]
[36] (0.8,1] (0,0.2] (0.2,0.4] (0.8,1] (0.6,0.8] (0.2,0.4] (0.4,0.6]
[43] (0,0.2] (0.8,1] (0.4,0.6] (0.8,1] (0.8,1] (0.6,0.8] (0.8,1]
[50] (0.6,0.8]
Levels: (0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1]
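Alternatively, since the break points are the useful part of what hist() returns, the original attempt could be repaired by passing its `breaks` component to cut() instead of converting the whole list. This is a minimal sketch on the same simulated `x`; `h` and `f` are illustrative names. hist() bins with right-closed intervals and includes the lowest value by default, so `include.lowest = TRUE` is set here to match it.

```r
set.seed(42)
x <- runif(50)

# hist() returns a list of class "histogram", not one group per observation
h <- hist(x, breaks = 6, plot = FALSE)
is.list(h)
h$breaks

# Re-use hist()'s (pretty) break points to assign each value to a bin;
# hist() defaults to right = TRUE and include.lowest = TRUE, so match them
f <- cut(x, breaks = h$breaks, include.lowest = TRUE)

# The counts per level should agree with hist()'s own counts
table(f)
h$counts
```

For the question's data this would be something like `cut(all$frequency, breaks = hist(all$frequency, 6, plot = FALSE)$breaks, include.lowest = TRUE)`, with the same caveat that the number of groups is chosen by pretty() and may not be exactly 6.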
But you should ask yourself why you want to discretise your data like this, and whether it makes sense.