Reputation: 11
I am trying to factorize a variable which is basically a count of how many food species one has eaten. My code is throwing an error and I am not sure how to fix this.
Totals is a continuous variable with the food count.
Foods$count_quintile <- factor(Foods$Totals, levels = 1:5,
labels = c("Q1", "Q2", "Q3", "Q4", "Q5"))
After I run this, count_quintile is still turning up empty.
Thoughts?
Upvotes: 0
Views: 54
Reputation: 72593
factor() isn't quite the right function for this. You are probably looking for cut(), where you can specify the quintiles by passing the output of quantile() (or any other set of break points) as breaks=.
Foods <- transform(Foods,
                   count_quintile=cut(Totals,
                                      breaks=quantile(Totals, seq.int(0, 1, length.out=6)),
                                      include.lowest=TRUE,
                                      labels=paste0('Q', 1:5)))
str(Foods$count_quintile)
# Factor w/ 5 levels "Q1","Q2","Q3",..: 5 3 4 3 1 1 5 5 3 1 ...
head(Foods)
#          foo Totals       bar count_quintile
# 1  1.3709584     17 0.5131505             Q5
# 2 -0.5646982     11 0.4687138             Q3
# 3  0.3631284     13 0.4058770             Q4
# 4  0.6328626     11 0.7304523             Q3
# 5  0.4042683      6 0.6039375             Q1
# 6 -0.1061245      6 0.8713164             Q1
We can do a cross-check:
with(Foods, tapply(Totals, count_quintile, max))
# Q1 Q2 Q3 Q4 Q5
#  7  9 11 13 21
with(Foods, quantile(Totals, seq.int(0, 1, length.out=6)))
#   0%  20%  40%  60%  80% 100%
#    2    7    9   11   13   21
Data:
n <- 1000; set.seed(42); Foods <- data.frame(foo=rnorm(n), Totals=rpois(n, 10), bar=runif(n))
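One caveat, sketched here with made-up data rather than anything from the question: if Totals has many tied values, quantile() can return the same break point more than once, and cut() then stops with an error because the breaks are not unique. The vector x below is hypothetical. Dropping the duplicates with unique() is only one possible workaround, and it leaves you with fewer than five groups, so the Q1 to Q5 labels no longer fit.

```r
# Hypothetical data with heavy ties: most counts are 0
x <- c(rep(0, 8), 1, 2)

# Quintile break points, computed the same way as above
br <- quantile(x, seq.int(0, 1, length.out = 6))
has_ties <- anyDuplicated(br) > 0  # several boundaries coincide here

# cut() refuses duplicated breaks; capture the message instead of stopping
res <- tryCatch(cut(x, breaks = br, include.lowest = TRUE),
                error = function(e) conditionMessage(e))

# One possible workaround: drop duplicated breaks and accept fewer groups
grp <- cut(x, breaks = unique(br), include.lowest = TRUE)
nlevels(grp)  # fewer than five levels for this data
```

Whether merging groups like this is acceptable depends on what the quintiles are used for.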
Upvotes: 0
Reputation: 12518
You need to round the Totals column data, so it fits into the bins of the factor variable you're trying to create.
In your question you stated that Totals is a continuous variable with the food count. Here is some sample data that is continuous:
set.seed(0)
Foods <- data.frame(
Totals = rnorm(7, 2.5, 1)
)
    Totals
1 3.762954
2 2.173767
3 3.829799
4 3.772429
5 2.914641
6 0.960050
7 1.571433
As you can see, every value in this sample is a non-integer between 0 and 5. With this many decimal places it is very unlikely for any of them to be exactly a whole number.
In factor(Foods$Totals, levels = 1:5, labels = c("Q1", "Q2", "Q3", "Q4", "Q5")), the levels = 1:5 part says that the input data is coded with the numbers 1 to 5; the first category is 1, the second is 2, the third is 3, etc.
So when factor() looks at the Totals values above and finds that none of them is exactly 1, 2, 3, 4 or 5, it returns NA for each one. The result is a column of NAs.
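To see the matching rule in isolation, here is a minimal sketch with made-up values (x is hypothetical, not your data). Only a value that equals one of the levels exactly gets a label; everything else becomes NA.

```r
# Hypothetical values: only the exact integer 2 matches a level in 1:5
x <- c(2, 2.173767, 3.762954)

f <- factor(x, levels = 1:5, labels = paste0("Q", 1:5))

is.na(f)
# FALSE  TRUE  TRUE
```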
To make it work (assuming rounding Totals is appropriate for your data!), you can round to the nearest integer, and then the code will work:
Foods$count_quintile <- factor(round(Foods$Totals), levels = 1:5, labels = c("Q1", "Q2", "Q3", "Q4", "Q5"))
    Totals count_quintile
1 3.762954             Q4
2 2.173767             Q2
3 3.829799             Q4
4 3.772429             Q4
5 2.914641             Q3
6 0.960050             Q1
7 1.571433             Q2
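One thing to watch with the rounding approach, shown with hypothetical values: anything that rounds to 0, or to 6 and above, still falls outside levels = 1:5 and comes back as NA. Note also that the labels then mark the rounded value, not a quintile in the equal-count sense.

```r
# Hypothetical values that round to 0, 3 and 6
x <- c(0.3, 2.6, 5.7)

f <- factor(round(x), levels = 1:5, labels = paste0("Q", 1:5))

# Only the middle value lands on a level; the other two are NA
is.na(f)
```

If your real counts go beyond 5, you would need to widen levels (and labels) or bin the values some other way.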
Upvotes: 0