Reputation: 251
I have a data set of individuals with their socioeconomic scores, ranging from -6.3 to 3.5. I want to assign each individual to a quantile based on their socioeconomic score.
I have a dataset named Healthdata with two columns: Healthdata$SSE and Healthdata$ID.
In the end, I would like a data frame in which each individual is matched to their SSE quantile.
How can I do this in R?
Upvotes: 1
Views: 2816
Reputation: 385
So let's start with a sample data set based on your description:
set.seed(315)
Healthdata <- data.frame(SSE = sample(-6.3:3.5, 21, replace = TRUE), ID = gl(21, 1))
Which gives something like this:
> Healthdata[1:15,]
SSE ID
1 -0.3 1
2 -6.3 2
3 -1.3 3
4 -3.3 4
5 -5.3 5
6 -4.3 6
7 -4.3 7
8 0.7 8
9 -4.3 9
10 -4.3 10
11 -3.3 11
12 0.7 12
13 -2.3 13
14 -3.3 14
15 0.7 15
I understand that you want a new variable which identifies the quantile group of the individual's socioeconomic status. I would do something like this:
transform(Healthdata, Q = cut(Healthdata$SSE,
                              breaks = quantile(Healthdata$SSE),
                              labels = c(1, 2, 3, 4),
                              include.lowest = TRUE))
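This returns something like the following (the listing comes from a different random draw than the one above, so the SSE values differ):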
SSE ID Q
1 -1.3 1 2
2 -6.3 2 1
3 -4.3 3 1
4 0.7 4 3
5 1.7 5 3
6 1.7 6 3
7 -5.3 7 1
8 1.7 8 3
9 2.7 9 4
10 -3.3 10 2
11 -1.3 11 2
12 -3.3 12 2
13 1.7 13 3
14 0.7 14 3
15 -4.3 15 1
If you want to see the lower and upper bounds of each quantile range, omit labels = c(1, 2, 3, 4) to return this instead:
SSE ID Q
1 -1.3 1 (-4.3,-1.3]
2 -6.3 2 [-6.3,-4.3]
3 -4.3 3 [-6.3,-4.3]
4 0.7 4 (-1.3,1.7]
5 1.7 5 (-1.3,1.7]
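As a quick sanity check, the same cut()/quantile() idea can be run on data where the expected result is easy to predict. This is only a sketch: the 20 evenly spaced scores are made up for illustration, and labels = FALSE is used here (instead of the labels above) so that cut() returns plain integer codes.

```r
# Hypothetical sample data: 20 individuals with distinct, evenly spaced scores
Healthdata <- data.frame(ID = 1:20,
                         SSE = seq(-6.3, 3.5, length.out = 20))

# Quartile breakpoints at 0%, 25%, 50%, 75% and 100%
breaks <- quantile(Healthdata$SSE)

# labels = FALSE makes cut() return integer codes 1..4 instead of a factor;
# include.lowest = TRUE keeps the minimum score in the first group
Healthdata$Q <- cut(Healthdata$SSE, breaks = breaks,
                    labels = FALSE, include.lowest = TRUE)

# Number of individuals per quartile
table(Healthdata$Q)
```

With distinct scores each quartile should hold a quarter of the individuals. If many scores are tied, quantile() can return duplicate breakpoints, and cut() then stops with an error saying the breaks are not unique.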
Upvotes: 2
Reputation: 81693
Here's one approach:
# an example data set
set.seed(1)
Healthdata <- data.frame(SSE = rnorm(8), ID = gl(2, 4))
# within each ID group, replace every SSE by the upper bound of its quintile
transform(Healthdata, quint = ave(SSE, ID, FUN = function(x) {
  quintiles <- quantile(x, seq(0, 1, .2))               # quintile breakpoints
  cuts <- cut(x, quintiles, include.lowest = TRUE)      # interval of each value
  quintVal <- quintiles[match(cuts, levels(cuts)) + 1]  # upper bound of that interval
  return(quintVal)
}))
# SSE ID quint
# 1 -0.6264538 1 -0.4644344
# 2 0.1836433 1 0.7482983
# 3 -0.8356286 1 -0.7101237
# 4 1.5952808 1 1.5952808
# 5 0.3295078 2 0.3610920
# 6 -0.8204684 2 -0.1304827
# 7 0.4874291 2 0.5877873
# 8 0.7383247 2 0.7383247
A simple illustration of how it works:
values <- 1:10
# [1] 1 2 3 4 5 6 7 8 9 10
quintiles <- quantile(values, seq(0, 1, .2))
# 0% 20% 40% 60% 80% 100%
# 1.0 2.8 4.6 6.4 8.2 10.0
cuts <- cut(values, quintiles, include.lowest = TRUE)
# [1] [1,2.8] [1,2.8] (2.8,4.6] (2.8,4.6]
# [5] (4.6,6.4] (4.6,6.4] (6.4,8.2] (6.4,8.2]
# [9] (8.2,10] (8.2,10]
# 5 Levels: [1,2.8] (2.8,4.6] ... (8.2,10]
quintVal <- quintiles[match(cuts, levels(cuts)) + 1]
# 20% 20% 40% 40% 60% 60% 80% 80% 100% 100%
# 2.8 2.8 4.6 4.6 6.4 6.4 8.2 8.2 10.0 10.0
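If the quintile number (1 to 5) is wanted rather than the upper boundary value, the steps above can be adapted by converting the factor from cut() to its level index. This is a minimal sketch; the names quintGroup and the quintGroup column are introduced here for illustration only.

```r
values <- 1:10
quintiles <- quantile(values, seq(0, 1, .2))

# as.integer() on the factor returned by cut() gives the level index,
# i.e. the number of the quintile each value falls into
quintGroup <- as.integer(cut(values, quintiles, include.lowest = TRUE))
quintGroup
# [1] 1 1 2 2 3 3 4 4 5 5

# The same idea applied within each ID group of the example data set
set.seed(1)
Healthdata <- data.frame(SSE = rnorm(8), ID = gl(2, 4))
Healthdata$quintGroup <- ave(Healthdata$SSE, Healthdata$ID, FUN = function(x) {
  as.integer(cut(x, quantile(x, seq(0, 1, .2)), include.lowest = TRUE))
})
```

Note that ave() returns a numeric vector, so the group numbers in the data frame are stored as doubles rather than as a factor.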
Upvotes: 3