Reputation: 1134
I am trying to arrange the 'Smoking status' categories in alphabetical order. This should be done using only the tidyverse.
This is what I have tried:
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename('Smoking Status' = smoking_status) %>%
dplyr::arrange('Smoking status')
smoking_gender_disch_piv_count_ren
As one can see, I do not get Current smoker first, then Ex smoker, etc. I thought the arrange function in dplyr would do the trick, but it does not.
This is the data I have:
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker",
"Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"
), class = "factor"), Female = c(24.0601503759398, 9.02255639097744,
35.3383458646617, 6.01503759398496, 25.5639097744361), Male = c(34.9753694581281,
13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798
), NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053,
24.0131578947368), STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Upvotes: 0
Views: 711
Reputation: 3326
Aside from misspelling 'Smoking Status' as 'Smoking status', you ran into two other problems.
We use single (') or double quotes (") to designate strings: 'my string' or "my string". However, to designate (unusual) variable names (symbols) with spaces in them, we use backticks (`): `my variable`. Since it's a pain to type those backticks, we typically use underscores (_) rather than spaces in variable names.
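As a minimal base R sketch of that distinction (the data frame `df` and its column are made up for illustration, not taken from the question), quoting yields a string while backticks refer to the column itself:

```r
# Hypothetical toy data; check.names = FALSE keeps the space in the column name.
df <- data.frame(`my variable` = c(3, 1, 2), check.names = FALSE)

quoted   <- 'my variable'      # just a character string of length 1
backtick <- df$`my variable`   # the actual column, referenced by symbol

is.character(quoted)   # TRUE
is.numeric(backtick)   # TRUE
```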
When (re)naming columns, character strings are as good as symbols. That is
# ... %>%
dplyr::rename('Smoking Status' = smoking_status) # %>% ...
# |--------------|
# character string
is equivalent to
# ... %>%
dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
# |--------------|
# symbol
However, when performing vectorized operations with mutate() or filter() or arrange(), any string will be treated as simply a scalar character value. That is
# ... %>%
mutate(test = 'Smoking Status') # %>% ...
# |--------------|
# character string
will not copy the `Smoking Status` column (a factor)
# A tibble: 5 x 6
... test
... <fct>
1 ... Ex smoker
2 ... Current smoker
3 ... Never smoked
4 ... Unknown
5 ... Non smoker - smoking history unknown
but rather give you a (character) column filled with the literal string 'Smoking Status':
# A tibble: 5 x 6
... test
... <chr>
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status
Similarly, your
# ... %>%
dplyr::arrange('Smoking Status')
# |----|
# Corrected typo: 'status'.
does not sort on the `Smoking Status` column, but rather on a (temporary) column filled with the string 'Smoking Status'. Since everything in that column is the same, no rearranging occurs at all, and the smoking_gender_disch_piv_count dataset remains unchanged.
To fix this particular issue, use:
# ... %>%
dplyr::arrange(`Smoking Status`)
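To see the difference between the two calls side by side, here is a small sketch on a made-up tibble (`toy` and its values are illustrative assumptions, not the question's data). It uses a plain character column so that the factor issue does not interfere:

```r
library(dplyr)

# Hypothetical toy tibble with a character (not factor) column.
toy <- tibble(`Smoking Status` = c("b", "a", "c"))

# Sorting on a quoted string: every row gets the same sort key, so nothing moves.
by_string <- toy %>% arrange('Smoking Status')

# Sorting on the backticked symbol: rows are ordered by the column's values.
by_symbol <- toy %>% arrange(`Smoking Status`)

by_string$`Smoking Status`  # original row order is kept
by_symbol$`Smoking Status`  # rows are now sorted
```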
Even after fixing the issue above, you'll still have a problem. Your `Smoking Status` column is a factor
[1] Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
so when you sort on this column, it follows the ordering of the factor levels, which are visibly not in alphabetical order.
To sort by alphabetical order, use the character form of the `Smoking Status` column:
# ... %>%
dplyr::arrange(as.character(`Smoking Status`))
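The same behaviour can be seen in base R, independent of dplyr. The sketch below uses a shortened, hypothetical version of the factor (three of the five levels) to contrast sorting the factor with sorting its character form:

```r
# Hypothetical shortened factor; levels are deliberately not alphabetical.
status <- factor(
  c("Never smoked", "Ex smoker", "Current smoker"),
  levels = c("Ex smoker", "Current smoker", "Never smoked")
)

# Sorting the factor follows the level order.
by_levels <- as.character(sort(status))

# Sorting the character form follows alphabetical order.
by_alphabet <- sort(as.character(status))

by_levels    # "Ex smoker" "Current smoker" "Never smoked"
by_alphabet  # "Current smoker" "Ex smoker" "Never smoked"
```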
Given the smoking_gender_disch_piv_count dataset you reproduced
smoking_gender_disch_piv_count <-
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
the following dplyr workflow
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename(`Smoking Status` = smoking_status) %>%
dplyr::arrange(as.character(`Smoking Status`))
will give you your desired results for smoking_gender_disch_piv_count_ren:
# A tibble: 5 x 5
`Smoking Status` Female Male NSTEMI STEMI
<fct> <dbl> <dbl> <dbl> <dbl>
1 Current smoker 9.02 13.8 12.5 6.25
2 Ex smoker 24.1 35.0 31.9 18.8
3 Never smoked 35.3 23.6 28.3 28.1
4 Non smoker - smoking history unknown 25.6 25.6 24.0 40.6
5 Unknown 6.02 1.97 3.29 6.25
while still preserving the factor information in `Smoking Status`.
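If you would rather have the factor itself carry an alphabetical level order, so that a plain arrange() on the column is enough, one possible variation is to rebuild the levels first. This is only a sketch on a made-up three-level tibble (`toy` is hypothetical), and unlike the workflow above it does change the level order stored in the factor:

```r
library(dplyr)

# Hypothetical toy tibble; levels are deliberately not alphabetical.
toy <- tibble(
  `Smoking Status` = factor(
    c("Ex smoker", "Current smoker", "Never smoked"),
    levels = c("Ex smoker", "Current smoker", "Never smoked")
  )
)

releveled <- toy %>%
  # Rebuild the factor with its existing levels sorted alphabetically.
  mutate(`Smoking Status` = factor(`Smoking Status`,
                                   levels = sort(levels(`Smoking Status`)))) %>%
  # A plain arrange() now follows the (alphabetical) level order.
  arrange(`Smoking Status`)

levels(releveled$`Smoking Status`)
```

Whether this is preferable depends on whether anything downstream relies on the original level order.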
Upvotes: 2