Reputation: 149
I want to handle survey data and I would like functions ideal_label and is.missing with the following behavior:
test <- ideal_label(c(1, NA, -1),
labels = structure(c(0, 1, -1), names = c("No", "Yes", "PNR")),
missing.values = c(NA, -1))
as.character(test[1]) # "Yes"
as.numeric(test[1]) # 1
test %in% 1 # TRUE FALSE FALSE
test == 1 # TRUE NA FALSE
test %in% "Yes" # TRUE FALSE FALSE
test == "Yes" # TRUE NA FALSE
is.na(test) # FALSE TRUE FALSE
is.missing(test) # FALSE TRUE TRUE
lm(c(T, T, T) ~ test)$rank # 2 (i.e., keeps missing values that are not NA)
df <- data.frame(test = test, true = c(T, T, T))
lm(true ~ test, data = df)$rank # 2
This used to be possible with function as.item (and is.missing) of package memisc, with memisc version 0.99.22 and R version 4.2.1.
However, more recent versions of memisc treat missing values the same as NA (i.e., is.na(test[3]) returns TRUE). And using memisc version 0.99.22 with more recent versions of R tends to treat labelled variables as numeric rather than character (namely, test[1] == "Yes" returns NA and test[1] %in% "Yes" returns FALSE).
I have tested other packages (haven, labelled, forcats) but none of them seem to allow the behavior I need.
How do I achieve this with the latest versions of these libraries?
Upvotes: 0
Views: 55
Reputation: 114
First of all, note that the labelled package follows the haven classes for labelled data. haven implements two types of additional missing values: SAS/Stata-like tagged NAs and SPSS-like user NAs. They are presented in detail in https://larmarange.github.io/labelled/articles/missing_values.html
What you are referring to is closer to the user NAs available through the haven_labelled_spss class. labelled already provides functions to distinguish regular NAs from user NAs.
library(labelled)
v <- labelled_spss(
c(1, NA, -1),
labels = c(No = 0, Yes = 1, PNR = -1),
na_values = -1
)
v
#> <labelled_spss<double>[3]>
#> [1] 1 NA -1
#> Missing values: -1
#>
#> Labels:
#> value label
#> 0 No
#> 1 Yes
#> -1 PNR
is.na(v)
#> [1] FALSE TRUE TRUE
is_regular_na(v)
#> [1] FALSE TRUE FALSE
is_user_na(v)
#> [1] FALSE FALSE TRUE
Created on 2025-02-28 with reprex v2.1.1
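In terms of your question, your is.na() corresponds to is_regular_na() here, and your is.missing() corresponds to is.na() on a labelled_spss vector. If you prefer to keep the name from your question, a minimal sketch of a wrapper could look like this. Note that is.missing below is a hypothetical helper defined for illustration, not a function exported by labelled.

```r
library(labelled)

# Hypothetical helper (not part of labelled): TRUE for regular NAs
# and for user-defined NAs alike.
is.missing <- function(x) {
  is_regular_na(x) | is_user_na(x)
}

v <- labelled_spss(
  c(1, NA, -1),
  labels = c(No = 0, Yes = 1, PNR = -1),
  na_values = -1
)

is.missing(v)
#> [1] FALSE  TRUE  TRUE
is_regular_na(v)
#> [1] FALSE  TRUE FALSE
```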
labelled also provides many functions to convert to other formats.
library(labelled)
v <- labelled_spss(
c(1, NA, -1),
labels = c(No = 0, Yes = 1, PNR = -1),
na_values = -1
)
to_factor(v)
#> [1] Yes <NA> PNR
#> Levels: No Yes PNR
to_factor(v, user_na_to_na = TRUE)
#> [1] Yes <NA> <NA>
#> Levels: No Yes
to_character(v)
#> [1] "Yes" NA "PNR"
to_character(v, user_na_to_na = TRUE)
#> [1] "Yes" NA NA
unclass(v)
#> [1] 1 NA -1
#> attr(,"labels")
#> No Yes PNR
#> 0 1 -1
#> attr(,"na_values")
#> [1] -1
user_na_to_na(v)
#> <labelled<double>[3]>
#> [1] 1 NA NA
#>
#> Labels:
#> value label
#> 0 No
#> 1 Yes
user_na_to_na(v) |> unclass()
#> [1] 1 NA NA
#> attr(,"labels")
#> No Yes
#> 0 1
Created on 2025-02-28 with reprex v2.1.1
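For the lm() part of your question, one possible approach is to convert with to_factor() before modelling, so that the user NA (PNR) stays a regular factor level while the regular NA is dropped by the default na.action. This is only a sketch: the numeric response c(1, 1, 1) stands in for the c(T, T, T) of your example, and the data frame and column names are taken from your question.

```r
library(labelled)

v <- labelled_spss(
  c(1, NA, -1),
  labels = c(No = 0, Yes = 1, PNR = -1),
  na_values = -1
)

# to_factor() keeps user NAs as ordinary levels by default,
# so "PNR" remains an observed category for lm().
df <- data.frame(test = to_factor(v), true = c(1, 1, 1))

# The row with a regular NA is dropped; the rows "Yes" and "PNR" remain.
fit <- lm(true ~ test, data = df)
fit$rank
#> [1] 2
```

With to_factor(v, user_na_to_na = TRUE), the PNR row would become NA as well and would be dropped from the model.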
Regarding the comparison operators you mentioned in your Stack Overflow question, keep in mind that haven allows both numeric labelled vectors (e.g. 1 coded as "Yes") and character labelled vectors (e.g. "y" coded as "yes"). So an automatic numeric/character distinction cannot always tell whether you mean the underlying value or the label. However, you can easily adapt your comparison tests with a conversion and/or create your own custom operators.
library(labelled)
v <- labelled_spss(
c(1, NA, -1),
labels = c(No = 0, Yes = 1, PNR = -1),
na_values = -1
)
v == 1
#> [1] TRUE NA FALSE
v %in% 1
#> [1] TRUE FALSE FALSE
to_character(v) == "Yes"
#> [1] TRUE NA FALSE
to_character(v) %in% "Yes"
#> [1] TRUE FALSE FALSE
`%l=%` <- function(x, y) {
to_character(x) == y
}
`%lin%` <- function(x, y) {
to_character(x) %in% y
}
v %l=% "Yes"
#> [1] TRUE NA FALSE
v %lin% "Yes"
#> [1] TRUE FALSE FALSE
Created on 2025-02-28 with reprex v2.1.1
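If all your labelled vectors are numeric, the two cases can be folded into a single operator that looks at the type of the right-hand side. The sketch below rests on that assumption, and %leq% is a hypothetical name chosen for illustration. It would give ambiguous results on a character labelled vector, for the reason explained above.

```r
library(labelled)

# Hypothetical operator (not part of labelled or haven).
# Assumes `x` is a numeric labelled vector:
# a character `y` is compared to the labels, anything else to the values.
`%leq%` <- function(x, y) {
  if (is.character(y)) {
    to_character(x) == y
  } else {
    x == y
  }
}

v <- labelled_spss(
  c(1, NA, -1),
  labels = c(No = 0, Yes = 1, PNR = -1),
  na_values = -1
)

v %leq% "Yes"
#> [1]  TRUE    NA FALSE
v %leq% 1
#> [1]  TRUE    NA FALSE
```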
Upvotes: -1