I have a data set in which children were asked whether they speak only German, German and another language, or only another language at home. They were then asked which languages they speak at home (one variable per language, e.g. English: 0 = no, 1 = yes). There was also a free-text field where they could enter other languages; this was saved as a character variable. Because some children chose "only German" and then still listed other languages, I would like to create a new variable "mig_1" that distinguishes between children with and without a migration background (0 = German, 1 = migration background).
Here is a short sample (not all of the language variables are included):
# englisch and sprache padded so that every column has six values
kommschreib <- data.frame(code = c("013101", "013102", "013205", "114113", "014201", "053216"),
                          mig = c(0, NA, 1, 2, 1, 0),
                          englisch = c(0, 1, 1, 0, 0, 0),
                          sprache = c("niederländisch", "", "", "", "", "italienisch"))
My code already covers three cases: children who left the first question blank (German / German and another language / only another language) but then ticked another language; children who indicated "German only" but then ticked another language; and the "normal" case (children who indicated only German and no other language, or German and another language / only another language).
library(dplyr)
library(haven)  # for labelled()

kommschreib <- kommschreib %>%
  mutate(mig_2 = case_when(englisch == 1 | arabisch == 1 | kurdisch == 1 | russisch == 1 | ukrainisch == 1 |
                             farsi == 1 | polnisch == 1 | türkisch == 1 | albanisch == 1 ~ 1,
                           mig == 0 & englisch == 1 ~ 1,
                           mig == 0 & arabisch == 1 ~ 1,
                           mig == 0 & kurdisch == 1 ~ 1,
                           mig == 0 & russisch == 1 ~ 1,
                           mig == 0 & ukrainisch == 1 ~ 1,
                           mig == 0 & farsi == 1 ~ 1,
                           mig == 0 & polnisch == 1 ~ 1,
                           mig == 0 & türkisch == 1 ~ 1,
                           mig == 0 & albanisch == 1 ~ 1,
                           mig == 0 ~ 0,
                           mig > 0 ~ 1) %>%
           labelled(label = "Migrationshintergrund"))
kommschreib <- kommschreib %>% sjlabelled::set_labels(mig_2, labels = c("dt" = 0, "mig" = 1))
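A side note on the code above: the `mig == 0 & <language> == 1 ~ 1` rules can never fire, because `case_when()` stops at the first condition that is TRUE and the first line already catches every ticked language. A minimal sketch of an equivalent, shorter version, assuming dplyr >= 1.0.4 (for `if_any()`), with made-up values, only three of the language dummies, and the labels left out:

```r
library(dplyr)

# Made-up sample; only three of the language dummies are included
kommschreib <- data.frame(
  code     = c("013101", "013102", "013205", "114113", "014201", "053216"),
  mig      = c(0, NA, 1, 2, 1, 0),
  englisch = c(0, 1, 1, 0, 0, 0),
  arabisch = c(0, 0, 0, 0, 1, 0),
  kurdisch = c(0, 0, 0, 1, 0, 0)
)

kommschreib <- kommschreib %>%
  mutate(mig_2 = case_when(
    # any ticked language dummy -> migration background, whatever mig says
    if_any(c(englisch, arabisch, kurdisch), ~ .x %in% 1) ~ 1,
    mig == 0 ~ 0,  # only German and nothing ticked
    mig > 0  ~ 1   # German and another language / only another language
  ))

kommschreib$mig_2
```

`%in% 1` treats a missing dummy as "not ticked", which mirrors how `case_when()` already skips conditions that evaluate to NA.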
What I still need is the case where a child chose "German only" or left the first question blank, but then typed something into the free-text field for other languages. In other words, I need a rule for: (mig == 0, i.e. only German, or mig is NA) AND the free-text language field is not empty.
I hope I have explained clearly what I mean.
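One detail for the rule described above: in R, `mig == NA` is always NA and never TRUE, so the missing case has to be tested with `is.na()`. A small sketch with made-up values, using the free-text column name `sprache` from the sample (`dplyr::case_when()` is only used in the last step):

```r
mig     <- c(0, NA, 0, NA, 1)
sprache <- c("italienisch", "niederländisch", "", NA, "")

mig == NA    # NA NA NA NA NA: any comparison with NA is NA, never TRUE
is.na(mig)   # FALSE TRUE FALSE TRUE FALSE

# "German only" (0) or no answer (NA), but something typed into the free-text field
free_text_only <- (mig %in% 0 | is.na(mig)) & !is.na(sprache) & sprache != ""
free_text_only   # TRUE TRUE FALSE FALSE FALSE

# The same rule placed first in case_when(), before the plain mig rules
mig_2 <- dplyr::case_when(
  (mig %in% 0 | is.na(mig)) & !is.na(sprache) & sprache != "" ~ 1,
  mig == 0 ~ 0,
  mig > 0  ~ 1
)
mig_2   # 1 1 0 NA 1
```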
Hooray! It finally worked.
First, I had to convert all cases in which the children did not enter another language (empty strings in my character variable "s12") to NA:
kommschreib <- kommschreib %>% mutate_if(is.character, list(~na_if(., "")))
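In current dplyr versions, `mutate_if()` is superseded by `across()` combined with `where()`. A sketch of the same step, assuming dplyr >= 1.0.0; the `trimws()` call is an extra assumption that entries consisting only of spaces should also count as empty:

```r
library(dplyr)

kommschreib <- data.frame(
  mig = c(0, NA, 0),
  s12 = c("", "italienisch", "  "),
  stringsAsFactors = FALSE
)

# Every character column: strip surrounding spaces, then turn "" into NA
kommschreib <- kommschreib %>%
  mutate(across(where(is.character), ~ na_if(trimws(.x), "")))

kommschreib$s12   # NA "italienisch" NA
```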
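Then I could add my previous code. Pay attention to the order: case_when() checks its conditions from top to bottom and uses the first one that is TRUE, so the free-text rules have to come before mig == 0 ~ 0. This is the code I ended up with, which should cover all the cases: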
kommschreib <- kommschreib %>%
  mutate(mig_2 = case_when(mig == 0 & !is.na(s12) ~ 1,
                           is.na(mig) & !is.na(s12) ~ 1,
                           englisch == 1 | arabisch == 1 | kurdisch == 1 | russisch == 1 | ukrainisch == 1 |
                             farsi == 1 | polnisch == 1 | türkisch == 1 | albanisch == 1 ~ 1,
                           mig == 0 & englisch == 1 ~ 1,
                           mig == 0 & arabisch == 1 ~ 1,
                           mig == 0 & kurdisch == 1 ~ 1,
                           mig == 0 & russisch == 1 ~ 1,
                           mig == 0 & ukrainisch == 1 ~ 1,
                           mig == 0 & farsi == 1 ~ 1,
                           mig == 0 & polnisch == 1 ~ 1,
                           mig == 0 & türkisch == 1 ~ 1,
                           mig == 0 & albanisch == 1 ~ 1,
                           mig == 0 ~ 0,
                           mig > 0 ~ 1) %>%
           labelled(label = "Migration background"))
It looks a bit complex, and I really hope I did not miss anything. I checked whether it does what it is supposed to do, and so far it does.
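Because `case_when()` stops at the first matching rule, the two free-text rules can be merged into one, and the nine `mig == 0 & <language> == 1 ~ 1` lines can be dropped: the rule that checks all language dummies already returns 1 for those rows. A condensed sketch, assuming dplyr >= 1.0.4 for `if_any()`, with only two language dummies and made-up values:

```r
library(dplyr)

kommschreib <- data.frame(
  code     = c("013101", "013102", "013205", "114113", "014201", "053216"),
  mig      = c(0, NA, 1, 2, 0, NA),
  englisch = c(0, 0, 1, 0, 1, 0),
  arabisch = c(0, 0, 0, 1, 0, 0),
  s12      = c(NA, "italienisch", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

kommschreib <- kommschreib %>%
  mutate(mig_2 = case_when(
    # free text filled in although "German only" or nothing was chosen
    (mig == 0 | is.na(mig)) & !is.na(s12) ~ 1,
    # any language dummy ticked
    if_any(c(englisch, arabisch), ~ .x %in% 1) ~ 1,
    mig == 0 ~ 0,
    mig > 0  ~ 1
  ))

kommschreib$mig_2   # 0 1 1 1 1 NA
```

The last row shows that a child who answered nothing at all still ends up as NA rather than 0.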
Suppose your data frame is:
kommschreib <- data.frame(code = c("013101", "013102", "013205", "114113", "014201", "053216"),
                          mig = c(0, NA, 1, 2, 1, 0),
                          englisch = c(0, 0, 1, 0, 0, 1),
                          arabisch = c(1, 0, 0, 0, 1, 0),
                          kurdisch = c(0, NA, NA, 0, 1, 0))
Then, if I understood correctly, you can do:
library(dplyr)

kommschreib1 <- kommschreib %>%
  rowwise() %>%
  mutate(mig2 = ifelse(mig %in% c(1, 2) | any(c_across(englisch:kurdisch) %in% 1),
                       1, 0)) %>%
  ungroup()
This returns 1 if the child speaks 'German and another language' or 'only another language' at home, or if at least one other language is checked, and 0 otherwise.
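The same result can be computed without `rowwise()`, which tends to be slow on larger data sets, by letting `if_any()` check the language columns for all rows at once. A sketch, assuming dplyr >= 1.0.4:

```r
library(dplyr)

kommschreib <- data.frame(code = c("013101", "013102", "013205", "114113", "014201", "053216"),
                          mig = c(0, NA, 1, 2, 1, 0),
                          englisch = c(0, 0, 1, 0, 0, 1),
                          arabisch = c(1, 0, 0, 0, 1, 0),
                          kurdisch = c(0, NA, NA, 0, 1, 0))

# Vectorised: TRUE if mig is 1/2 or any language column is exactly 1; as.numeric gives 1/0
kommschreib1 <- kommschreib %>%
  mutate(mig2 = as.numeric(mig %in% c(1, 2) | if_any(englisch:kurdisch, ~ .x %in% 1)))

kommschreib1$mig2   # 1 0 1 1 1 1
```

Like the `rowwise()` version, this codes a child with a missing `mig` and no ticked language as 0 rather than NA, and it does not look at the free-text field.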