Reputation: 113
I am importing data into R from another source (i.e., I cannot easily change the incoming format/values).
Among the variables is one that includes one or more of a fixed set of possible values,
all within the same "cell", so that possible data look like:
Sample Input Data Frame (df)
df <- read.table(text =
"row lives.with.whom
1 'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
2 ''
3 'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
4 'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = TRUE)
Within R, how could I efficiently create rules to parse out these responses into separate columns, one column for each type of family member, so that the output would look like this:
Sample Output Data Frame
mother <- c(1,0,1,1)
father <- c(1,0,0,1)
adult.brother <- c(1,0,0,0)
adult.sister <- c(1,0,1,0)
grandparent <- c(1,0,0,0)
other.adult <- c(1,0,0,0)
output.df <- cbind(mother, father, adult.brother, adult.sister, grandparent, other.adult)
colnames(output.df) <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
output.df
Mother Father Brother Sister Grandparent Other adult
[1,] 1 1 1 1 1 1
[2,] 0 0 0 0 0 0
[3,] 1 0 0 1 0 0
[4,] 1 1 0 0 0 0
Upvotes: 0
Views: 76
Reputation: 416
I made some assumptions and tried to solve it.
library(tidyr)
library(dplyr)
# create nested lists with names of mothers and fathers for two people
mother <- list(list("bio_1","step_1","foster_1"), list("bio_2", "stp_2", "foster_2"))
father <- list(list("bio_1", "foster_1", "other_1"), list("bio_2", "stp_2", "foster_2"))
# convert to data frame
# tibble() replaces the deprecated data_frame()
test_object <- tibble(person = c(1, 2), mother, father)
# print
test_object
# A tibble: 2 x 3
person mother father
<dbl> <list> <list>
1 1 <list [3]> <list [3]>
2 2 <list [3]> <list [3]>
# first unnest the lists and get to the inner list
# then convert from wide to long form data
# do another unnest to get the actual data in the long format
# (the cols argument assumes tidyr >= 1.0)
test_object %>%
  unnest(cols = c(mother, father)) %>%
  gather(key = relationship, value = name, -person) %>%
  unnest(cols = name) -> test_object
test_object
# A tibble: 12 x 3
person relationship name
<dbl> <chr> <chr>
1 1 mother bio_1
2 1 mother step_1
3 1 mother foster_1
4 2 mother bio_2
5 2 mother stp_2
6 2 mother foster_2
7 1 father bio_1
8 1 father foster_1
9 1 father other_1
10 2 father bio_2
11 2 father stp_2
12 2 father foster_2
See the tidyverse and data.table documentation; both offer a lot of packages and functions that solve most data-carpentry/wrangling issues.
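The long-format idea can also be applied to the data in the question itself. The following is only a rough base R sketch, not part of the original answer: it assumes that parentheses are never nested and that categories are always separated by ", ". The names `x`, `clean`, `parts`, `long` and `wide` are made up for illustration.

```r
# Responses copied from the question's sample data
x <- c(
  "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)",
  "",
  "Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18",
  "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)"
)

# Remove every "(...)" group so the remaining commas only separate categories
clean <- gsub("\\s*\\([^)]*\\)", "", x)

# Split each response into its categories (an empty response yields no parts)
parts <- strsplit(clean, ", ", fixed = TRUE)

# Long format: one line per (row, family member) pair
long <- data.frame(
  row = rep(seq_along(parts), lengths(parts)),
  member = unlist(parts),
  stringsAsFactors = FALSE
)

# Back to wide 0/1 counts; the factor levels keep rows with an empty response
wide <- table(row = factor(long$row, levels = seq_along(x)), member = long$member)
wide
```

The column labels of `wide` are whatever text is left after stripping the parentheses (for example "Sister older than 18"), so they would still need renaming to match the desired output.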
Upvotes: 1
Reputation: 151
Try this:
rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
for (r in rel) {
  # 1 if the label occurs in the response, 0 otherwise
  # (base R only, and the new column is named directly)
  df[[r]] <- as.integer(grepl(r, df$lives.with.whom, fixed = TRUE))
}
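The same label matching can be written without a loop. This is a minimal sketch in base R, assuming that each label appears literally (and with this capitalisation) in the responses; `flags` and `result` are illustrative names, and the data frame is rebuilt here from the question's sample data so the snippet runs on its own.

```r
# Sample data copied from the question
df <- data.frame(
  row = 1:4,
  lives.with.whom = c(
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)",
    "",
    "Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18",
    "Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)"
  ),
  stringsAsFactors = FALSE
)

rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# One 0/1 column per label; fixed = TRUE treats each label as a literal string
flags <- sapply(rel, function(r) {
  as.integer(grepl(r, df$lives.with.whom, fixed = TRUE))
})

# Combine the row id with the indicator columns
result <- cbind(df["row"], flags)
result
```

Because the match is case-sensitive, the lower-case "mother" inside "foster mother" does not count as a second hit, but it would not matter here anyway since each column is only 0 or 1.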
Upvotes: 1
Reputation: 50668
Here is a tidyverse option that should get you started
library(tidyverse)
rel <- list("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")
names(rel) <- unlist(rel)
bind_cols(df[, 1, drop = FALSE], map(rel, ~+str_detect(tolower(df[, 2]), tolower(.x))))
# row Mother Father Brother Sister Grandparent Other adult
#1 1 1 1 1 1 1 1
#2 2 0 0 0 0 0 0
#3 3 1 0 0 1 0 0
#4 4 1 1 0 0 0 0
Sample data
df <- read.table(text =
"row lives.with.whom
1 'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.), Grandparent(s) (biological, foster, step, etc.), Brother(s) older than 18, Sister(s) older than 18, Other adults (aunts, uncles, etc.)'
2 ''
3 'Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18'
4 'Mother (biological mother, foster mother, step mother, etc.), Father (biological father, foster father, step father, etc.)'", header = TRUE)
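Lower-casing both sides makes the match a plain substring search, which could over-match if the incoming data ever contained a label embedded in a longer word. The sketch below is one possible way to tighten this, written in base R rather than the tidyverse: it matches a label only where a category starts (at the beginning of the string or right after ", "). The value "Grandmother" is hypothetical and not part of the original data; it is there only to show the difference, and `patterns`, `strict` and `loose` are illustrative names.

```r
rel <- c("Mother", "Father", "Brother", "Sister", "Grandparent", "Other adult")

# Output column name -> case-sensitive pattern anchored at a category start
patterns <- setNames(paste0("(^|, )", rel), rel)

# Two responses from the question plus one HYPOTHETICAL value ("Grandmother")
x <- c(
  "Mother (biological mother, foster mother, step mother, etc.), Sister(s) older than 18",
  "",
  "Grandmother"
)

# Anchored, case-sensitive matching: one integer column per label
strict <- vapply(
  patterns,
  function(p) as.integer(grepl(p, x)),
  integer(length(x))
)

# Lower-cased substring matching for the "Mother" column, for comparison
loose <- as.integer(grepl("mother", tolower(x), fixed = TRUE))

strict[, "Mother"]
loose
```

With the sample data from the question both approaches should give the same indicators, so the stricter patterns only matter if such overlapping labels can actually occur.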
Upvotes: 1