I have an individual-level dataset with demographic information of each person. It also provides a unique household id along with other variables:
id if_adult (>18 yrs old) marital_status
1 1 Single
1 1 Single
2 1 Married
2 1 Married
2 0 Married
Each household has at least one adult who is single or two adults who are either married or single. Some households also have children. I am trying to create a dummy variable called "unmarried couple" that will correctly categorize a household that has exactly two single adults. Obviously, there are duplicate rows with the same household id so I want each to be labeled correctly. Currently, the code I have is:
individual_data$`unmarried couple` <- ifelse((individual_data$if_adult ==
"1" & individual_data$id == individual_data$id) &
individual_data$marital_status == "Single", "1","0")
But this incorrectly categorizes the single-person led households (i.e. single moms and single dads with children) as being unmarried couples. This is key - if I can figure this out then it will be accurate. To rectify this issue, I am attempting to create a new variable that indicates the total number of adults per household:
id if_adult (>18 yrs old) marital_status total_adults
1 1 Single 2
1 1 Single 2
2 1 Married 2
2 1 Married 2
2 0 Married 2
Then I would create my desired variable, filtering out the single-led households by requiring two adults:
individual_data$`unmarried couple` <- ifelse((individual_data$total_adults
== 2 & individual_data$id == individual_data$id) &
individual_data$marital_status == "Single", "1","0")
Ultimately, I want the result to look like this, and likewise for the rest of the data:
id if_adult marital_status total_adults unmarried couple
1 1 Single 2 1
1 1 Single 2 1
2 1 Married 2 0
2 1 Married 2 0
2 0 Married 2 0
Upvotes: 0
Views: 179
Reputation: 361
===edits at the end===
If you want to stick to Base R then the following solution might work for you:
individual_data$unmarried_couples <- ifelse(individual_data$marital_status %in% c("Single", "1", "0"),
individual_data$total_adults %/% 2,
0)
I used the expression total_adults %/% 2 to calculate the number of unmarried couples living in a household, since a household might have more than 2 single adults living in it.
Tested on:
id if_adult (>18 yrs old) marital_status total_adults
1 1 Single 2
1 1 Single 2
2 1 Married 2
2 1 Married 2
2 0 Married 2
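To make the integer-division step above concrete, here is a minimal sketch of what %/% returns for a few adult counts. The counts are made up for illustration and are not from the asker's data.

```r
# Hypothetical adult counts per household, for illustration only
total_adults <- c(1, 2, 3, 4)

# %/% is integer division: it divides and drops the remainder
couples <- total_adults %/% 2
print(couples)
```

Note that with this expression a household with four single adults gets the value 2, so the result is a count rather than a strict 0/1 dummy.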
===edits==
Since you are struggling with the variable adults_in_household, here is a fully reproducible example:
individual_data <- data.frame(
id = c(1,1,2,2,2,3,3),
if_adult = c(1,1,1,1,0,0,0),
marital_status = c("Single", "Single", "Married", "Married", "Married", "Single", "Single")
)
library(dplyr)
individual_data %>%
group_by(id) %>%
mutate(adults_in_household = sum(if_adult))
The output of this code should be:
# A tibble: 7 x 4
# Groups: id [3]
id if_adult marital_status adults_in_household
<dbl> <dbl> <fct> <dbl>
1 1 1 Single 2
2 1 1 Single 2
3 2 1 Married 2
4 2 1 Married 2
5 2 0 Married 2
6 3 0 Single 0
7 3 0 Single 0
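If you would rather stay in Base R for this step as well, the same per-household count can probably be obtained with ave(). This is only a sketch on the same toy data, not something tested on your real dataset:

```r
individual_data <- data.frame(
  id = c(1, 1, 2, 2, 2, 3, 3),
  if_adult = c(1, 1, 1, 1, 0, 0, 0),
  marital_status = c("Single", "Single", "Married", "Married",
                     "Married", "Single", "Single")
)

# ave() applies FUN within each group of id and returns one value per row,
# so the household total is repeated on every row of that household
individual_data$adults_in_household <- ave(
  individual_data$if_adult,
  individual_data$id,
  FUN = sum
)
print(individual_data)
```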
Hope this helps.
Upvotes: 0
Reputation: 206207
What about using dplyr and group_by to make this a bit easier? It checks that there are exactly two single adults for each id.
library(dplyr)
dd %>%
group_by(id) %>%
mutate(unmarried_couple = sum(if_adult*(marital_status=="Single"))==2,
total_adults = sum(if_adult))
tested with
dd <- read.table(text="id if_adult marital_status
1 1 Single
1 1 Single
2 1 Married
2 1 Married
2 0 Married", header=T)
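The mutate above yields TRUE/FALSE. If a 1/0 dummy is wanted, as in the question, one option is to wrap the comparison in as.integer(). A sketch, assuming dplyr is installed; the household with id 3 (one single adult plus a child) is a hypothetical addition to check the single-parent case:

```r
library(dplyr)

# Rows for id 3 are hypothetical, added to test a single-parent household
dd <- read.table(text = "id if_adult marital_status
1 1 Single
1 1 Single
2 1 Married
2 1 Married
2 0 Married
3 1 Single
3 0 Single", header = TRUE)

result <- dd %>%
  group_by(id) %>%
  mutate(
    total_adults = sum(if_adult),
    # count single adults in the household, then convert the logical to 1/0
    unmarried_couple = as.integer(sum(if_adult * (marital_status == "Single")) == 2)
  ) %>%
  ungroup()

print(result)
```

Here only household 1 should be flagged, since household 3 has just one single adult.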
Upvotes: 1