user8276807

Creating a new variable that indicates a specific condition using two preexisting variables in a dataframe

I have an individual-level dataset with demographic information on each person. It also provides a unique household id along with other variables:

id     if_adult (>18 yrs old)     marital_status
1          1                       Single
1          1                       Single
2          1                       Married
2          1                       Married
2          0                       Married

Each household has at least one adult who is single or two adults who are either married or single. Some households also have children. I am trying to create a dummy variable called "unmarried couple" that flags households with exactly two single adults. Because several rows share the same household id, every row of a household should get the same label. My current code is:

individual_data$`unmarried couple` <- ifelse((individual_data$if_adult == 
"1" & individual_data$id == individual_data$id) & 
individual_data$marital_status == "Single", "1","0")

But this incorrectly categorizes single-adult-led households (i.e. single moms and single dads with children) as unmarried couples. This is the key problem: if I can fix it, the variable will be accurate. To fix it, I am trying to create a new variable that gives the total number of adults per household:

id     if_adult (>18 yrs old)     marital_status   total_adults
1          1                       Single          2
1          1                       Single          2
2          1                       Married         2
2          1                       Married         2
2          0                       Married         2

Then I would create my desired variable by filtering out the single-adult-led households, setting the condition as having at least two adults:

individual_data$`unmarried couple` <- ifelse((individual_data$total_adults 
== 2 & individual_data$id == individual_data$id) & 
individual_data$marital_status == "Single", "1","0")

Ultimately, I want the result to look like this, both for these rows and for the rest of the data:

id     if_adult     marital_status   total_adults  unmarried couple  
1          1           Single          2             1
1          1           Single          2             1
2          1           Married         2             0    
2          1           Married         2             0
2          0           Married         2             0

Upvotes: 0

Views: 179

Answers (2)

Rachit Kinger

Reputation: 361

===edits at the end===

If you want to stick to base R, the following solution might work for you:

# Rows marked "Single" get the number of adult pairs in the household; all others get 0.
individual_data$unmarried_couples <- ifelse(individual_data$marital_status == "Single",
    individual_data$total_adults %/% 2,
    0)

I used the expression total_adults %/% 2 (integer division) to count the unmarried couples in a household, in case a household has more than two single adults living in it.

Tested on:

id     if_adult (>18 yrs old)     marital_status   total_adults
1          1                       Single          2
1          1                       Single          2
2          1                       Married         2
2          1                       Married         2
2          0                       Married         2
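The snippet above assumes that total_adults already exists. As a minimal base R sketch (not part of the original answer), that column could be built with ave(), which applies a function within each group and repeats the result on every row of the group. The data frame below simply re-enters the test rows by hand, and the column names follow the ones used in this answer:

```r
# Sample rows from the question, entered by hand for illustration.
individual_data <- data.frame(
  id = c(1, 1, 2, 2, 2),
  if_adult = c(1, 1, 1, 1, 0),
  marital_status = c("Single", "Single", "Married", "Married", "Married"),
  stringsAsFactors = FALSE
)

# Sum the adult indicator within each household id; ave() returns a
# vector of the same length as its input, so every row gets the total.
individual_data$total_adults <- ave(
  individual_data$if_adult,
  individual_data$id,
  FUN = sum
)

# Apply the integer-division idea: single rows get the number of adult
# pairs in their household, all other rows get 0.
individual_data$unmarried_couples <- ifelse(
  individual_data$marital_status == "Single",
  individual_data$total_adults %/% 2,
  0
)

print(individual_data)
```

On these five rows, total_adults is 2 everywhere and unmarried_couples is 1 for household 1 and 0 for household 2.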

===edits===
Since you are struggling with the variable adults_in_household, here is a fully reproducible example:

individual_data <-  data.frame(
  id = c(1,1,2,2,2,3,3),
  if_adult = c(1,1,1,1,0,0,0),
  marital_status = c("Single", "Single", "Married", "Married", "Married", "Single", "Single")
)

library(dplyr)

individual_data %>% 
  group_by(id) %>% 
  mutate(adults_in_household = sum(if_adult))

The output of this code should be:

# A tibble: 7 x 4
# Groups:   id [3]
     id if_adult marital_status adults_in_household
  <dbl>    <dbl> <fct>                        <dbl>
1     1        1 Single                           2
2     1        1 Single                           2
3     2        1 Married                          2
4     2        1 Married                          2
5     2        0 Married                          2
6     3        0 Single                           0
7     3        0 Single                           0

Hope this helps.

Upvotes: 0

MrFlick

Reputation: 206207

How about using dplyr and group_by to make this a bit easier? The code below checks that there are exactly two single adults for each id.

library(dplyr)
dd %>% 
  group_by(id) %>% 
  mutate(unmarried_couple = sum(if_adult*(marital_status=="Single"))==2,
    total_adults = sum(if_adult))

Tested with:

dd <- read.table(text="id     if_adult     marital_status
1          1                       Single
1          1                       Single
2          1                       Married
2          1                       Married
2          0                       Married", header = TRUE)
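To check the single-parent case raised in the question, here is a hedged sketch of the same "exactly two single adults per id" test written in base R with ave() instead of the dplyr pipeline, so it runs without extra packages. Household 3 (one single adult plus one child) is a made-up row pair added purely for illustration, and as.integer() is used on the assumption that a 1/0 dummy is wanted rather than TRUE/FALSE:

```r
# Test data from the answer plus a hypothetical single-parent household (id 3).
dd <- data.frame(
  id = c(1, 1, 2, 2, 2, 3, 3),
  if_adult = c(1, 1, 1, 1, 0, 1, 0),
  marital_status = c("Single", "Single", "Married", "Married", "Married",
                     "Single", "Single"),
  stringsAsFactors = FALSE
)

# 1 for rows that are single adults, 0 for everyone else.
single_adult <- dd$if_adult * (dd$marital_status == "Single")

# Count single adults within each household, flag households with
# exactly two, and convert the logical flag to a 1/0 dummy.
dd$unmarried_couple <- as.integer(ave(single_adult, dd$id, FUN = sum) == 2)

print(dd)
```

Here only household 1 is flagged with 1; the married household and the single-parent household both get 0.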

Upvotes: 1
