Reputation: 57
I have the following table which represents a child, his siblings and the case they are assigned under. The resource ids represent the house where they were placed together.
child_id|sibling_id|case_id|resource_id
1|8|123|12856
1|9|123|12856
3|11|321|12555
4|12|323|10987
4|13|323|10956
6|14|156|10554
6|15|156|10554
10|16|156|10553
10|17|145|18986
10|18|145|18986
I want to create a new column placed_together which shows a yes or a no for those children that were placed together based on their case_ids. So my result should look like this:
child_id|sibling_id|case_id|resource_id|placed_together
1|8|123|12856|Yes
1|9|123|12856|Yes
3|11|321|12555|No
4|12|323|10987|No
4|13|323|10956|No
6|14|156|10554|No
6|15|156|10554|No
10|16|156|10553|No
10|17|145|18986|Yes
10|18|145|18986|Yes
Any help would be appreciated. I don't know how to create an if statement based on these conditions, since a case_id can be the same for a group while the resource_id differs for one of the children.
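For anyone who wants to reproduce this, the sample table can be entered as a data frame. This is only a minimal sketch: the name df is an assumption, and the values are copied from the table above.

```r
# sample data from the question; the name `df` is an assumption
df <- data.frame(
  child_id    = c(1, 1, 3, 4, 4, 6, 6, 10, 10, 10),
  sibling_id  = c(8, 9, 11, 12, 13, 14, 15, 16, 17, 18),
  case_id     = c(123, 123, 321, 323, 323, 156, 156, 156, 145, 145),
  resource_id = c(12856, 12856, 12555, 10987, 10956,
                  10554, 10554, 10553, 18986, 18986)
)
```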
Upvotes: 2
Views: 125
Reputation: 1261
Assuming that your dataframe was named df, you can do something like this:
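# create a function that decides whether the children in a case were placed together:
# the case must have more than one row and all its rows must share one resource_id
IsPlacedTogether = function(case) {
  rows = df[df$case_id == case, ]
  ifelse(nrow(rows) > 1 && length(unique(rows$resource_id)) == 1, 'Yes', 'No')
}
# apply this function to the case_id of every row in your data
df$placed_together = sapply(df$case_id, IsPlacedTogether)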
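A grouped check in base R could also look at resource_id within each case_id by using ave(). The following is a minimal sketch, not a definitive solution; it assumes the sample data from the question is in a data frame named df, and the helper names n_rows and n_res are made up for illustration.

```r
# sample data from the question, read from text (the name `df` is an assumption)
df <- read.table(header = TRUE, text = "
child_id sibling_id case_id resource_id
1 8 123 12856
1 9 123 12856
3 11 321 12555
4 12 323 10987
4 13 323 10956
6 14 156 10554
6 15 156 10554
10 16 156 10553
10 17 145 18986
10 18 145 18986
")

# number of rows in each case, repeated for every row of that case
n_rows <- ave(df$resource_id, df$case_id, FUN = length)
# number of distinct resource ids in each case
n_res <- ave(df$resource_id, df$case_id, FUN = function(x) length(unique(x)))

# "Yes" only when a case has several rows that all share one resource_id
df$placed_together <- ifelse(n_rows > 1 & n_res == 1, "Yes", "No")
```

Because ave() returns a vector as long as its input, the result lines up with the rows of df without any merge step.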
Upvotes: 0
Reputation: 79338
Probably using tidyverse:
library(tidyverse)
df %>%
  group_by(case_id) %>%
  # "Yes" only if the case has several rows, one child and one resource
  mutate(placedTogether = if_else(n() > 1 &
                                    length(unique(child_id)) == 1 &
                                    length(unique(resource_id)) == 1,
                                  "Yes", "No"))
# A tibble: 10 x 5
# Groups: case_id [5]
child_id sibling_id case_id resource_id placedTogether
<int> <int> <int> <int> <chr>
1 1 8 123 12856 Yes
2 1 9 123 12856 Yes
3 3 11 321 12555 No
4 4 12 323 10987 No
5 4 13 323 10956 No
6 6 14 156 10554 No
7 6 15 156 10554 No
8 10 16 156 10553 No
9 10 17 145 18986 Yes
10 10 18 145 18986 Yes
Upvotes: 1