MLPNPC
MLPNPC

Reputation: 525

Conditional Formatting of grouped Data in R

UPRN    Start.Date  End.Date  Disability
1       2006-12-20 17-NOV-17         Y
1       2006-12-20 17-NOV-17         N
2       1991-12-06                   N
2       1991-12-06                   N
3       1991-04-29 2015-04-21        N
3       2015-04-22                   Y
4       2005-02-15                   Y
4       2005-02-15                   N

I have a dataset that looks something like the above (but much bigger). I would like to create a new column called Any_Disability.

The way I want to do this is to group by UPRN, Start.Date and End.Date and if any row in that group has a disability then both rows would have a "Y" for Any_Disability.

What I've tried already is:

library(dplyr)

test3<-all_data%>%
  group_by(UPRN, Start.Date, End.Date)%>%
  mutate(Any_Disability = ifelse(Disability=="Y", "Y","N"))

But this doesn't work as it gives the below answer:

UPRN    Start.Date  End.Date  Disability  Any_Disability
1       2006-12-20 17-NOV-17         Y          Y
1       2006-12-20 17-NOV-17         N          N
2       1991-12-06                   N          N
2       1991-12-06                   N          N
3       1991-04-29 2015-04-21        N          N
3       2015-04-22                   Y          Y
4       2005-02-15                   Y          Y
4       2005-02-15                   N          N

Reproducible code:

UPRN<-c(1,1,2,2,3,3,4,4)
Start.Date<-c("2006-12-20","2006-12-20", "1991-12-06","1991-12-06","1991-04-29", "2015-04-22","2005-02-15", "2005-02-15")
End.Date<-c("17-NOV-17", "17-NOV-17", "","", "2015-04-21", "", "", "")
Disability<-c("Y","N","N","N","N","Y","Y","N")

dataset <- data.frame(UPRN, Start.Date, End.Date, Disability)

Upvotes: 0

Views: 96

Answers (2)

Ape
Ape

Reputation: 1169

Base R, using ave and 'Y' > 'N' ... TRUE

dataset_new <- within(dataset, 
    Any_Disability <- ave(Disability, paste0(UPRN, "#", Start.Date, "#", End.Date), 
        FUN = max)
    )

# UPRN Start.Date   End.Date Disability Any_Disability
#    1 2006-12-20  17-NOV-17          Y              Y
#    1 2006-12-20  17-NOV-17          N              Y
#    2 1991-12-06                     N              N
#    2 1991-12-06                     N              N
#    3 1991-04-29 2015-04-21          N              N
#    3 2015-04-22                     Y              Y
#    4 2005-02-15                     Y              Y
#    4 2005-02-15                     N              Y

You could writeFUN = function(x){if (any('Y' %in% x)) 'Y' else 'N'}, not using the 'Y' > 'N' trick, for the same result.

Upvotes: 0

BENY
BENY

Reputation: 323226

We can using any

test3<-df%>%
     group_by(UPRN, Start.Date)%>%
     dplyr::mutate(Any_Disability = ifelse(any(Disability=="Y"), "Y","N"))
test3
# A tibble: 8 x 5
# Groups:   UPRN, Start.Date [5]
   UPRN Start.Date   End.Date Disability Any_Disability
  <int>      <chr>      <chr>      <chr>          <chr>
1     1 2006-12-20  17-NOV-17          Y              Y
2     1 2006-12-20  17-NOV-17          N              Y
3     2 1991-12-06       <NA>          N              N
4     2 1991-12-06       <NA>          N              N
5     3 1991-04-29 2015-04-21          N              N
6     3 2015-04-22       <NA>          Y              Y
7     4 2005-02-15       <NA>          Y              Y
8     4 2005-02-15       <NA>          N              Y

Upvotes: 3

Related Questions