Sharath
Sharath

Reputation: 2267

Create a column Status based on the conditions on different states in R

I have a data frame like this:

ID <- c(1,2,3,4,5,5,5,6,6)
States <- c(NA,NA,"All Locked","All Not Locked","All Locked","All Locked"
                   ,"All Not Locked","All Not Locked","All Not Locked")
ToolID <- c(NA,NA,"SWP","SWP","SWP","SWP","SWP","SWP","SWP")
Measurement <- c("Length","Breadth","Width","Height","Time","Time"
                   ,"Time","Mass","Mass")
Location <- c("US","US","UK","UK","US","US","US","UK","UK")

df1 <- data.frame(ID,States,ToolID,Measurement,Location)

I am trying to do some data manipulation on this data frame using the following conditions

For each ID (grouped),     
    if States = NA, then the Status = "No Status"
    if States column contains at least(count >=) 1 "All Locked", then the Status = "Lock Status"
    if States column doesn't contain (count =0)  "All Locked", then the Status = "No Lock Status"

My desired output is

  ID ToolID Measurement Location         Status
   1     NA      Length       US      No Status
   2     NA     Breadth       US      No status
   3    SWP       Width       UK    Lock Status
   4    SWP      Height       UK No Lock Status
   5    SWP        Time       US    Lock Status
   6    SWP        Mass       UK No Lock Status

I am trying to do it this way but getting the logic wrong

df1$Status <- ifelse(df1$States == NA, "No Status",
                ifelse((count(df1$States == "All Locked") >=1),
                  "Lock Status",
                  ifelse((count(df1$States == "All Locked") <1),
                    "No Lock Status", NA)))

Can someone point me in the right direction? I would like to apply to my bigger dataset and so a fast solution would help me a lot.

Upvotes: 1

Views: 245

Answers (3)

Uwe
Uwe

Reputation: 42564

The any() function is well suited for aggregation, here. Joining with a lookup table converts NA, TRUE, and FALSE, resp., into the Status values the OP expects.

The approach can be implemented in data.table syntax as well as dplyr style.

Create lookup table

This will be used by the data.table and the dplyr variants.

library(data.table)
lut <- data.table(st = c(NA, TRUE, FALSE), 
                  Status = c("No Status", "Lock Status", "No Lock Status"))

data.table version

library(data.table)
# aggregate by ID
agg <- setDT(df1)[, .(st = any(States == "All Locked")), by = ID][
  #  join with lookup table
  lut, on = "st"][, -"st"]
# join with df1 to prepend other columns
unique(df1[, -"States"])[agg, on = "ID"]
   ID ToolID Measurement Location         Status
1:  1   <NA>      Length       US      No Status
2:  2   <NA>     Breadth       US      No Status
3:  3    SWP       Width       UK    Lock Status
4:  5    SWP        Time       US    Lock Status
5:  4    SWP      Height       UK No Lock Status
6:  6    SWP        Mass       UK No Lock Status

dplyr version

library(dplyr)
agg <-df1 %>% 
  group_by(ID) %>% 
  summarize(st = any(States == "All Locked")) %>% 
  left_join(lut) %>% 
  select(-st)
df1 %>% 
  select(-States) %>%  
  unique() %>% 
  left_join(agg)
  ID ToolID Measurement Location         Status
1  1   <NA>      Length       US      No Status
2  2   <NA>     Breadth       US      No Status
3  3    SWP       Width       UK    Lock Status
4  4    SWP      Height       UK No Lock Status
5  5    SWP        Time       US    Lock Status
6  6    SWP        Mass       UK No Lock Status

Upvotes: 1

smci
smci

Reputation: 33960

Here's a short clean idiom using dplyr::case_when. First we compute Status as the summary statistic proportion of States which are "All Locked" (0..1 or NA), then we immediately recycle Status column into the corresponding string output:

df1 %>% group_by(ID) %>%

    summarize(ToolID=ToolID[1], Measurement=Measurement[1], Location=Location[1],
      Status = sum( States=="All Locked")/n() ) %>%

    mutate(Status = case_when(
      is.na(Status)         ~ "No Status",
      Status == 1           ~ "Lock Status",
      Status == 0           ~ "No Lock Status",
      between(Status, 0, 1) ~ as.character(NA) ))

Output:

     ID ToolID Measurement Location Status        
  <dbl> <fctr> <fctr>      <fctr>   <chr>         
1  1.00 NA     Length      US       No Status     
2  2.00 NA     Breadth     US       No Status     
3  3.00 SWP    Width       UK       Lock Status   
4  4.00 SWP    Height      UK       No Lock Status
5  5.00 SWP    Time        US       NA            
6  6.00 SWP    Mass        UK       No Lock Status

Upvotes: 1

akrun
akrun

Reputation: 887541

For NA elements, use is.na and dplyr::count works on data.frame/tbls.

Here, we group by 'ID', check if there is at least one "All Locked" in 'States' column then change it to "All Locked" for the entire group (Instead of using mutate to do this, change it within the group_by and add=TRUE for adding a new grouping variable along with the existing group), get the group by frequency of 'ID' and 'States' and then based on the condition, change the values in 'States'

library(dplyr)
df1 %>% 
  group_by(ID) %>%
  group_by(States = if("All Locked" %in% States) "All Locked" 
              else States, add = TRUE) %>% 
  mutate(n = n()) %>%
  ungroup %>% 
  mutate(States = c("No Lock Status", "Lock Status")[1+ 
                (States == "All Locked" & n >=1)], 
          States = replace(States, is.na(States), "No Status")) %>%
  select(-n) %>% 
  distinct

Upvotes: 1

Related Questions