Reputation: 2267
I have a data frame like this:
ID <- c(1,2,3,4,5,5,5,6,6)
States <- c(NA,NA,"All Locked","All Not Locked","All Locked","All Locked"
,"All Not Locked","All Not Locked","All Not Locked")
ToolID <- c(NA,NA,"SWP","SWP","SWP","SWP","SWP","SWP","SWP")
Measurement <- c("Length","Breadth","Width","Height","Time","Time"
,"Time","Mass","Mass")
Location <- c("US","US","UK","UK","US","US","US","UK","UK")
df1 <- data.frame(ID,States,ToolID,Measurement,Location)
I am trying to do some data manipulation on this data frame using the following conditions
For each ID (grouped),
if States = NA, then the Status = "No Status"
if States column contains at least(count >=) 1 "All Locked", then the Status = "Lock Status"
if States column doesn't contain (count =0) "All Locked", then the Status = "No Lock Status"
My desired output is
ID ToolID Measurement Location Status
1 NA Length US No Status
2 NA Breadth US No status
3 SWP Width UK Lock Status
4 SWP Height UK No Lock Status
5 SWP Time US Lock Status
6 SWP Mass UK No Lock Status
I am trying to do it this way but getting the logic wrong
df1$Status <- ifelse(df1$States == NA, "No Status",
ifelse((count(df1$States == "All Locked") >=1),
"Lock Status",
ifelse((count(df1$States == "All Locked") <1),
"No Lock Status", NA)))
Can someone point me in the right direction? I would like to apply to my bigger dataset and so a fast solution would help me a lot.
Upvotes: 1
Views: 245
Reputation: 42564
The any()
function is well suited for aggregation, here. Joining with a lookup table converts NA
, TRUE
, and FALSE
, resp., into the Status
values the OP expects.
The approach can be implemented in data.table
syntax as well as dplyr
style.
This will be used by the data.table
and the dplyr
variants.
library(data.table)
lut <- data.table(st = c(NA, TRUE, FALSE),
Status = c("No Status", "Lock Status", "No Lock Status"))
data.table
versionlibrary(data.table)
# aggregate by ID
agg <- setDT(df1)[, .(st = any(States == "All Locked")), by = ID][
# join with lookup table
lut, on = "st"][, -"st"]
# join with df1 to prepend other columns
unique(df1[, -"States"])[agg, on = "ID"]
ID ToolID Measurement Location Status 1: 1 <NA> Length US No Status 2: 2 <NA> Breadth US No Status 3: 3 SWP Width UK Lock Status 4: 5 SWP Time US Lock Status 5: 4 SWP Height UK No Lock Status 6: 6 SWP Mass UK No Lock Status
dplyr
versionlibrary(dplyr)
agg <-df1 %>%
group_by(ID) %>%
summarize(st = any(States == "All Locked")) %>%
left_join(lut) %>%
select(-st)
df1 %>%
select(-States) %>%
unique() %>%
left_join(agg)
ID ToolID Measurement Location Status 1 1 <NA> Length US No Status 2 2 <NA> Breadth US No Status 3 3 SWP Width UK Lock Status 4 4 SWP Height UK No Lock Status 5 5 SWP Time US Lock Status 6 6 SWP Mass UK No Lock Status
Upvotes: 1
Reputation: 33960
Here's a short clean idiom using dplyr::case_when
.
First we compute Status
as the summary statistic proportion of States which are "All Locked" (0..1 or NA), then we immediately recycle Status
column into the corresponding string output:
df1 %>% group_by(ID) %>%
summarize(ToolID=ToolID[1], Measurement=Measurement[1], Location=Location[1],
Status = sum( States=="All Locked")/n() ) %>%
mutate(Status = case_when(
is.na(Status) ~ "No Status",
Status == 1 ~ "Lock Status",
Status == 0 ~ "No Lock Status",
between(Status, 0, 1) ~ as.character(NA) ))
Output:
ID ToolID Measurement Location Status
<dbl> <fctr> <fctr> <fctr> <chr>
1 1.00 NA Length US No Status
2 2.00 NA Breadth US No Status
3 3.00 SWP Width UK Lock Status
4 4.00 SWP Height UK No Lock Status
5 5.00 SWP Time US NA
6 6.00 SWP Mass UK No Lock Status
Upvotes: 1
Reputation: 887541
For NA
elements, use is.na
and dplyr::count
works on data.frame/tbl
s.
Here, we group by 'ID', check if
there is at least one "All Locked"
in 'States' column then change it to "All Locked" for the entire group (Instead of using mutate
to do this, change it within the group_by
and add=TRUE
for adding a new grouping variable along with the existing group), get the group by frequency of 'ID' and 'States' and then based on the condition, change the values in 'States'
library(dplyr)
df1 %>%
group_by(ID) %>%
group_by(States = if("All Locked" %in% States) "All Locked"
else States, add = TRUE) %>%
mutate(n = n()) %>%
ungroup %>%
mutate(States = c("No Lock Status", "Lock Status")[1+
(States == "All Locked" & n >=1)],
States = replace(States, is.na(States), "No Status")) %>%
select(-n) %>%
distinct
Upvotes: 1