Reputation: 81
I want to count the number of deaths before the first breast cancer, after the first breast cancer, after the second breast cancer, etc.
My data looks like the sample below. In the EVENT column you can see that some people have a DEATH event before BC1 (first breast cancer), after BC1, or after BC2. How can I count the number of people in each of these sequences?
PERSON_ID EVENT
3 10000000002 BC1
4 10000000002 R_B
5 10000000002 BC2
6 10000000002 DEATH
7 10000000002 EPI
20 10000000007 BC1
21 10000000007 BC2
22 10000000007 DEATH
23 10000000010 DEATH
37 10000000827 DEATH
38 10000000830 BC1
39 10000000830 DEATH
Eventually I want tables like the one below, but I'm not worried about building tables yet; I just want the corresponding counts for:
Death before BC1: #
Death after BC1: #
Death after BC2: #
Sorry for the bad formatting in the post; any help will be appreciated!
Upvotes: 0
Reputation: 341
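I think we can assume that no person gets breast cancer after they die, so you can check whether a person has BC2 as an EVENT: if they do, they died after their second breast cancer. Likewise, a person with BC1 but no BC2 died after the first one, and a person with neither (labelled BC0 below) died before any breast cancer.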
library("dplyr")
df <- data.frame(PERSON_ID = c(10000000002, 10000000002, 10000000002,
                               10000000002, 10000000002, 10000000007,
                               10000000007, 10000000007, 10000000010,
                               10000000827, 10000000830, 10000000830),
                 EVENT = c("BC1", "R_B", "BC2", "DEATH",
                           "EPI", "BC1", "BC2", "DEATH",
                           "DEATH", "DEATH", "BC1", "DEATH"))
# label each person by the last breast cancer they reached (BC0 = none),
# keeping only people who actually have a DEATH event, then count per label
group_by(df, PERSON_ID) %>%
  filter("DEATH" %in% EVENT) %>%
  summarise(Type = ifelse("BC2" %in% EVENT, "BC2",
                   ifelse("BC1" %in% EVENT, "BC1", "BC0"))) %>%
  ungroup() %>%
  group_by(Type) %>%
  summarise(Count = n())
Cheers
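The approaches in this thread classify each person by which breast cancer events they have at all, which relies on the assumption that nobody is diagnosed after death. If each person's rows are in chronological order (an assumption; note that person 10000000002 has an EPI row after DEATH, so check this against your real data), a base R sketch can instead look only at the events that precede the first DEATH row. The helper name classify_death is hypothetical; people with no DEATH row get NA and drop out of the counts.

```r
# Sketch: assumes each person's rows are in chronological order
df <- data.frame(
  PERSON_ID = c(10000000002, 10000000002, 10000000002, 10000000002,
                10000000002, 10000000007, 10000000007, 10000000007,
                10000000010, 10000000827, 10000000830, 10000000830),
  EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1", "BC2", "DEATH",
            "DEATH", "DEATH", "BC1", "DEATH"),
  stringsAsFactors = FALSE)

# Hypothetical helper: classify one person's event sequence using only
# the events recorded before their first DEATH row (NA if they never died)
classify_death <- function(events) {
  death_pos <- match("DEATH", events)
  if (is.na(death_pos)) return(NA_character_)
  before <- events[seq_len(death_pos - 1)]
  if ("BC2" %in% before) {
    "Death after BC2"
  } else if ("BC1" %in% before) {
    "Death after BC1"
  } else {
    "Death before BC1"
  }
}

# split() keeps the original row order within each person
types <- vapply(split(df$EVENT, df$PERSON_ID), classify_death, character(1))
counts <- table(types)   # table() drops NA values by default
print(counts)
```

On the sample data this gives 2 deaths before BC1, 1 after BC1 and 2 after BC2, the same as the presence-based answers; the two approaches only diverge when a BC event is recorded after a DEATH row.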
Upvotes: 1
Reputation: 507
Here is a very simple base R solution. The counts are stored in the results variable.
my_data <- data.frame(PERSON_ID = as.character(c(10000000002, 10000000002, 10000000002,
                                                 10000000002, 10000000002, 10000000007,
                                                 10000000007, 10000000007, 10000000010,
                                                 10000000827, 10000000830, 10000000830)),
                      EVENT = c("BC1", "R_B", "BC2", "DEATH", "EPI", "BC1",
                                "BC2", "DEATH", "DEATH", "DEATH", "BC1", "DEATH"))
# classify one person by the breast cancer events they had
my_function <- function(ID) {
  person <- subset(my_data, PERSON_ID == ID)
  has_bc1 <- "BC1" %in% person$EVENT
  has_bc2 <- "BC2" %in% person$EVENT
  if (!has_bc1) {
    return("Death_before_BC1")
  } else if (!has_bc2) {
    return("Death_after_BC1")
  } else {
    return("Death_after_BC2")
  }
}
# apply to every person, then count each category
results_tmp <- sapply(as.character(unique(my_data$PERSON_ID)), my_function)
results <- data.frame(Death_before_BC1 = sum(results_tmp == "Death_before_BC1"),
                      Death_after_BC1 = sum(results_tmp == "Death_after_BC1"),
                      Death_after_BC2 = sum(results_tmp == "Death_after_BC2"))
Upvotes: 0
Reputation: 6552
You could reshape your data frame to make this easier. One way, using dplyr after reshaping with tidyr, is below; there are surely other solutions.
library(dplyr)
df <- readr::read_delim("PERSON_ID EVENT
10000000002 BC1
10000000002 R_B
10000000002 BC2
10000000002 DEATH
10000000002 EPI
10000000007 BC1
10000000007 BC2
10000000007 DEATH
10000000010 DEATH
10000000827 DEATH
10000000830 BC1
10000000830 DEATH", delim = " ")
# reshape to one row per person with a TRUE/FALSE column per event type,
# then create a new categorical column with what you want
new_df <- df %>%
  mutate(value = TRUE) %>%
  tidyr::spread(EVENT, value, fill = FALSE) %>%
  mutate(cat = if_else(BC1 & BC2, "after BC2",
                       if_else(BC1, "after BC1", "before BC1")))
new_df
#> # A tibble: 5 × 7
#> PERSON_ID BC1 BC2 DEATH EPI R_B cat
#> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl> <chr>
#> 1 1e+10 TRUE TRUE TRUE TRUE TRUE after BC2
#> 2 1e+10 TRUE TRUE TRUE FALSE FALSE after BC2
#> 3 1e+10 FALSE FALSE TRUE FALSE FALSE before BC1
#> 4 1e+10 FALSE FALSE TRUE FALSE FALSE before BC1
#> 5 1e+10 TRUE FALSE TRUE FALSE FALSE after BC1
# count the people in each category
new_df %>% count(cat)
#> # A tibble: 3 × 2
#> cat n
#> <chr> <int>
#> 1 after BC1 1
#> 2 after BC2 2
#> 3 before BC1 2
Upvotes: 0