rightmove27

Reputation: 21

How do I find numbers in a data frame from a list?

I have a list which looks a little like this. Each code uniquely identifies the drug:

drugname <- c('Ibuprofen','Paracetamol','Aspirin')
dose <- c(50, 70, 40)
code <- c(5619, 4820, 6803)
drugtest <- list(drugname, dose, code)

I also have a data frame with information on people, each uniquely identified by their idcode. Each row describes a different drug that a person uses. The drugs are identified by drugcode, which corresponds to the code vector in the list.

personcode <- matrix(c(1,'female',5619, 1,'female',5802, 2,'male',4859,
                       3,'male',6803, 3,'male',4820, 3,'male',5428),
                     ncol = 3, byrow = TRUE)
colnames(personcode) <- c("idcode","gender","drugcode")
rownames(personcode) <- c("1","2","3","4","5","6")
personcode <- data.frame(personcode)

I want to mutate personcode, adding a column which identifies whether each person (idcode) receives any one of the drugs from the list (code). For example, person 1 and person 3 would be identified as being in receipt of the drug, but not person 2. How do I do this?

Upvotes: 0

Views: 59

Answers (3)

Allan Cameron

Reputation: 173803

Here's a neat alternative way of presenting the data:

cbind(personcode, 
      as.data.frame(setNames(lapply(drugtest[[3]],`==`,personcode$drugcode), drugtest[[1]])))

#>   idcode gender drugcode Ibuprofen Paracetamol Aspirin
#> 1      1 female     5619      TRUE       FALSE   FALSE
#> 2      1 female     5802     FALSE       FALSE   FALSE
#> 3      2   male     4859     FALSE       FALSE   FALSE
#> 4      3   male     6803     FALSE       FALSE    TRUE
#> 5      3   male     4820     FALSE        TRUE   FALSE
#> 6      3   male     5428     FALSE       FALSE   FALSE
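The one-liner above packs several steps together, so here is a minimal step-by-step sketch of the same idea. It assumes the example data is rebuilt with numeric columns (rather than via the character matrix in the question), and the intermediate names matches, flags and result are only illustrative:

```r
# Rebuild the example data so the sketch is self-contained
# (assumption: numeric idcode/drugcode instead of the character matrix)
drugtest <- list(c("Ibuprofen", "Paracetamol", "Aspirin"),
                 c(50, 70, 40),
                 c(5619, 4820, 6803))
personcode <- data.frame(
  idcode   = c(1, 1, 2, 3, 3, 3),
  gender   = c("female", "female", "male", "male", "male", "male"),
  drugcode = c(5619, 5802, 4859, 6803, 4820, 5428)
)

# Step 1: for each drug code, compare it against every row's drugcode
matches <- lapply(drugtest[[3]], function(code) personcode$drugcode == code)

# Step 2: label each logical vector with the corresponding drug name
names(matches) <- drugtest[[1]]

# Step 3: turn the named list into columns and bind them to the original rows
flags  <- as.data.frame(matches)
result <- cbind(personcode, flags)
result
```

This gives one logical column per drug, which is handy if you need to know which drug matched and not just whether any did.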

Upvotes: 0

linog

Reputation: 6226

You can merge the two tables and check whether a medicine appears. For instance, with data.table:

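library(data.table)
drugtest <- data.table(drugname, dose, code)
setDT(personcode)
# drugcode was built from a character matrix; convert it so its type matches code
personcode[, drugcode := as.numeric(as.character(drugcode))]

# left join: keep every row of personcode, fill drug details where the code matches
personcode2 <- merge(personcode, drugtest, all.x = TRUE, by.x = "drugcode", by.y = "code")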
personcode2
  drugcode idcode gender    drugname dose
1     4820      3   male Paracetamol   70
2     4859      2   male        <NA>   NA
3     5428      3   male        <NA>   NA
4     5619      1 female   Ibuprofen   50
5     5802      1 female        <NA>   NA
6     6803      3   male     Aspirin   40

And to get which individuals have received at least one of the medicines:

personcode2[, .('drug' = sum(!is.na(drugname)) > 0), by = 'idcode']
   idcode  drug
1:      3  TRUE
2:      2 FALSE
3:      1  TRUE
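If you would rather not load a package, the same left join can be sketched in base R. This is only an illustration: the data frame names people and drugs are made up here, and the columns are assumed to be numeric so that the key types match.

```r
# Hypothetical stand-ins for the question's data, with numeric key columns
drugs <- data.frame(
  drugname = c("Ibuprofen", "Paracetamol", "Aspirin"),
  dose     = c(50, 70, 40),
  code     = c(5619, 4820, 6803)
)
people <- data.frame(
  idcode   = c(1, 1, 2, 3, 3, 3),
  gender   = c("female", "female", "male", "male", "male", "male"),
  drugcode = c(5619, 5802, 4859, 6803, 4820, 5428)
)

# all.x = TRUE keeps every row of people; unmatched rows get NA drug details
merged <- merge(people, drugs, by.x = "drugcode", by.y = "code", all.x = TRUE)

# A row matched a listed drug exactly when drugname is not NA
merged$matched <- !is.na(merged$drugname)
merged
```

Rows whose drugcode is not in the list keep NA in drugname and dose, so the matched column is FALSE for them.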

Upvotes: 1

Ronak Shah

Reputation: 388982

I'm not sure what you want your expected output to look like, but you can use ave:

personcode$any_drug_received <- with(personcode, ave(drugcode %in% drugtest[[3]], 
                                                     idcode, FUN = any))
personcode

#  idcode gender drugcode any_drug_received
#1      1 female     5619              TRUE
#2      1 female     5802              TRUE
#3      2   male     4859             FALSE
#4      3   male     6803              TRUE
#5      3   male     4820              TRUE
#6      3   male     5428              TRUE

The same can be done with dplyr:

library(dplyr)
personcode %>%
  group_by(idcode) %>%
  mutate(any_drug_received = any(drugcode %in% drugtest[[3]]))

and with data.table:

library(data.table)
setDT(personcode)[, any_drug_received := any(drugcode %in% drugtest[[3]]), idcode]
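If one row per person is more useful than repeating the flag on every row, a base R sketch with tapply could look like the following. It assumes the example data is rebuilt with numeric columns, and the names received and per_person are only illustrative:

```r
# Rebuild the example data so the sketch is self-contained
drugtest <- list(c("Ibuprofen", "Paracetamol", "Aspirin"),
                 c(50, 70, 40),
                 c(5619, 4820, 6803))
personcode <- data.frame(
  idcode   = c(1, 1, 2, 3, 3, 3),
  gender   = c("female", "female", "male", "male", "male", "male"),
  drugcode = c(5619, 5802, 4859, 6803, 4820, 5428)
)

# For each idcode, check whether any of that person's drug codes is in the list
received <- tapply(personcode$drugcode %in% drugtest[[3]],
                   personcode$idcode, any)

# Collapse to one row per person (idcode comes back as the group label)
per_person <- data.frame(idcode = names(received),
                         any_drug_received = as.vector(received))
per_person
```

Here persons 1 and 3 are flagged TRUE and person 2 FALSE, which matches the result described in the question.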

Upvotes: 0
