Aim
I have a list of phrases and a data frame with one column containing text. I want to create a new column in the data frame containing a random number of phrases sampled from the list, using only phrases that are not already present in that row's text.
The input dataframe:
structure(list(report = c("Biopsies of small bowel mucosa including Brunner's glands",
"These are fragments of small bowel mucosa which include Brunner's glands ",
"These are fragments of small bowel mucosa which include Brunner's glands There is no evidence of coeliac disease in these biopsies",
"There is coeliac disease here. ",
"Biopsies of specialisd gastric mucosa with moderate acute and active inflammation.",
"These are fragments of small bowel mucosa. The small bowel fragments are within normal limits"
)), .Names = "report", row.names = c(NA, 6L), class = "data.frame")
The input list:
c("active inflammation", "coeliac disease","Brunner's glands")
My intended output:
report | phrase list sample
Biopsies of small bowel mucosa including Brunner's glands | active inflammation
These are fragments of small bowel mucosa which include Brunner's glands | active inflammation,coeliac disease
These are fragments of small bowel mucosa which include Brunner's glands There is no evidence of coeliac disease in these biopsies | active inflammation
There is coeliac disease here. | Brunner's glands
Biopsies of specialisd gastric mucosa with moderate acute and active inflammation. | coeliac disease,Brunner's glands
These are fragments of small bowel mucosa. The small bowel fragments are within normal limits | active inflammation
I have tried
Final$mine<-ifelse(grepl(paste(ListCheck, collapse='|'), Final[,1], ignore.case=TRUE),print("Check here"),sample(ListCheck,replace=T))
but this only checks whether any of the phrases in the list occurs in a row. If none does, it picks a random phrase from the whole list instead of sampling only from the phrases missing from that row.
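One way to see the difference is to test each phrase against each row separately. The sketch below assumes the names from the attempt above (Final for the data frame, ListCheck for the phrases) and rebuilds Final from the dput() output so it runs on its own. It builds a logical matrix with one row per report and one column per phrase; the single grepl() call with a "|" pattern only gives the row-wise "any" of that matrix. Phrases are matched literally and case-sensitively (fixed = TRUE), which is a choice made here, not something the question specifies.

```r
Final <- data.frame(report = c(
  "Biopsies of small bowel mucosa including Brunner's glands",
  "These are fragments of small bowel mucosa which include Brunner's glands ",
  "These are fragments of small bowel mucosa which include Brunner's glands There is no evidence of coeliac disease in these biopsies",
  "There is coeliac disease here. ",
  "Biopsies of specialisd gastric mucosa with moderate acute and active inflammation.",
  "These are fragments of small bowel mucosa. The small bowel fragments are within normal limits"
), stringsAsFactors = FALSE)
ListCheck <- c("active inflammation", "coeliac disease", "Brunner's glands")

# one column per phrase: TRUE where that phrase occurs in that report
present <- sapply(ListCheck, function(p) grepl(p, Final$report, fixed = TRUE))

# the single-pattern test from the attempt only answers "is any phrase present?"
any_present <- grepl(paste(ListCheck, collapse = "|"), Final$report)
stopifnot(identical(any_present, rowSums(present) > 0))

present
```

The phrases that may be sampled for row i are then simply ListCheck[!present[i, ]].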
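You could first check, for each row, which phrases are not present in its text (calling your data frame df):
input_list <- c("active inflammation", "coeliac disease", "Brunner's glands")
# for each report, keep only the phrases that do not occur in its text
absent <- lapply(df$report, function(txt) input_list[!sapply(input_list, grepl, x = txt, fixed = TRUE)])
Then, to get a random number of phrases, use another sample() to choose the count for each row:
df$new <- sapply(absent, function(x) {
  if (length(x) == 0) return(NA_character_)  # every phrase already occurs in this row
  # pick between 1 and length(x) distinct phrases, without repeats
  paste(sample(x, sample.int(length(x), 1)), collapse = ", ")
})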
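Because the draw is random, a quick way to confirm the rule from the question (nothing written into the new column may already occur in that row) is to fix the seed and test the result. The sketch below packages the per-row logic as a function; sample_absent is a hypothetical name, the reports are copied from the question's data, and phrases are matched literally and case-sensitively (fixed = TRUE), which is an assumption rather than a requirement from the question.

```r
# Hypothetical helper: for each text, sample between 1 and all of the phrases
# that do not occur in it (no repeats); NA when every phrase is already present.
sample_absent <- function(texts, phrases) {
  vapply(texts, function(txt) {
    absent <- phrases[!vapply(phrases, grepl, logical(1), x = txt, fixed = TRUE)]
    if (length(absent) == 0) return(NA_character_)
    paste(sample(absent, sample.int(length(absent), 1)), collapse = ", ")
  }, character(1), USE.NAMES = FALSE)
}

reports <- c(
  "Biopsies of small bowel mucosa including Brunner's glands",
  "These are fragments of small bowel mucosa which include Brunner's glands ",
  "These are fragments of small bowel mucosa which include Brunner's glands There is no evidence of coeliac disease in these biopsies",
  "There is coeliac disease here. ",
  "Biopsies of specialisd gastric mucosa with moderate acute and active inflammation.",
  "These are fragments of small bowel mucosa. The small bowel fragments are within normal limits"
)
phrases <- c("active inflammation", "coeliac disease", "Brunner's glands")

set.seed(1)  # makes the random draw reproducible
picked <- sample_absent(reports, phrases)

# check: no phrase placed in a row already occurs in that row's report
ok <- mapply(function(txt, p) {
  is.na(p) || !any(vapply(strsplit(p, ", ", fixed = TRUE)[[1]], grepl, logical(1),
                          x = txt, fixed = TRUE))
}, reports, picked)
stopifnot(all(ok))
picked
```

Row 3 already contains two of the three phrases, so it can only ever receive "active inflammation"; the other rows get one or more phrases chosen at random.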