Dirk van Dooren

Reputation: 89

Making a new data frame by looking for keywords in a specific variable

I have a big dataset of about 35,000 cases × 32 variables. One of those variables is Description, which gives a description of status, for example: "patient suffered ischemic stroke".

Now I would like to make a data frame containing all cases in which the word "stroke", "STROKE" or "Stroke" is found in the variable Description.

Could anyone suggest an efficient way to do this? Right now I just add all the rows by hand, which is very inefficient:

df1 <- rbind(df[1, ], df[2, ], df[3, ])  # ...and so on, one df[i, ] per matching row

It works but it's unbelievably inelegant and prone to mistakes.

Upvotes: 0

Views: 49

Answers (1)

Olli J

Reputation: 649

Here I create some example data to work with.

a <- 1:10
b <- 11:20
description <- c("Stroke", "ALS", "Parkinsons", "STROKE", "STROKE", "stroke", "Alzheimers", "Stroke", "ALS", "Parkinsons")
df <- data.frame(a, b, description)
df
    a  b description
1   1 11      Stroke
2   2 12         ALS
3   3 13  Parkinsons
4   4 14      STROKE
5   5 15      STROKE
6   6 16      stroke
7   7 17  Alzheimers
8   8 18      Stroke
9   9 19         ALS
10 10 20  Parkinsons

With this code you can keep only the cases (rows) whose description is exactly "Stroke", "STROKE" or "stroke", and drop every other row:

# Keep rows whose description equals one of the three spellings
df1 <- df[df$description %in% c("STROKE", "Stroke", "stroke"), ]
df1
  a  b description
1 1 11      Stroke
4 4 14      STROKE
5 5 15      STROKE
6 6 16      stroke
8 8 18      Stroke
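
Note that the comparison above only matches when the whole value equals one of the three spellings. If the description is free text, as in the question ("patient suffered ischemic stroke"), a pattern match is probably closer to what is needed. Below is a minimal sketch using base R's grepl() with ignore.case = TRUE. The data frame `cases` and its contents are made up for illustration; only the column name Description comes from the question.

```r
# Hypothetical example data: free-text descriptions instead of single words
cases <- data.frame(
  id = 1:5,
  Description = c(
    "patient suffered ischemic stroke",
    "ALS",
    "STROKE, left hemisphere",
    "Parkinsons",
    "History of Stroke"
  ),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE where the pattern occurs anywhere in the string;
# ignore.case = TRUE covers "stroke", "Stroke" and "STROKE" at once
has_stroke <- grepl("stroke", cases$Description, ignore.case = TRUE)

# Logical indexing keeps only the matching rows
cases_stroke <- cases[has_stroke, ]
cases_stroke
```

This keeps rows 1, 3 and 5 of the made-up data. One caveat: the plain pattern "stroke" also matches longer words such as "heatstroke". If that matters, a word-boundary pattern like "\\bstroke\\b" should restrict the match to the whole word.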

Hope this was what you were looking for.

Upvotes: 1
