Reputation: 89
I have a big dataset of about 35,000 cases × 32 variables. One of those variables is Description, which holds a description of the patient's status, for example: "patient suffered ischemic stroke".
Now I would like to make a data frame containing all cases in which the word "stroke", "STROKE" or "Stroke" is found in the variable Description.
Could anyone suggest an efficient way to do this? For now I have just added all the rows by hand, which is very inefficient:
df1 <- rbind(df[1, ], df[2, ], df[3, ])  # and so on, one row at a time
It works but it's unbelievably inelegant and prone to mistakes.
Upvotes: 0
Views: 49
Reputation: 649
Here I create some example data to work with.
a <- c(1:10)
b <- c(11:20)
description <- c("Stroke","ALS","Parkinsons","STROKE","STROKE","stroke","Alzheimers","Stroke","ALS","Parkinsons")
df<-data.frame(a,b,description)
df
    a  b description
1   1 11      Stroke
2   2 12         ALS
3   3 13  Parkinsons
4   4 14      STROKE
5   5 15      STROKE
6   6 16      stroke
7   7 17  Alzheimers
8   8 18      Stroke
9   9 19         ALS
10 10 20  Parkinsons
With this code you can keep only the cases (rows) whose description is exactly "Stroke", "STROKE" or "stroke"; every other row is removed:
df1 <- df[df$description %in% c("STROKE", "Stroke", "stroke"), ]
df1
  a  b description
1 1 11      Stroke
4 4 14      STROKE
5 5 15      STROKE
6 6 16      stroke
8 8 18      Stroke
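Note that the filter above only matches descriptions that consist of the single word. The question's example, "patient suffered ischemic stroke", has the word inside a longer sentence, so a pattern match is probably closer to what is needed. Here is a minimal sketch using base R's grepl() with ignore.case = TRUE. The data frame df_text and its values are made up for illustration, and the column is named description as in the example above; adjust it to Description if that is the name in the real data.

```r
# Hypothetical example data with free-text descriptions
df_text <- data.frame(
  a = 1:5,
  description = c("patient suffered ischemic stroke", "ALS",
                  "STROKE, left hemisphere", "Parkinsons", "Stroke"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE wherever the pattern occurs anywhere in the string;
# ignore.case = TRUE covers "stroke", "Stroke" and "STROKE" in one go
is_stroke <- grepl("stroke", df_text$description, ignore.case = TRUE)

# Keep only the matching rows, all columns
stroke_cases <- df_text[is_stroke, ]
stroke_cases
```

This keeps rows 1, 3 and 5 of the made-up data. grepl() returns FALSE for missing descriptions, so rows with NA are dropped rather than producing NA rows in the result.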
Hope this was what you were looking for.
Upvotes: 1