Reputation: 127
I have a data frame:
vessel<-c(letters[1:4])
type<-c("Fishery Vessel","NA","NA","Cargo")
class<-c("NA","FISHING","NA","CARGO")
status<-c("NA", "NA", "Engaged in Fishing", "Underway")
df<-data.frame(vessel,type, class, status)
  vessel           type   class             status
1      a Fishery Vessel      NA                 NA
2      b             NA FISHING                 NA
3      c             NA      NA Engaged in Fishing
4      d          Cargo   CARGO           Underway
I would like to subset the df to contain only those rows relating to fishing (i.e. rows 1:3), which to me means doing something like:
df.sub<-subset(grep("FISH", df) | grep("Fish", df))
But this doesn't work. I've been trialing apply (such as this question) or partial string matching using grep (like this question), but I can't seem to pull it all together.
Grateful for any help. My data has tens of columns and up to a million rows, so I'm trying my best to avoid loops if possible, but maybe that's the only way? Thanks!
Upvotes: 1
Views: 1290
Reputation: 887881
In base R, we can use a vectorized option with grepl and Reduce:
subset(df, Reduce(`|`, lapply(df[-1], grepl, pattern = 'fish', ignore.case = TRUE)))
#  vessel           type   class             status
#1      a Fishery Vessel      NA                 NA
#2      b             NA FISHING                 NA
#3      c             NA      NA Engaged in Fishing
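To make the one-liner easier to follow, it can be unpacked into separate steps. This is only a sketch: it rebuilds the df from the question, and the intermediate names hits and keep are illustrative, not part of the answer above.

```r
# Rebuild the example data from the question (the "NA" values are strings there)
df <- data.frame(
  vessel = letters[1:4],
  type   = c("Fishery Vessel", "NA", "NA", "Cargo"),
  class  = c("NA", "FISHING", "NA", "CARGO"),
  status = c("NA", "NA", "Engaged in Fishing", "Underway"),
  stringsAsFactors = FALSE
)

# Step 1: one logical vector per column (df[-1] skips the vessel column)
hits <- lapply(df[-1], grepl, pattern = "fish", ignore.case = TRUE)

# Step 2: combine the per-column vectors element-wise with OR
keep <- Reduce(`|`, hits)

# Step 3: keep the rows where at least one column matched
df.sub <- df[keep, ]
```

Each grepl call works on a whole column at once, so there is no explicit loop over rows, only over the columns.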
Upvotes: 0
Reputation: 1364
Another option you can try, using dplyr and stringr:
library(dplyr)
library(stringr)
df %>%
  filter_all(any_vars(str_detect(., regex("fish", ignore_case = TRUE))))
#   vessel           type   class             status
# 1      a Fishery Vessel      NA                 NA
# 2      b             NA FISHING                 NA
# 3      c             NA      NA Engaged in Fishing
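In recent dplyr releases filter_all() and any_vars() are marked as superseded, so the same filter can also be written with if_any(). The sketch below assumes dplyr 1.0.4 or later (where if_any() is available) and rebuilds the df from the question so it runs on its own.

```r
library(dplyr)
library(stringr)

# Rebuild the example data from the question (the "NA" values are strings there)
df <- data.frame(
  vessel = letters[1:4],
  type   = c("Fishery Vessel", "NA", "NA", "Cargo"),
  class  = c("NA", "FISHING", "NA", "CARGO"),
  status = c("NA", "NA", "Engaged in Fishing", "Underway"),
  stringsAsFactors = FALSE
)

# Keep rows where any column contains "fish", ignoring case
df.sub <- df %>%
  filter(if_any(everything(), ~ str_detect(.x, regex("fish", ignore_case = TRUE))))
```

Replacing everything() with a narrower selection such as -vessel would limit the search to the descriptive columns.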
Upvotes: 1
Reputation: 39613
If you want to use apply(), you could compute an index based on your string fish and then subset. The index is the number of values in each row that match fish, computed with grepl(). You can set ignore.case = TRUE to avoid issues with upper- or lower-case text. When the index is greater than or equal to 1, at least one match occurred in that row, so you can subset on it. Here is the code:
#Data
vessel <- c(letters[1:4])
type <- c("Fishery Vessel", "NA", "NA", "Cargo")
class <- c("NA", "FISHING", "NA", "CARGO")
status <- c("NA", "NA", "Engaged in Fishing", "Underway")
df <- data.frame(vessel, type, class, status, stringsAsFactors = FALSE)
#Subset
#Create an index with apply: number of cells in each row that match 'fish'
df$Index <- apply(df[1:4], 1, function(x) sum(grepl('fish', x, ignore.case = TRUE)))
#Filter: keep rows with at least one match
df.sub <- subset(df, Index >= 1)
Output:
  vessel           type   class             status Index
1      a Fishery Vessel      NA                 NA     1
2      b             NA FISHING                 NA     1
3      c             NA      NA Engaged in Fishing     1
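Since apply() with MARGIN = 1 calls the function once per row, it may be slow on a data frame with around a million rows. A possible variation, sketched here under that assumption, counts matches column by column and sums them with rowSums(), without adding an Index column. The name hits is illustrative.

```r
# Rebuild the example data (the "NA" values are strings, as in the question)
df <- data.frame(
  vessel = letters[1:4],
  type   = c("Fishery Vessel", "NA", "NA", "Cargo"),
  class  = c("NA", "FISHING", "NA", "CARGO"),
  status = c("NA", "NA", "Engaged in Fishing", "Underway"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per df row, one column per df column
hits <- do.call(cbind, lapply(df, grepl, pattern = "fish", ignore.case = TRUE))

# Rows with at least one match
df.sub <- df[rowSums(hits) > 0, ]
```

Here grepl() runs once per column rather than once per row, which is usually the cheaper direction when there are far more rows than columns.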
Upvotes: 1