Reputation: 11
I have exported old files from a legacy system. There isn't any documentation, and I have to search for the data in a data frame/matrix that has more than 300 columns.
For example, using the following representative data:
a <- c("jan", "mar", "jan", "feb", "feb")
b <- c("feb", "mar", "mar", "january", "mar")
c <- c("jan", "feb", "feb", "jan", "jan")
d <- c("jan", "mar", "jan", "february", "feb")
e <- c("feb", "jan", "feb", "march", "mar")
f <- c("january", "february", "feb", "jan", "janet")
xxx <- data.frame(a,b,c,d,e,f)
xxx
I need to be able to search for "Jan" and have every element containing it, such as "Jan", "January", or "Janet", show up.
I tried using
which(xxx =="Jan", arr.ind=TRUE)
but it only returns exact matches.
Is there a way to use a wildcard in the above, or another way to implement a search function on a large data set that I am trying to make sense of?
Upvotes: 1
Views: 445
Reputation: 6222
which(sapply(xxx, function(x) grepl(pattern = "jan", x = x)), arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 3 1
# [3,] 4 2
# [4,] 1 3
# [5,] 4 3
# [6,] 5 3
# [7,] 1 4
# [8,] 3 4
# [9,] 2 5
#[10,] 1 6
#[11,] 4 6
#[12,] 5 6
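Since the question searches for "Jan" while the sample data is lowercase, here is a minimal sketch of a case-insensitive variant that also reports the column name and the matched value. It assumes every column can be treated as text; the names m, hits and result are just illustrative, and as.matrix() may format non-character columns before they are searched.

```r
# Representative data from the question
xxx <- data.frame(
  a = c("jan", "mar", "jan", "feb", "feb"),
  b = c("feb", "mar", "mar", "january", "mar"),
  c = c("jan", "feb", "feb", "jan", "jan"),
  d = c("jan", "mar", "jan", "february", "feb"),
  e = c("feb", "jan", "feb", "march", "mar"),
  f = c("january", "february", "feb", "jan", "janet"),
  stringsAsFactors = FALSE
)

# Convert to a character matrix so the whole table can be searched at once
m <- as.matrix(xxx)

# grepl() returns a plain logical vector, so restore the matrix shape
# before asking which() for row/column positions
hits <- which(
  matrix(grepl("Jan", m, ignore.case = TRUE), nrow = nrow(m)),
  arr.ind = TRUE
)

# Attach the column name and the matched value to each position
result <- data.frame(
  row    = hits[, "row"],
  column = colnames(xxx)[hits[, "col"]],
  value  = m[hits],
  stringsAsFactors = FALSE
)
result
```

With 300+ columns, having the column name next to each hit is probably easier to read than a bare column index.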
Upvotes: 1
Reputation: 39174
I'm not sure what your desired output is, but the following code returns a list with the matching words from each column.
lapply(xxx, function(col) grep(pattern = "jan", x = col, value = TRUE))
# $a
# [1] "jan" "jan"
#
# $b
# [1] "january"
#
# $c
# [1] "jan" "jan" "jan"
#
# $d
# [1] "jan" "jan"
#
# $e
# [1] "jan"
#
# $f
# [1] "january" "jan" "janet"
Without value = TRUE, the same code returns the indices of the matching words.
lapply(xxx, function(col) grep(pattern = "jan", x = col))
# $a
# [1] 1 3
#
# $b
# [1] 4
#
# $c
# [1] 1 4 5
#
# $d
# [1] 1 3
#
# $e
# [1] 2
#
# $f
# [1] 1 4 5
If you replace grep with grepl, the code returns a list of logical vectors showing which words matched.
lapply(xxx, function(col) grepl(pattern = "jan", x = col))
# $a
# [1] TRUE FALSE TRUE FALSE FALSE
#
# $b
# [1] FALSE FALSE FALSE TRUE FALSE
#
# $c
# [1] TRUE FALSE FALSE TRUE TRUE
#
# $d
# [1] TRUE FALSE TRUE FALSE FALSE
#
# $e
# [1] FALSE TRUE FALSE FALSE FALSE
#
# $f
# [1] TRUE FALSE FALSE TRUE TRUE
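With several hundred columns, most entries of such a list will likely be empty. As a rough sketch (the data frame df below is made up for illustration and is not the asker's data), the list can be trimmed to the columns that contain at least one match using base R's Filter(), with ignore.case = TRUE so that "Jan" also finds "jan":

```r
# Hypothetical small data frame; column b has no match
df <- data.frame(
  a = c("jan", "mar", "feb"),
  b = c("apr", "may", "jun"),
  c = c("janet", "feb", "january"),
  stringsAsFactors = FALSE
)

# Matching values per column, ignoring case
matches <- lapply(df, function(col) {
  grep(pattern = "Jan", x = col, value = TRUE, ignore.case = TRUE)
})

# Keep only columns with at least one match
# (length 0 is treated as FALSE by Filter)
with_hits <- Filter(length, matches)
names(with_hits)
```

Here names(with_hits) lists only the columns worth inspecting further.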
Upvotes: 2