Reputation: 51
I have found similar questions but nothing that has solved my problem yet. I have a huge dataframe and I'm trying to find where there are occurrences of a range of strings. Here is my sample data:
var1 <- c("Goats","Sheep","Pigs","Dog","Zebu","Donkeys","Water buffaloes","Dromedary camel","Dog","Pig")
var2 <- c("cats","Dog","birds","rabbits","Plant","fish","guinea pigs","cat","Mouse","dog")
var3 <- c("cats","Dog","birds","rabbits","horses","Sheep","Pigs","Tree","Zebu","Donkeys")
var4 <- c("Plant","Ant","Bee","Tree","Water buffaloes","Donkeys","Water buffaloes","Dromedary camel","Dog","Pig")
df <- data.frame(var1, var2, var3, var4)
head(df)
var1 var2 var3 var4
1 Goats cats cats Plant
2 Sheep Dog Dog Ant
3 Pigs birds birds Bee
4 Dog rabbits rabbits Tree
5 Zebu Plant horses Water buffaloes
6 Donkeys fish Sheep Donkeys
>
I can find the row and column of each occurrence of Tree and plant:
strings = c( "Tree","Plant")
which(apply(df, 1, function(x) any(grepl(paste(strings, collapse = "|"), x))))# ROW
[1] 1 4 5 8
which(apply(df, 2, function(x) any(grepl(paste(strings, collapse = "|"), x)))) # COLUMN
var2 var3 var4
2 3 4
But can't figure out how to put the column name (or indice) in a new variable for each row. This is what I would like:
> head(df)
var1 var2 var3 var4 x1
1 Goats cats cats Plant var4
2 Sheep Dog Dog Ant NA
3 Pigs birds birds Bee NA
4 Dog rabbits rabbits Tree var4
5 Zebu Plant horses Water buffaloes var2
6 Donkeys fish Sheep Donkeys NA
I thought something with str_detect or str_which would would, but not sure how to do it across all columns. Something like:
df <- df %>%
mutate(words = ifelse(any_vars(str_detect(strings))))
Appreciate your time and help.
Upvotes: 0
Views: 469
Reputation: 388982
We can achieve this with rowwise
-
library(dplyr)
cols <- names(df)
df %>%
rowwise() %>%
mutate(x1 = toString(cols[c_across() %in% strings])) %>%
ungroup
# var1 var2 var3 var4 x1
# <chr> <chr> <chr> <chr> <chr>
# 1 Goats cats cats Plant "var4"
# 2 Sheep Dog Dog Ant ""
# 3 Pigs birds birds Bee ""
# 4 Dog rabbits rabbits Tree "var4"
# 5 Zebu Plant horses Water buffaloes "var2"
# 6 Donkeys fish Sheep Donkeys ""
# 7 Water buffaloes guinea pigs Pigs Water buffaloes ""
# 8 Dromedary camel cat Tree Dromedary camel "var3"
# 9 Dog Mouse Zebu Dog ""
#10 Pig dog Donkeys Pig ""
In base R, with apply
-
df$x1 <- apply(df, 1, function(x) toString(cols[x %in% strings]))
Upvotes: 1