Reputation: 95
I have a data frame (hit) that contains a single column. This is populated with unique search results.
A second data frame (data) contains the results of the various search queries. The column names identify the search term used and the rows are populated with the search results.
I want to build a matrix or another data frame with one row per unique search result and one column per search term, where each cell records whether that result is present (1) or absent (0) in that search.
I can do this using base R with the following code:
library(tidyverse)
hit <- read_csv("hit
A1
A3
B2
B4
D3")
data <- read_csv("Search1, Search2, Search3, Search4
A1, B4, A3, A1
B4, D3, NA, B2
D3, NA, NA, B4")
search <- c("Search1", "Search2", "Search3", "Search4")
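# One row per unique hit, one column per search term
the_matrix <- matrix(data = NA, nrow = nrow(hit), ncol = length(search))
rownames(the_matrix) <- hit$hit
colnames(the_matrix) <- search
for (i in search) {
  for (j in seq_len(nrow(data))) {
    result <- data[[i]][[j]]
    # An NA result matches no row name, so nothing is assigned for it
    row_index <- which(rownames(the_matrix) == result)
    the_matrix[row_index, i] <- 1
  }
}
# Cells that were never set are absent results
the_matrix[is.na(the_matrix)] <- 0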
In my mind, there should be a way of achieving this same result with the tidyverse, using the first data frame as the starting point. From there, the second data frame is introduced column by column using the search results as the key to populate.
Can anyone help?
Upvotes: 0
Views: 801
Reputation: 26343
You can use map_df in combination with match and then replace all non-0s in a_tibble with 1L.
library(purrr)
library(tibble)  # add_column() comes from tibble
a_tibble <- map_df(data, ~match(hit[["hit"]], ., nomatch = 0L))
a_tibble[a_tibble != 0] <- 1L
a_tibble %>%
add_column(., hit = hit$hit, .before = 1)
# A tibble: 5 x 5
# hit Search1 Search2 Search3 Search4
# <chr> <int> <int> <int> <int>
#1 A1 1 0 0 1
#2 A3 0 0 1 0
#3 B2 0 0 0 1
#4 B4 1 1 0 1
#5 D3 1 1 0 0
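If the position returned by match is not needed, the same table can probably be built in one step with %in%, starting from hit as the question asks. This is only a sketch: the toy data is rebuilt inline with tibble() instead of read_csv, and the name result is just illustrative.

```r
library(purrr)
library(dplyr)
library(tibble)

# Same toy data as in the question, built inline so the sketch is self-contained
hit <- tibble(hit = c("A1", "A3", "B2", "B4", "D3"))
data <- tibble(
  Search1 = c("A1", "B4", "D3"),
  Search2 = c("B4", "D3", NA),
  Search3 = c("A3", NA, NA),
  Search4 = c("A1", "B2", "B4")
)

# For every search column, flag each hit as present (1L) or absent (0L)
flags <- map(data, ~ as.integer(hit$hit %in% .x))

# Start from hit and attach one flag column per search term
result <- bind_cols(hit, as_tibble(flags))
```

Because the rows come from hit, a hit that appears in none of the searches would still get a row of zeros.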
Upvotes: 1
Reputation: 1939
For your information, there is also a rather elegant base R solution:
the_matrix <- sapply(data, function(x) as.numeric(hit$hit %in% x))
rownames(the_matrix) <- hit$hit
Upvotes: 1
Reputation: 9705
data %>% gather(na.rm = TRUE) %>% mutate(p = 1L) %>% spread("key", "p", fill = 0L)
# A tibble: 5 x 5
#   value Search1 Search2 Search3 Search4
#   <chr>   <int>   <int>   <int>   <int>
# 1 A1          1       0       0       1
# 2 A3          0       0       1       0
# 3 B2          0       0       0       1
# 4 B4          1       1       0       1
# 5 D3          1       1       0       0
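gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(), so a rough equivalent with the newer verbs might look like the sketch below. It assumes tidyr 1.1.0 or later (for a scalar values_fill) and rebuilds the toy data inline; distinct() and arrange() are additions to guard against repeated results and to reproduce the sorted row order that spread() gave.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same toy data as in the question, built inline so the sketch is self-contained
data <- tibble(
  Search1 = c("A1", "B4", "D3"),
  Search2 = c("B4", "D3", NA),
  Search3 = c("A3", NA, NA),
  Search4 = c("A1", "B2", "B4")
)

result <- data %>%
  # One row per (search term, result) pair, dropping the NA padding
  pivot_longer(everything(), names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  # Guard against a result listed twice under the same search term
  distinct() %>%
  mutate(p = 1L) %>%
  # Back to one column per search term; missing pairs become 0
  pivot_wider(names_from = key, values_from = p, values_fill = 0L) %>%
  arrange(value)
```

Like the gather/spread version, this only lists results that occur somewhere in data; a hit that appears in no search would need to be joined in from hit separately.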
Upvotes: 2