Mr_J

Reputation: 95

A tidyverse solution to populating a matrix/data frame

I have a data frame (hit) with a single column containing unique search results.

A second data frame (data) contains the results of the various search queries. The column names identify the search term used and the rows are populated with the search results.

I want to build a matrix (or another data frame) with one row per hit and one column per search, where each cell is 1 if that hit appears in that search's column and 0 otherwise.

I can do this in base R (the tidyverse is loaded only for read_csv) with the following code:

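    library(tidyverse)

    # Literal CSV text is wrapped in I(); newer readr/vroom versions warn otherwise
    hit <- read_csv(I("hit
                     A1
                     A3
                     B2
                     B4
                     D3"))

    data <- read_csv(I("Search1, Search2, Search3, Search4
             A1, B4, A3, A1
             B4, D3, NA, B2
             D3, NA, NA, B4"))

    search <- names(data)

    # One row per hit, one column per search, initially all NA
    the_matrix <- matrix(data = NA, nrow = nrow(hit), ncol = length(search))
    rownames(the_matrix) <- hit$hit
    colnames(the_matrix) <- search

    # Set 1 wherever a search result matches a row name; an NA result
    # matches no row, so which() returns integer(0) and nothing is set
    for (i in search) {
        for (j in seq_len(nrow(data))) {
            result <- data[[i]][[j]]
            row_index <- which(rownames(the_matrix) == result)
            the_matrix[row_index, i] <- 1
        }
    }

    # Cells never set were not found in that search
    the_matrix[is.na(the_matrix)] <- 0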

I suspect there is a way to get the same result with the tidyverse, starting from the first data frame and then bringing in the second data frame column by column, using the search results as the key.
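One way the join-on-the-result idea could be sketched, assuming tidyr >= 1.1.0 (pivot_longer()/pivot_wider() with a scalar values_fill) and dplyr >= 1.0.0 (across()). Rather than adding the searches one column at a time, it reshapes data to long form, turns that back into a presence table, and left-joins it onto hit using the search result as the key. The example data are re-created with tibble() instead of read_csv() so the block runs on its own.

```r
library(dplyr)  # left_join(), mutate(), across(), coalesce(), tibble()
library(tidyr)  # pivot_longer(), pivot_wider()

# Same example data as in the question, built inline
hit <- tibble(hit = c("A1", "A3", "B2", "B4", "D3"))
data <- tibble(
  Search1 = c("A1", "B4", "D3"),
  Search2 = c("B4", "D3", NA),
  Search3 = c("A3", NA, NA),
  Search4 = c("A1", "B2", "B4")
)

# Long format: one row per (search, result) pair, NAs dropped,
# each pair flagged with 1L as "present"
presence <- data %>%
  pivot_longer(everything(), names_to = "search", values_to = "hit",
               values_drop_na = TRUE) %>%
  mutate(present = 1L)

# Wide again: one row per result, one column per search, 0L where absent
wide <- presence %>%
  pivot_wider(names_from = search, values_from = present, values_fill = 0L)

# Start from `hit` and join on the search result, so every hit keeps a row
# (in hit's order); a hit found in no search gets NA, replaced by 0L
result <- hit %>%
  left_join(wide, by = "hit") %>%
  mutate(across(-hit, ~ coalesce(.x, 0L)))

result
```

Because the join starts from hit, rows come out in hit's order, and a hit that appears in no search still gets a row of zeros instead of being dropped.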

Can anyone help?

Upvotes: 0

Views: 801

Answers (3)

markus

Reputation: 26343

You can use map_df in combination with match, then replace every nonzero value in a_tibble with 1L and add the hits as the first column.

library(purrr)   # map_df()
library(tibble)  # add_column()
library(tidyr)   # re-exports %>%
# For each search column, match() gives each hit's position in it (0 if absent)
a_tibble <- map_df(data, ~match(hit[["hit"]], ., nomatch = 0L))
a_tibble[a_tibble != 0] <- 1L  # any position means "present"
a_tibble %>%
  add_column(hit = hit$hit, .before = 1)
#  A tibble: 5 x 5
#  hit   Search1 Search2 Search3 Search4
#  <chr>   <int>   <int>   <int>   <int>
#1 A1          1       0       0       1
#2 A3          0       0       1       0
#3 B2          0       0       0       1
#4 B4          1       1       0       1
#5 D3          1       1       0       0

Upvotes: 1

Antonios

Reputation: 1939

For reference, there is also a fairly concise base R solution:

# One column per search: 1 if the hit occurs in that column, else 0
the_matrix <- sapply(data, function(x) as.numeric(hit$hit %in% x))
rownames(the_matrix) <- hit$hit

Upvotes: 1

thc

Reputation: 9705

data %>% gather(na.rm = TRUE) %>% mutate(p = 1L) %>%
  spread("key", "p", fill = 0L)

# A tibble: 5 x 5
  value Search1 Search2 Search3 Search4
  <chr>   <int>   <int>   <int>   <int>
1 A1          1       0       0       1
2 A3          0       0       1       0
3 B2          0       0       0       1
4 B4          1       1       0       1
5 D3          1       1       0       0
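gather() and spread() are superseded in current tidyr in favour of pivot_longer() and pivot_wider(). A sketch of the same pipeline with the newer verbs, assuming tidyr >= 1.1.0 and dplyr; arrange() is added because pivot_wider() keeps rows in order of first appearance, while the output shown above is sorted by value.

```r
library(dplyr)  # mutate(), arrange(), tibble()
library(tidyr)  # pivot_longer(), pivot_wider()

data <- tibble(
  Search1 = c("A1", "B4", "D3"),
  Search2 = c("B4", "D3", NA),
  Search3 = c("A3", NA, NA),
  Search4 = c("A1", "B2", "B4")
)

result <- data %>%
  # long format: key = search name, value = result; NAs dropped
  pivot_longer(everything(), names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  mutate(p = 1L) %>%
  # wide again: one column per search, 0L where the pair is absent
  pivot_wider(id_cols = value, names_from = key, values_from = p,
              values_fill = 0L, names_sort = TRUE) %>%
  arrange(value)

result
```

Like the gather()/spread() version, this builds rows only from values that occur in data, so a hit that appears in no search would not get a row.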

Upvotes: 2
