JGG
JGG

Reputation: 71

Identifying the column that has the lowest value

I have a data frame given by the following

DF <- structure(list(ID = c(1, 129, 169, 1087), `Collab Years Patents` = c(NA, 
"2011, 2011, 2011", "2010", "2006, 2006"), `Collab Years Publications` = c("2011", 
"2015, 2016, 2016", "2010", NA), ECP = c("2011", "2011", "2010", 
"2006")), .Names = c("ID", "Collab Years Patents", "Collab Years Publications", 
"ECP"), row.names = c(1L, 107L, 136L, 859L), class = "data.frame")

The column ECP is the minimum year of the two collaboration columns (which could contain several years). I need an output that says which column the ECP belongs to. For example, a solution to above could be another column vector to above frame with the elements:

    structure(list(ID = c(1, 129, 169, 1087), `Collab Years Patents` = c(NA, 
"2011, 2011, 2011", "2010", "2006, 2006"), `Collab Years Publications` = c("2011", 
"2015, 2016, 2016", "2010", NA), ECP = c("2011", "2011", "2010", 
"2006"), identifier = c("Publications", "Patents", "Both", "Patents"
)), .Names = c("ID", "Collab Years Patents", "Collab Years Publications", 
"ECP", "identifier"), row.names = c(1L, 107L, 136L, 859L), class = "data.frame")

Upvotes: 1

Views: 43

Answers (2)

moodymudskipper
moodymudskipper

Reputation: 47300

Using tidyverse (dplyr and purrr):

library(tidyverse)

DF %>%
  mutate_at(2:3,strsplit,", ") %>%
  transmute(identifier = pmap(.[2:4],~c("Publications","Patents","Both")[
    2*(..3 %in% .x) + (..3 %in% .y)])) %>%
  bind_cols(DF,.)

#     ID Collab Years Patents Collab Years Publications  ECP   identifier
# 1    1                 <NA>                      2011 2011 Publications
# 2  129     2011, 2011, 2011          2015, 2016, 2016 2011      Patents
# 3  169                 2010                      2010 2010         Both
# 4 1087           2006, 2006                      <NA> 2006      Patents

Upvotes: 0

akrun
akrun

Reputation: 886938

Here is an option using str_detect. Loop through the collaboration columns (sapply(DF[2:3],), use str_detect to check which one of the column have the value of 'ECP'. multiply by col to convert the TRUE values to the column index, replace the NA elements with 0, get the column names correspond based on the maximum column index, remove the prefix part of the column names with sub, and assign those elements in 'm1' that are greater than 0 i.e. have 'ECP' in both to 'Both' on the created vector 'v1'

library(stringr)
m1 <- col(DF[2:3]) *sapply(DF[2:3], function(x) str_detect(x, DF$ECP))
m1[is.na(m1)] <- 0
v1 <- sub(".*\\s(\\w+)$", "\\1", names(DF)[2:3][max.col(m1)])
v1[rowSums(m1 > 0) ==2] <- "Both"
DF$identifier <- v1
DF$identifier
#[1] "Publications" "Patents"      "Both"         "Patents"   

Upvotes: 3

Related Questions