H. Kraus
H. Kraus

Reputation: 43

Filling row elements with column heading given certain criteria in R

I have a large dataset that consists of multiple columns each with a state name. Each row consists of an individual and which state they live in, designated by a yes in the column of the appropriate state.

Name <- c("John", "Jane", "Joe", "Jim", "Jeane", "Jeff", "Jack")
Q1State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
Q1State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
Q1State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
Q2State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
Q2State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
Q2State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
DF <- data.frame(Name, Q1State1, Q1State2, Q1State3, Q2State1, Q2State2, Q2State3)

   Name Q1State1 Q1State2 Q1State3 Q2State1 Q2State2 Q2State3
1  John       no      yes       no       no      yes       no
2  Jane      yes       no       no      yes       no       no
3   Joe      yes       no       no      yes       no       no
4   Jim       no       no      yes       no       no      yes
5 Jeane       no       no      yes       no       no      yes
6  Jeff       no       no      yes       no       no      yes
7  Jack       no      yes       no       no      yes       no

I would like to end up with one column for State instead of multiple columns. The end result would look like:

   name    Q1State   Q2State
1  John     State2    State2
2  Jane     State1    State1
3   Joe     State1    State1
4   Jim     State3    State3
5 Jeane     State3    State3
6  Jeff     State3    State3
7  Jack     State2    State2

I can use unite(DF, State1, State2, State3) to accomplish the second part of my objective without difficulty. My issue is with the intermediary step needed. I do not know how to accomplish filling cells with the appropriate state name or a blank. I would like it to look like:

   name Q1State1 Q1State2 Q1State3  Q2State1  Q2State2  Q2State3
1  John            State2                       State2          
2  Jane State1                        State1                        
3   Joe State1                        State1                    
4   Jim                     State3                        State3
5 Jeane                     State3                        State3
6  Jeff                     State3                        State3 
7  Jack            State2                       State2   

A similar question Replace values in a column with specific row value from same column using loop posted before but that one used the first row of data to fill the cells. I have tried using similar coding in dplyr but I cannot figure out how to correctly call the column names.

DF %>% 
  mutate_at(vars(starts_with('State')), ~ case_when(. == 'yes' ~colnames(.), TRUE ~ ''))

With this code I get an error. I am not sure how to designate that the column heading be used to fill cells. I said have tried using mutate in dplyr but can't figure out how to properly call for the column heading.

Upvotes: 1

Views: 82

Answers (4)

rg255
rg255

Reputation: 4169

Using data.table

DF2 <- dcast(melt(DF, id.vars="Name")[value == "yes"][, c("Q", "State") := tstrsplit(variable, "State")][, -c("value", "variable")], ... ~ Q)

Giving

    Name Q1 Q2
1:  Jack  2  2
2:  Jane  1  1
3: Jeane  3  3
4:  Jeff  3  3
5:   Jim  3  3
6:   Joe  1  1
7:  John  2  2

Upvotes: 0

tmfmnk
tmfmnk

Reputation: 40181

One possibility could be:

DF %>%
 transmute(Name,
           State = names(.)[max.col(. == "yes")])

   Name  State
1  John State2
2  Jane State1
3   Joe State1
4   Jim State3
5 Jeane State3
6  Jeff State3
7  Jack State2

An option for the updated question, with the addition of tidyr:

DF %>%
 pivot_longer(-Name) %>%
 extract(name, into = c("name1", "name2"), "(Q*\\d+)([[:alnum:]]+)") %>%
 filter(value == "yes") %>%
 select(-value) %>%
 mutate(name1 = paste0(name1, "State")) %>%
 pivot_wider(names_from = "name1", values_from = "name2") 

  Name  Q1State Q2State
  <chr> <chr>   <chr>  
1 John  State2  State2 
2 Jane  State1  State1 
3 Joe   State1  State1 
4 Jim   State3  State3 
5 Jeane State3  State3 
6 Jeff  State3  State3 
7 Jack  State2  State2 

Upvotes: 1

akrun
akrun

Reputation: 887971

An option with data.table

library(data.table)
melt(setDT(DF), id.var = 'Name', variable.name = 'State')[
         value == 'yes'][, value := NULL][]
#    Name  State
#1:  Jane State1
#2:   Joe State1
#3:  John State2
#4:  Jack State2
#5:   Jim State3
#6: Jeane State3
#7:  Jeff State3

data

Name <- c("John", "Jane", "Joe", "Jim", "Jeane", "Jeff", "Jack")
State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
DF <- data.frame(Name, State1, State2, State3)

Upvotes: 0

r.user.05apr
r.user.05apr

Reputation: 5456

You could transform to long format and filter:

Name <- c("John", "Jane", "Joe", "Jim", "Jeane", "Jeff", "Jack")
State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
DF <- data.frame(Name, State1, State2, State3)

DF %>%
  pivot_longer(-Name, names_to = "State", values_to = "value") %>%
  filter(value == "yes") #%>%
  # select(-value)

# # A tibble: 7 x 3
# Name  State  value
# <fct> <chr>  <fct>
#   1 John  State2 yes  
# 2 Jane  State1 yes  
# 3 Joe   State1 yes  
# 4 Jim   State3 yes  
# 5 Jeane State3 yes  
# 6 Jeff  State3 yes  
# 7 Jack  State2 yes  

Upvotes: 0

Related Questions