Reputation: 43
I have a large dataset that consists of multiple columns each with a state name. Each row consists of an individual and which state they live in, designated by a yes in the column of the appropriate state.
Name <- c("John", "Jane", "Joe", "Jim", "Jeane", "Jeff", "Jack")
Q1State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
Q1State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
Q1State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
Q2State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
Q2State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
Q2State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
DF <- data.frame(Name, Q1State1, Q1State2, Q1State3, Q2State1, Q2State2, Q2State3)
Name Q1State1 Q1State2 Q1State3 Q2State1 Q2State2 Q2State3
1 John no yes no no yes no
2 Jane yes no no yes no no
3 Joe yes no no yes no no
4 Jim no no yes no no yes
5 Jeane no no yes no no yes
6 Jeff no no yes no no yes
7 Jack no yes no no yes no
I would like to end up with one column for State instead of multiple columns. The end result would look like:
name Q1State Q2State
1 John State2 State2
2 Jane State1 State1
3 Joe State1 State1
4 Jim State3 State3
5 Jeane State3 State3
6 Jeff State3 State3
7 Jack State2 State2
I can use
unite(DF, State1, State2, State3)
to accomplish the second part of my objective without difficulty. My issue is with the intermediary step needed. I do not know how to accomplish filling cells with the appropriate state name or a blank. I would like it to look like:
name Q1State1 Q1State2 Q1State3 Q2State1 Q2State2 Q2State3
1 John State2 State2
2 Jane State1 State1
3 Joe State1 State1
4 Jim State3 State3
5 Jeane State3 State3
6 Jeff State3 State3
7 Jack State2 State2
A similar question Replace values in a column with specific row value from same column using loop posted before but that one used the first row of data to fill the cells. I have tried using similar coding in dplyr but I cannot figure out how to correctly call the column names.
DF %>%
mutate_at(vars(starts_with('State')), ~ case_when(. == 'yes' ~colnames(.), TRUE ~ ''))
With this code I get an error. I am not sure how to designate that the column heading be used to fill cells. I said have tried using mutate in dplyr but can't figure out how to properly call for the column heading.
Upvotes: 1
Views: 82
Reputation: 4169
Using data.table
DF2 <- dcast(melt(DF, id.vars="Name")[value == "yes"][, c("Q", "State") := tstrsplit(variable, "State")][, -c("value", "variable")], ... ~ Q)
Giving
Name Q1 Q2
1: Jack 2 2
2: Jane 1 1
3: Jeane 3 3
4: Jeff 3 3
5: Jim 3 3
6: Joe 1 1
7: John 2 2
Upvotes: 0
Reputation: 40181
One possibility could be:
DF %>%
transmute(Name,
State = names(.)[max.col(. == "yes")])
Name State
1 John State2
2 Jane State1
3 Joe State1
4 Jim State3
5 Jeane State3
6 Jeff State3
7 Jack State2
An option for the updated question, with the addition of tidyr
:
DF %>%
pivot_longer(-Name) %>%
extract(name, into = c("name1", "name2"), "(Q*\\d+)([[:alnum:]]+)") %>%
filter(value == "yes") %>%
select(-value) %>%
mutate(name1 = paste0(name1, "State")) %>%
pivot_wider(names_from = "name1", values_from = "name2")
Name Q1State Q2State
<chr> <chr> <chr>
1 John State2 State2
2 Jane State1 State1
3 Joe State1 State1
4 Jim State3 State3
5 Jeane State3 State3
6 Jeff State3 State3
7 Jack State2 State2
Upvotes: 1
Reputation: 887971
An option with data.table
library(data.table)
melt(setDT(DF), id.var = 'Name', variable.name = 'State')[
value == 'yes'][, value := NULL][]
# Name State
#1: Jane State1
#2: Joe State1
#3: John State2
#4: Jack State2
#5: Jim State3
#6: Jeane State3
#7: Jeff State3
Name <- c("John", "Jane", "Joe", "Jim", "Jeane", "Jeff", "Jack")
State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
DF <- data.frame(Name, State1, State2, State3)
Upvotes: 0
Reputation: 5456
You could transform to long format and filter:
Name <- c("John", "Jane", "Joe", "Jim", "Jeane", "Jeff", "Jack")
State1 <- c("no", "yes", "yes", "no", "no", "no", "no")
State2 <- c("yes", "no", "no", "no", "no", "no", "yes")
State3 <- c("no", "no", "no", "yes", "yes", "yes", "no")
DF <- data.frame(Name, State1, State2, State3)
DF %>%
pivot_longer(-Name, names_to = "State", values_to = "value") %>%
filter(value == "yes") #%>%
# select(-value)
# # A tibble: 7 x 3
# Name State value
# <fct> <chr> <fct>
# 1 John State2 yes
# 2 Jane State1 yes
# 3 Joe State1 yes
# 4 Jim State3 yes
# 5 Jeane State3 yes
# 6 Jeff State3 yes
# 7 Jack State2 yes
Upvotes: 0