Reputation: 794
I am trying to create a function that generates a new variable based on conditional values. I have a survey dataset with 100+ columns that will be collapsed this way. I read a related post, but it did not help.
'data.frame': 117 obs. of 7 variables:
$ fin_partner: Factor w/ 4 levels "","9","No","Yes": 2 2 4 3 2 2 2 2 4 4 ...
$ fin_parent : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
$ fin_kids : Factor w/ 4 levels "","9","No","Yes": 4 2 2 2 2 2 2 2 2 2 ...
$ fin_othkids: Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 3 2 2 2 ...
$ fin_fam : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
$ fin_friend : Factor w/ 4 levels "","9","No","Yes": 2 2 3 3 2 2 2 2 4 2 ...
$ fin_oth : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 2 2 4 2 ...
I would like to be able to subset the dataset according to columns, and then pass that through the function. Right now, the values contain "Yes", "No", "999" (for missing).
My goal is that, for each row, if any column contains "Yes", the new column is set to "Yes". I am sure there is an easier way than the code below, so I am open to that.
My code currently:
trial <- df[, 23:29]
trial.test <- as.data.frame(trial)
composite_score <- function(x){
  # Convert to numeric values
  change_to_number <- function(j) {
    for (i in 1:length(j)){
      if(i == "Yes"){
        i <- 1
      }
      else{
        i <- 0
      }
    }
  }
  x <- change_to_number(x)
  new_col_var <- function(k){
    if(rowSums(k) > 0){
      k$newvar <- 1
    }
    else {
      k$newvar <- 0
    }
  }
  x <- new_col_var(x)
}
composite_score(trial.test)
Code produces the following error:
Error in rowSums(k) : 'x' must be an array of at least two dimensions
Data:
> dput(head(trial.test))
structure(list(fin_partner = structure(c(2L, 2L, 4L, 3L, 2L,
2L), .Label = c("", "9", "No", "Yes"), class = "factor"), fin_parent = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"),
fin_kids = structure(c(4L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"9", "No", "Yes"), class = "factor"), fin_othkids = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"),
fin_fam = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"9", "No", "Yes"), class = "factor"), fin_friend = structure(c(2L,
2L, 3L, 3L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"),
fin_oth = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"9", "No", "Yes"), class = "factor")), .Names = c("fin_partner",
"fin_parent", "fin_kids", "fin_othkids", "fin_fam", "fin_friend",
"fin_oth"), row.names = c(NA, 6L), class = "data.frame")
Upvotes: 1
Views: 2181
Reputation: 4024
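library(tidyr)
library(dplyr)
library(magrittr)
# add a row identifier so rows can be matched after reshaping
trial.test %<>% mutate(row_number = 1:n())
answer =
  trial.test %>%
  # long format: one row per (row_number, variable) pair
  gather(variable, value, -row_number) %>%
  # keep only the "Yes" answers
  filter(value == "Yes") %>%
  select(-variable) %>%
  # at most one "Yes" per original row
  distinct %>%
  # join back; rows without any "Yes" get NA in value
  right_join(trial.test)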
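For comparison, the same per-row check can be sketched in base R without reshaping. This is not the pipeline above, only an alternative under the assumption that every selected column codes agreement as the exact string "Yes". The data frame `toy`, its values, and the column name `any_yes` are made up for illustration.

```r
# Toy data mimicking the survey columns (hypothetical values)
toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No")),
  fin_kids    = factor(c("Yes", "9", "No")),
  fin_friend  = factor(c("9", "No", "9"))
)

# Comparing a data frame with a string gives a logical matrix;
# rowSums() then counts the "Yes" answers in each row.
yes_count <- rowSums(toy == "Yes")

# Flag rows with at least one "Yes"
toy$any_yes <- ifelse(yes_count > 0, "Yes", "No")
toy
```

Keeping `yes_count` around also allows other rules later, such as requiring at least two "Yes" answers.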
Upvotes: 0
Reputation: 6921
Thanks for posting the data, it makes it possible to actually check what I write!
# Loading your data
trial.test <- structure(list(fin_partner = [... redacted ...], class = "data.frame")
# computing the new variable
# MARGIN = 1 specifies that the function is applied to each row
# the applied function just looks for a "Yes" in the row
# and returns "Yes" if... yes, "No" otherwise.
myvar <- apply(trial.test, MARGIN=1, FUN=function(row)
  if ("Yes" %in% row) "Yes" else "No")
# converting it to factor
myvar <- factor(myvar)
# putting it in trial.test just for illustration
cbind(trial.test, summary=myvar)
This gives:
  fin_partner fin_parent fin_kids fin_othkids fin_fam fin_friend fin_oth summary
1           9          9      Yes           9       9          9       9     Yes
2           9          9        9           9       9          9       9      No
3         Yes          9        9           9       9         No       9     Yes
4          No          9        9           9       9         No       9      No
5           9          9        9           9       9          9       9      No
6           9          9        9           9       9          9       9      No
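Since the question asks for something that can be applied to a chosen subset of columns, the same idea could be wrapped in a small helper. This is only a sketch: the function name `any_yes`, the `cols` argument, the toy data, and the column `fin_any` are hypothetical and not part of the original post.

```r
# Hypothetical helper: flag rows where any of the chosen columns is "Yes".
# `cols` may be column names or positions (e.g. 23:29 in the full survey).
any_yes <- function(df, cols) {
  # as.matrix() turns the factor columns into a character matrix,
  # the same conversion apply() performs on a data frame
  m <- as.matrix(df[, cols, drop = FALSE])
  flag <- apply(m, 1, function(row) "Yes" %in% row)
  # fixed levels, so both appear even if one value never occurs
  factor(ifelse(flag, "Yes", "No"), levels = c("No", "Yes"))
}

# Toy data (hypothetical values, same coding as the survey)
toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No")),
  fin_parent  = factor(c("9", "9", "Yes")),
  fin_kids    = factor(c("9", "9", "No"))
)

# Only fin_partner and fin_kids are considered here
toy$fin_any <- any_yes(toy, c("fin_partner", "fin_kids"))
toy
```

Row 3 stays "No" in this example because its only "Yes" is in `fin_parent`, which was left out of `cols`.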
Upvotes: 1
Reputation: 24945
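Your change_to_number function is badly broken: the loop only reassigns the index i to 1 or 0, which has no effect on the input, and the function returns NULL (hence the rowSums error). Note also that the data contain "Yes" with a capital Y. You could change it to:
change_to_number <- function(j){
  sapply(j, function(x) +(x=="Yes"))
}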
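To show how this replacement fits the original two-step plan (recode, then sum across rows), here is a minimal sketch. It assumes a data frame with more than one row, so that sapply returns a matrix; the data frame `toy` and its values are invented for illustration, and the vectorised comparison stands in for the if/else in new_col_var, which tests a whole vector at once.

```r
# Recode every cell: 1 if it equals "Yes", 0 otherwise.
# sapply() over a data frame visits one column at a time and
# returns a matrix with one column per variable.
change_to_number <- function(j) {
  sapply(j, function(x) +(x == "Yes"))
}

# Toy data (hypothetical values)
toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No")),
  fin_kids    = factor(c("Yes", "9", "No"))
)

num <- change_to_number(toy)

# Vectorised replacement for the if/else in new_col_var():
# one 0/1 value per row instead of a single scalar test
toy$newvar <- as.integer(rowSums(num) > 0)
toy
```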
Or, change the overall function to:
composite_score <- function(x){
  +(apply(x, 1, function(z) ("Yes" %in% z)))
}
Then run your function:
trial.test$newcol <- composite_score(trial.test)
Explanation: You want to know whether there is any "Yes" in each row. To check, you could run the command below for each row (as.matrix turns the factor columns into text, which is what apply does with a data frame):
"Yes" %in% as.matrix(trial.test)[1, ]
"Yes" %in% as.matrix(trial.test)[2, ]
...
To do that for every row at once, you can use apply as below: we apply the function "Yes" %in% z across rows (the 1), and each row is passed as z into the function:
tempdata <- apply(trial.test, 1, function(z) ("Yes" %in% z))
tempdata
You should get a TRUE or FALSE for each row. Now we can use a trick, where R converts TRUE to 1 and FALSE to 0:
as.numeric(tempdata)
+(tempdata) # same values, less typing
If we put it all together, you get your new column:
+(apply(trial.test, 1, function(z) ("Yes" %in% z)))
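A small check of the whole chain on toy data can make the intermediate values visible. The data frame `toy` and the names `has_yes`, `lower`, and `newcol` are illustrative assumptions. The sketch also shows that %in% matches exactly and is case-sensitive, so the string must be spelled as in the factor levels.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No")),
  fin_kids    = factor(c("Yes", "9", "No"))
)

# Step 1: one TRUE/FALSE per row
has_yes <- apply(toy, 1, function(z) "Yes" %in% z)

# A lowercase pattern matches nothing in this data
lower <- apply(toy, 1, function(z) "yes" %in% z)

# Step 2: unary plus coerces logical to integer (TRUE -> 1, FALSE -> 0)
toy$newcol <- +(has_yes)
toy
```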
Upvotes: 1