Jebediah15

Reputation: 794

generate a new factor variable depending on the values of other factors in each row

I am trying to write a function that generates a new variable based on conditional values. I have a survey dataset with 100+ columns that will be collapsed accordingly. I read this, but it did not help.

'data.frame':   117 obs. of  7 variables:
 $ fin_partner: Factor w/ 4 levels "","9","No","Yes": 2 2 4 3 2 2 2 2 4 4 ...
 $ fin_parent : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
 $ fin_kids   : Factor w/ 4 levels "","9","No","Yes": 4 2 2 2 2 2 2 2 2 2 ...
 $ fin_othkids: Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 3 2 2 2 ...
 $ fin_fam    : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
 $ fin_friend : Factor w/ 4 levels "","9","No","Yes": 2 2 3 3 2 2 2 2 4 2 ...
 $ fin_oth    : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 2 2 4 2 ...

I would like to be able to subset the dataset by columns and then pass that subset through the function. Right now, the values are "Yes", "No", and "9" (for missing).

My goal is that, for each row, if any column contains "Yes", the new column is set to "Yes". I am sure there is an easier way than the code below, so I am open to that.

My code currently:

trial <- df[, 23:29]
trial.test <- as.data.frame(trial)

composite_score <- function(x){
  # Convert to numeric values
  change_to_number <- function(j) {
    for (i in 1:length(j)){
      if(i == "Yes"){
        i <- 1
      }
      else{
        i <- 0
      }
    }
  }

  x <- change_to_number(x)  

  new_col_var <- function(k){
    if(rowSums(k) > 0){
      k$newvar <- 1
    }
    else {
      k$newvar <- 0
    }
  }

  x <- new_col_var(x)

}

composite_score(trial.test)

Code produces the following error:

Error in rowSums(k) : 'x' must be an array of at least two dimensions 

Data:

> dput(head(trial.test))
structure(list(fin_partner = structure(c(2L, 2L, 4L, 3L, 2L, 
2L), .Label = c("", "9", "No", "Yes"), class = "factor"), fin_parent = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_kids = structure(c(4L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor"), fin_othkids = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_fam = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor"), fin_friend = structure(c(2L, 
    2L, 3L, 3L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_oth = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor")), .Names = c("fin_partner", 
"fin_parent", "fin_kids", "fin_othkids", "fin_fam", "fin_friend", 
"fin_oth"), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 1

Views: 2181

Answers (3)

bramtayl

Reputation: 4024

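library(tidyr)
library(dplyr)
library(magrittr)

# add a row id so the result can be joined back to the original rows
trial.test %<>% mutate(row_number = 1:n())

answer = 
  trial.test %>%
  # reshape to long format: one row per (row_number, variable, value)
  gather(variable, value, -row_number) %>%
  # keep the "Yes" answers, at most one per original row
  filter(value == "Yes") %>%
  select(-variable) %>%
  distinct %>%
  # rows without any "Yes" end up with NA in the value column
  right_join(trial.test, by = "row_number")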

Upvotes: 0

asachet

Reputation: 6921

Thanks for posting the data, it makes it possible to actually check what I write!

# Loading your data
trial.test <- structure(list(fin_partner = [... redacted ...], class = "data.frame")

# computing the new variable
# MARGIN = 1 specifies that the function is applied to each row
# the applied function just looks for a "Yes" in the row
# and returns "Yes" if there is one, "No" otherwise
myvar <- apply(trial.test, MARGIN = 1, FUN = function(row)
    if ("Yes" %in% row) "Yes" else "No")

# converting it to factor
myvar <- factor(myvar)

# putting it in trial.test just for illustration
cbind(trial.test, summary=myvar)

This gives:

  fin_partner fin_parent fin_kids fin_othkids fin_fam fin_friend fin_oth summary
1           9          9      Yes           9       9          9       9     Yes
2           9          9        9           9       9          9       9      No
3         Yes          9        9           9       9         No       9     Yes
4          No          9        9           9       9         No       9      No
5           9          9        9           9       9          9       9      No
6           9          9        9           9       9          9       9      No
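
As a possible alternative to the row-wise apply, the same flag can be computed with a vectorised comparison. The sketch below rebuilds the six posted rows by hand (the lv helper and the summary column name are only for illustration) and assumes missing answers are coded as "9" rather than NA.

```r
# Rebuild the six rows from the question's dput() output
lv <- c("", "9", "No", "Yes")
trial.test <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No", "9", "9"), levels = lv),
  fin_parent  = factor(rep("9", 6), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9", "9", "9"), levels = lv),
  fin_othkids = factor(rep("9", 6), levels = lv),
  fin_fam     = factor(rep("9", 6), levels = lv),
  fin_friend  = factor(c("9", "9", "No", "No", "9", "9"), levels = lv),
  fin_oth     = factor(rep("9", 6), levels = lv)
)

# Comparing the whole data frame with "Yes" gives a logical matrix;
# rowSums() then counts the "Yes" answers in each row
has_yes <- rowSums(trial.test == "Yes", na.rm = TRUE) > 0

# Store the result as a two-level factor
trial.test$summary <- factor(ifelse(has_yes, "Yes", "No"),
                             levels = c("No", "Yes"))
print(trial.test$summary)
```

This avoids calling a function once per row, which may matter on larger data, but it is the same idea as the apply version above.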

Upvotes: 1

jeremycg

Reputation: 24945

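Your change_to_number function is badly broken: the loop compares the index i (1, 2, ...) with "Yes" instead of the value j[i], and it only reassigns i, which has no effect on the input. It also returns nothing, so x becomes NULL, and that is what triggers the rowSums error. You could change it to:

change_to_number <- function(j){
        sapply(j, function(x) +(x == "Yes"))
}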

Or, change the overall function to:

composite_score <- function(x){
    # 1 if the row contains at least one "Yes", 0 otherwise
    +(apply(x, 1, function(z) ("Yes" %in% z)))
}

Then run your function:

trial.test$newcol <- composite_score(trial.test)
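
To see why the original version ends in that rowSums error, the inner helper can be run on its own. This is a minimal sketch that uses a toy character vector in place of the survey data; broken_change_to_number is just the question's helper under a hypothetical name.

```r
broken_change_to_number <- function(j) {
  for (i in 1:length(j)) {
    # i is the index (1, 2, ...), never the value j[i], so this is always FALSE
    if (i == "Yes") {
      i <- 1
    } else {
      i <- 0  # overwrites the loop counter only; j is untouched
    }
  }
  # nothing is returned explicitly, and a for loop evaluates to NULL
}

result <- broken_change_to_number(c("Yes", "No", "9"))
print(is.null(result))

# Passing NULL on to rowSums() raises an error instead of returning sums
err <- tryCatch(rowSums(result), error = function(e) conditionMessage(e))
print(err)
```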

Explanation: You want to know if there is any "Yes" in each row. To see if there is, you could run the below command for each row (as.matrix converts the factor columns to text first, which is what apply does internally):

"Yes" %in% as.matrix(trial.test)[1, ]
"Yes" %in% as.matrix(trial.test)[2, ]  # and so on for every row

To do that for all rows at once, you can use apply as below. We apply the function "Yes" %in% z across rows (the 1), and each row is passed into the function as z:

tempdata <- apply(trial.test, 1, function(z) ("Yes" %in% z))
tempdata

You should get a TRUE or FALSE for each row. Now we can do a trick, where R will convert TRUE to 1, and FALSE to 0:

as.numeric(tempdata)
+(tempdata) #same, less typing

If we put it all together, you get your new column:

+(apply(trial.test, 1, function(z) ("Yes" %in% z)))
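
Since the question mentions collapsing several groups of columns, the one-liner can be wrapped in a small helper that takes the column names as an argument. This is only a sketch: any_yes, its cols and value arguments, and the fin_any column name are hypothetical, and the data frame just rebuilds the six posted rows.

```r
# Rebuild the six rows from the question's dput() output
lv <- c("", "9", "No", "Yes")
trial.test <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No", "9", "9"), levels = lv),
  fin_parent  = factor(rep("9", 6), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9", "9", "9"), levels = lv),
  fin_othkids = factor(rep("9", 6), levels = lv),
  fin_fam     = factor(rep("9", 6), levels = lv),
  fin_friend  = factor(c("9", "9", "No", "No", "9", "9"), levels = lv),
  fin_oth     = factor(rep("9", 6), levels = lv)
)

# Hypothetical helper: "Yes" if any of the chosen columns equals `value`
any_yes <- function(df, cols, value = "Yes") {
  # as.matrix() turns the factor columns into a character matrix,
  # so each row can be checked with %in%
  m <- as.matrix(df[, cols, drop = FALSE])
  flag <- apply(m, 1, function(z) value %in% z)
  factor(ifelse(flag, "Yes", "No"), levels = c("No", "Yes"))
}

# Collapse a subset of columns first, then all seven columns
family_only <- any_yes(trial.test,
                       c("fin_parent", "fin_kids", "fin_othkids", "fin_fam"))
trial.test$fin_any <- any_yes(trial.test, names(trial.test))
print(trial.test$fin_any)
```

Returning a factor rather than 0/1 keeps the new column in the same form as the existing survey columns; drop the factor() call if a numeric flag is preferred.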

Upvotes: 1
