Jason Jorgenson

Reputation: 5

Dynamic R dataframes - change yes/no responses to 1/0

I use an API call to LimeSurvey to get data into a Shiny R app I'm working on. I then manipulate the dataframe so that I have only the responses given by a certain individual over time. The dataframe can look like this:

Appetite <- c("No","Yes","No","No","No","No","No","No","No")
Dental.Health <- c("No","Yes","No","No","No","No","Yes","Yes","No")
Dry.mouth <- c("No","Yes","Yes","Yes","Yes","No","Yes","Yes","No")
Mouth.opening <- c("No","No","Yes","Yes","Yes","No","Yes","Yes","No")
Pain.elsewhere <- c("No","Yes","No","No","No","No","No","No","No")
Sleeping <- c("No","No","No","No","No","Yes","No","No","No")
Sore.mouth <- c("No","No","Yes","Yes","No","No","No","No","No")
Swallowing <- c("No","No","No","No","Yes","No","No","No","No")
Cancer.treatment <- c("No","No","Yes","Yes","No","Yes","No","No","No")
Support.for.my.family <- c("No","No","Yes","Yes","No","No","No","No","No")
Fear.of.cancer.coming.back <- c("No","No","Yes","Yes","No","No","Yes","No","No")
Intimacy  <- c("Yes","No","No","No","No","No","No","No","No")
Dentist   <- c("No","Yes","No","No","No","No","No","No","No")
Dietician <- c("No","No","Yes","Yes","No","No","No","No","No")
Date.submitted <- c("2002-07-25 00:00:00",
                 "2002-09-05 00:00:00",
                 "2003-01-09 00:00:00",
                 "2003-01-09 00:00:00",
                 "2003-07-17 00:00:00",
                 "2003-11-06 00:00:00",
                 "2004-12-17 00:00:00",
                 "2005-06-03 00:00:00",
                 "2005-12-17 00:00:00")

theDataFrame <- data.frame( Date.submitted,
                            Appetite,
                            Dental.Health,
                            Dry.mouth,
                            Mouth.opening,
                            Pain.elsewhere,
                            Sleeping,
                            Sore.mouth,
                            Swallowing,
                            Cancer.treatment,
                            Support.for.my.family,
                            Fear.of.cancer.coming.back,
                            Intimacy,
                            Dentist,
                            Dietician)

To be clear, this dataframe could contain more (or fewer) observations of more (or fewer) variables than the example above.

My goal is to make a dynamic stacked bar chart (one bar per date, one fill colour per question) like the one this code produces:

library(dplyr)
library(ggplot2)
library(tidyr)

df <- data.frame(timeline = Sys.Date() - 1:10,
                 q3 = sample(c("Yes", "No"), size = 10, replace = TRUE),
                 q4 = sample(c("Yes", "No"), size = 10, replace = TRUE),
                 q5 = sample(c("Yes", "No"), size = 10, replace = TRUE),
                 q6 = sample(c("Yes", "No"), size = 10, replace = TRUE),
                 q7 = sample(c("Yes", "No"), size = 10, replace = TRUE),
                 q8 = sample(c("Yes", "No"), size = 10, replace = TRUE),
                 stringsAsFactors = FALSE) %>%
    mutate(q3 = ifelse(q3 == "Yes", 1, 0),
           q4 = ifelse(q4 == "Yes", 1, 0),
           q5 = ifelse(q5 == "Yes", 1, 0),
           q6 = ifelse(q6 == "Yes", 1, 0),
           q7 = ifelse(q7 == "Yes", 1, 0),
           q8 = ifelse(q8 == "Yes", 1, 0)) %>%
    gather(key = question, value = value, q3, q4, q5, q6, q7, q8)

g <- ggplot(df, aes(x = timeline, y = value, fill = question)) +
    geom_bar(stat = "identity")

g 

I think I will need to use library(lubridate) for the timeline, as the entire dataframe is plain text. I deal with the '.' in the column names like this:

myColNames <- colnames(theDataFrame)

myNames <- myColNames

myNames <- gsub("^X\\.\\.", "", myNames)
myNames <- gsub("\\.", " ", myNames)
names(theDataFrame) <- myNames # items in myChoices get "labels" from myNames
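
As a quick illustration of what those two substitutions do, here is a small sketch on made-up column names. The vector `exampleNames` is hypothetical, and the `X..` prefix is only assumed to be the kind of prefix the first pattern is meant to strip.

```r
# Hypothetical column names, one of them carrying an "X.." prefix.
exampleNames <- c("X..Appetite", "Dental.Health", "Fear.of.cancer.coming.back")

# Step 1: drop a leading "X.." if present.
cleaned <- gsub("^X\\.\\.", "", exampleNames)

# Step 2: turn every remaining "." into a space to get readable labels.
cleaned <- gsub("\\.", " ", cleaned)

print(cleaned)
```

Because both patterns are applied to whatever `colnames()` returns, this step does not depend on how many columns the data frame has.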

But the most challenging aspect is getting this to work dynamically. The datasets will contain only Date.submitted plus some number of additional columns, each holding only "Yes" or "No".

I hope I've given enough information (this is my first question on Stack Exchange!)

Upvotes: 0

Views: 418

Answers (2)

OmaymaS

Reputation: 1721

You could also use dplyr::mutate_all together with purrr::map_chr.

Note: I used stringsAsFactors = F in theDataFrame

theDataFrame <- data.frame( Date.submitted,
                            Appetite,
                            Dental.Health,
                            Dry.mouth,
                            Mouth.opening,
                            Pain.elsewhere,
                            Sleeping,
                            Sore.mouth,
                            Swallowing,
                            Cancer.treatment,
                            Support.for.my.family,
                            Fear.of.cancer.coming.back,
                            Intimacy,
                            Dentist,
                            Dietician, stringsAsFactors = F)

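- Create a function to do the conversion you want, for instance:

library(dplyr)
library(purrr)

# Convert one value: "Yes" -> "1", "No" -> "0", anything else unchanged.
# It returns character so that it is safe to use with map_chr.
ConvertYesNo <- function(x) {
  if (x == "Yes") "1"
  else if (x == "No") "0"
  else x
}

- Use it with mutate_all, which considers all the columns, or pick the columns you want using mutate_at, and map the function over each column as follows. Because map_chr returns character, the converted columns hold "0"/"1" as text; apply as.integer to them before plotting.

theDataFramex <- theDataFrame %>% 
  mutate_all(~ map_chr(., ConvertYesNo))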

> head(theDataFramex,3 )
       Date.submitted Appetite Dental.Health Dry.mouth Mouth.opening Pain.elsewhere Sleeping
1 2002-07-25 00:00:00        0             0         0             0              0        0
2 2002-09-05 00:00:00        1             1         1             0              1        0
3 2003-01-09 00:00:00        0             0         1             1              0        0
  Sore.mouth Swallowing Cancer.treatment Support.for.my.family Fear.of.cancer.coming.back
1          0          0                0                     0                          0
2          0          0                0                     0                          0
3          1          0                1                     1                          1
  Intimacy Dentist Dietician
1        1       0         0
2        0       1         0
3        0       0         1

Upvotes: 0

akrun

Reputation: 887048

We can update it using base R. Comparing with "Yes" gives a logical matrix, and the unary + coerces it to 0/1:

theDataFrame[-1] <- +(theDataFrame[-1]=="Yes")

Or with lapply, which works column by column and avoids building an intermediate matrix when the dataset is big:

theDataFrame[-1] <- lapply(theDataFrame[-1], function(x) as.integer(x=="Yes"))
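
To see why this stays dynamic, here is a minimal base R sketch on a small made-up data frame. The names `smallFrame`, `questionCols` and `longFrame` are hypothetical, and the sketch assumes the first column is the date and every other column holds only "Yes"/"No". It also parses the timestamps with `as.POSIXct` and stacks the result into the long layout that the plotting code in the question expects.

```r
# Hypothetical miniature of the survey data; any number of Yes/No columns works.
smallFrame <- data.frame(
  Date.submitted = c("2002-07-25 00:00:00", "2002-09-05 00:00:00", "2003-01-09 00:00:00"),
  Appetite       = c("No", "Yes", "No"),
  Dry.mouth      = c("No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Compare every column except the first with "Yes" (logical matrix),
# then let unary + turn TRUE/FALSE into 1/0.
smallFrame[-1] <- +(smallFrame[-1] == "Yes")

# Parse the text timestamps; the UTC time zone is an assumption of this sketch.
smallFrame$Date.submitted <- as.POSIXct(smallFrame$Date.submitted,
                                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Long format: one row per date/question pair, built without naming any question.
questionCols <- setdiff(names(smallFrame), "Date.submitted")
longFrame <- data.frame(
  timeline = rep(smallFrame$Date.submitted, times = length(questionCols)),
  question = rep(questionCols, each = nrow(smallFrame)),
  value    = unlist(smallFrame[questionCols], use.names = FALSE)
)
```

`longFrame` has the same three columns (timeline, question, value) as the gathered `df` in the question, so it could be passed to the same `ggplot()` call.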

Upvotes: 1
