Reputation: 3
I have a data frame with around 50 entries and 2 columns - long list of short question names and corresponding long question names:
variable_short | variable_long |
---|---|
cat | Do you have a cat? |
col | Is blue your favourite colour? |
stress | Are you stressed at work? |
This is the summary of questions used in a survey. The survey responses are in a separate dataframe containing the column names which are the variables in variable_short -
person.id | cat | col | stress |
---|---|---|---|
2567 | 1 | 0 | 1 |
I am trying to create a new dataframe with only the 10 questions I need from 50 and the corresponding survey responses along with the person id. What I am struggling with replacing the short question format in the survey data with the corresponding long question format from the variables data frame. Ideally something like this:
person.id | Do you have a cat? | Are you stressed at work? |
---|---|---|
2567 | 1 | 1 |
I started to subset to get the variables to the ones I need:
new <- subset(variables, variable_short %in% c('cat', 'stress'), select = c('variable_short', 'variable_long'))
It reorders the variables so I can't go by the order and I assumed this isn't the most efficient way to do it incase variable order changes. I can rename column headings but I am not familiar yet with matching especially matching corresponding variable to a header.
I am completely stuck right now so any help will be appreciated. Thank you so much.
Upvotes: 0
Views: 814
Reputation: 388982
We can rename the column names using match
:
names(df2)[-1] <- df1$variable_long[match(names(df2[-1]), df1$variable_short)]
df2
# person.id Do you have a cat? Is blue your favourite colour? Are you stressed at work?
#1 2567 1 0 1
If you want to keep only the values which has 1 in it, reshape the data in long format and filter
.
library(dplyr)
library(tidyr)
df2 %>%
pivot_longer(cols = -person.id) %>%
filter(value == 1)
# person.id name value
# <int> <chr> <int>
#1 2567 Do you have a cat? 1
#2 2567 Are you stressed at work? 1
Upvotes: 1
Reputation: 21
You problem may be that R does not allow column headers with spaces to be referenced later in the code, so using the long question as a column title would not be a successful strategy. To avoid this in my own coding, I replace each space with an underscore for easy readability, but I've seen other people use camelCase or PascalCase naming styles. Since you only have ten questions / column titles, this may be an okay situation to hard code the new column names rather than rename automatically, as detailed in this Stack Overflow question thread.
Upvotes: 0