findinginfinity
findinginfinity

Reputation: 3

Replacing column headers with matched variables from another dataframe

I have a data frame with around 50 entries and 2 columns - long list of short question names and corresponding long question names:

variable_short variable_long
cat Do you have a cat?
col Is blue your favourite colour?
stress Are you stressed at work?

This is the summary of questions used in a survey. The survey responses are in a separate dataframe containing the column names which are the variables in variable_short -

person.id cat col stress
2567 1 0 1

I am trying to create a new dataframe with only the 10 questions I need from 50 and the corresponding survey responses along with the person id. What I am struggling with replacing the short question format in the survey data with the corresponding long question format from the variables data frame. Ideally something like this:

person.id Do you have a cat? Are you stressed at work?
2567 1 1

I started to subset to get the variables to the ones I need:

new <- subset(variables, variable_short %in% c('cat', 'stress'), select = c('variable_short', 'variable_long'))

It reorders the variables so I can't go by the order and I assumed this isn't the most efficient way to do it incase variable order changes. I can rename column headings but I am not familiar yet with matching especially matching corresponding variable to a header.

I am completely stuck right now so any help will be appreciated. Thank you so much.

Upvotes: 0

Views: 814

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

We can rename the column names using match :

names(df2)[-1] <- df1$variable_long[match(names(df2[-1]), df1$variable_short)]
df2

#  person.id Do you have a cat? Is blue your favourite colour? Are you stressed at work?
#1      2567                  1                              0                    1

If you want to keep only the values which has 1 in it, reshape the data in long format and filter.

library(dplyr)
library(tidyr)

df2 %>%
  pivot_longer(cols = -person.id) %>%
  filter(value == 1)

#  person.id name                      value
#      <int> <chr>                     <int>
#1      2567 Do you have a cat?            1
#2      2567 Are you stressed at work?     1

Upvotes: 1

Sarah Gillespie
Sarah Gillespie

Reputation: 21

You problem may be that R does not allow column headers with spaces to be referenced later in the code, so using the long question as a column title would not be a successful strategy. To avoid this in my own coding, I replace each space with an underscore for easy readability, but I've seen other people use camelCase or PascalCase naming styles. Since you only have ten questions / column titles, this may be an okay situation to hard code the new column names rather than rename automatically, as detailed in this Stack Overflow question thread.

Upvotes: 0

Related Questions