Referring to columns by index not by name

Question

Background

I have a survey table as follows:

E313     B515       C515      ...   (more columns)
1122     John doe   I don't like the lesson
2211     Mary Jane  It was excellent

The survey provider also provided labels for decoding the columns in the survey as follows (survey_data_map.csv):

Code    Label
E313    Unique Identifier
B515    Full name
C515    Feedback
...     (more rows)

So I have written a little snippet that decodes columns in the survey to the column labels.

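# Lookup table mapping column codes to human-readable labels.
survey_data_map <- read.csv("survey_data_map.csv", stringsAsFactors = FALSE)
for (i in seq_along(names(survey))) {
  # Label whose code matches the i-th survey column (empty if there is none).
  label <- survey_data_map$Label[survey_data_map$Code == names(survey)[i]]
  if (length(label) > 0) {
    names(survey)[i] <- label
  }
}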

Question

The column names in survey_data_map.csv (currently Code and Label) may change. How do I rewrite the for loop to refer to these columns by index instead of by name?

Thank you.

Konrad Rudolph · Accepted Answer

In general, columns of a data frame can be addressed with the [[ subset operator. You can use either the numeric index or the name (as a character string) to do so:

survey_data_map[[1L]] # same as
survey_data_map[['Code']]
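
Applied to the loop from the question, this might look like the following minimal sketch. The two data frames are small stand-ins built inline from the examples in the question (not the real files), and the sketch assumes the codes sit in the first column of the map and the labels in the second.

```r
# Stand-in data mirroring the examples in the question (assumed, not the real files).
survey <- data.frame(
  E313 = c(1122, 2211),
  B515 = c("John doe", "Mary Jane"),
  C515 = c("I don't like the lesson", "It was excellent"),
  stringsAsFactors = FALSE
)
survey_data_map <- data.frame(
  Code  = c("E313", "B515", "C515"),
  Label = c("Unique Identifier", "Full name", "Feedback"),
  stringsAsFactors = FALSE
)

# Address the map's columns by position rather than by name.
codes  <- as.character(survey_data_map[[1L]])  # assumption: column 1 holds the codes
labels <- as.character(survey_data_map[[2L]])  # assumption: column 2 holds the labels

for (i in seq_along(names(survey))) {
  label <- labels[codes == names(survey)[i]]
  if (length(label) > 0) {
    # Take the first match in case a code is listed more than once.
    names(survey)[i] <- label[1L]
  }
}
```

With the stand-in data, every column code has a match, so all three columns of survey end up renamed to their labels. Columns whose code is not in the map keep their original name.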

However, be sure that this is what you should actually do! You wrote:

What if the provider changes the column names of the survey_data_map.csv

And that’s indeed a valid concern! However, if this happens, you will at least likely get an error. Conversely, another thing that frequently happens is that somebody reorders the columns of a table. If this happens and you use column indices, your code will continue to run, but it will produce wrong results.
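
The reordering hazard can be illustrated with a small sketch. The map below is a stand-in built from the question's examples, and the swapped column order is a hypothetical change by the provider. The explicit check at the end is one possible guard, not part of the original code; it is included because `[[` returns NULL for a column name that does not exist in a data frame, so a check like this makes the failure explicit.

```r
# Stand-in map built from the question's examples (assumed, not the real file).
survey_data_map <- data.frame(
  Code  = c("E313", "B515", "C515"),
  Label = c("Unique Identifier", "Full name", "Feedback"),
  stringsAsFactors = FALSE
)

# Hypothetical change: the provider swaps the order of the two columns.
reordered <- survey_data_map[, c("Label", "Code")]

# By index: still runs, but the "codes" are now actually the labels.
codes_by_index <- reordered[[1L]]

# By name: unaffected by the reordering.
codes_by_name <- reordered[["Code"]]

# Possible guard: stop with a clear message if an expected column is missing.
expected <- c("Code", "Label")
missing_cols <- setdiff(expected, names(reordered))
if (length(missing_cols) > 0) {
  stop("Missing column(s) in the map: ", paste(missing_cols, collapse = ", "))
}
```

Here codes_by_index silently holds the labels, while codes_by_name still holds the codes, which is the silent-failure case described above.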
