ikel


Referring to columns by index not by name

Background

I have a survey table like this:

E313   B515        C515                       ...   (more columns)
1122   John Doe    I don't like the lesson
2211   Mary Jane   It was excellent

The survey provider also supplied a file of labels for decoding the survey's column codes (survey_data_map.csv):

Code    Label
E313    Unique Identifier
B515    Full name
C515    Feedback
...     (more rows)

So I wrote a short snippet that renames each survey column from its code to the corresponding label:

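# stringsAsFactors = FALSE keeps Label as character in R < 4.0;
# assigning a factor into names() would otherwise store its integer codes
survey_data_map <- read.csv("survey_data_map.csv", stringsAsFactors = FALSE)
for (i in seq_along(names(survey))) {
  # Look up the label whose code matches the current column name
  label <- survey_data_map$Label[survey_data_map$Code == names(survey)[i]]
  if (length(label) > 0) {
    names(survey)[i] <- label[1]  # use the first match if a code is duplicated
  }
}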
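For comparison, the same renaming can be sketched without a loop using base R's `match()`, which returns, for each column name, the row of the map holding that code (or `NA` if there is none). The small `survey` and `survey_data_map` data frames here are made-up stand-ins for the tables above, and this still addresses the map columns by the names `Code` and `Label`.

```r
# Hypothetical stand-ins for the two tables described above
survey <- data.frame(
  E313 = c(1122, 2211),
  B515 = c("John Doe", "Mary Jane"),
  C515 = c("I don't like the lesson", "It was excellent"),
  stringsAsFactors = FALSE
)
survey_data_map <- data.frame(
  Code  = c("E313", "B515", "C515"),
  Label = c("Unique Identifier", "Full name", "Feedback"),
  stringsAsFactors = FALSE
)

# Row of survey_data_map whose Code equals each column name (NA if no match)
idx <- match(names(survey), survey_data_map$Code)
# Rename only the columns that actually have a label
has_label <- !is.na(idx)
names(survey)[has_label] <- survey_data_map$Label[idx[has_label]]

print(names(survey))
```

Like the loop's `label[1]`, `match()` uses the first matching row when a code appears more than once.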

Question

The column names in survey_data_map.csv itself (currently Code and Label) may change. How can I rewrite the for-loop to refer to those two columns by index instead of by the names Code and Label?

Thank you.


Answers (1)

Konrad Rudolph

In general, columns of a data frame can be addressed with the [[ subset operator. You can use either the numeric index or the name (as a character string) to do so:

survey_data_map[[1L]] # same as
survey_data_map[['Code']]
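Applied to the question, an index-based version of the loop might look like the sketch below. It assumes the codes are always in the first column of the map and the labels in the second; the example data frames are hypothetical, and `as.character()` guards against the columns having been read as factors.

```r
# Hypothetical example data; only the column *positions* of the map matter below
survey <- data.frame(E313 = 1122, B515 = "John Doe", stringsAsFactors = FALSE)
survey_data_map <- data.frame(
  Code  = c("E313", "B515"),
  Label = c("Unique Identifier", "Full name"),
  stringsAsFactors = FALSE
)

# Assumption: column 1 holds the codes, column 2 holds the labels
codes  <- as.character(survey_data_map[[1L]])
labels <- as.character(survey_data_map[[2L]])

for (i in seq_along(names(survey))) {
  label <- labels[codes == names(survey)[i]]
  if (length(label) > 0) {
    names(survey)[i] <- label[1]
  }
}
print(names(survey))
```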

However, make sure this is really what you want to do! You wrote:

"What if the provider changes the column names of the survey_data_map.csv?"

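And that's indeed a valid concern! However, if the names change, a name-based lookup at least makes the problem easy to detect: the expected column is simply missing, and you can make the code stop with an error. (Note that `$` and `[[` return NULL for a missing column rather than failing, so an explicit check is worthwhile.) Conversely, something that also happens frequently is that somebody reorders the columns of a table. If that happens and you use column indices, your code will continue to run, but it will silently produce wrong results.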
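A minimal sketch of both cases, using a made-up two-row map. `check_map_columns()` is a hypothetical helper, not part of base R: it fails loudly when expected columns are absent, whereas index-based access after a column reorder keeps running and returns the wrong data.

```r
survey_data_map <- data.frame(
  Code  = c("E313", "B515"),
  Label = c("Unique Identifier", "Full name"),
  stringsAsFactors = FALSE
)

# A missing column does not raise an error by itself: both of these are NULL
is.null(survey_data_map$Key)
is.null(survey_data_map[["Key"]])

# Hypothetical helper: stop with a clear message if required columns are missing
check_map_columns <- function(map, required = c("Code", "Label")) {
  missing <- setdiff(required, names(map))
  if (length(missing) > 0) {
    stop("survey_data_map is missing column(s): ", paste(missing, collapse = ", "))
  }
  invisible(map)
}
check_map_columns(survey_data_map)  # passes silently

# Index-based access keeps running after a reorder,
# but column 1 now holds the labels instead of the codes
reordered <- survey_data_map[, c("Label", "Code")]
reordered[[1L]]
```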
