Subset dataframe according to conditional column name

Question

I'm trying to subset a dataframe according to the value of a column which can change name over different versions of the dataframe. The value I want to test for is "----" in a column named either "SIC" or "NAICS".

Version 1:

df
  MSA  SIC EMPFLAG   EMP
1  40 ----         43372
2  40 07--           192
3  40 0700           192

Version 2:

df
  MSA NAICS EMPFLAG   EMP
1  40  ----         78945
2  40  07--           221
3  40  0700           221

The expected result is:

Version 1:

df
  MSA   EMP
1  40 43372

Version 2:

df
  MSA   EMP
1  40 78945

The following code doesn't work:

df <- ifelse("SIC" %in% colnames(df), 
             df[df$SIC=="----", c("MSA", "EMP")], 
             df[df$NAICS=="----", c("MSA", "EMP")])

Rui Barradas · Accepted Answer

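The problem with your code is the use of the vectorized ifelse(), which you don't need here. ifelse() returns a result of the same length as its test, and "SIC" %in% colnames(df) is a single TRUE or FALSE, so you get back only one element of the chosen subset instead of the whole data frame. A plain if/else evaluates only the chosen branch and returns it unchanged.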
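To see the difference, here is a minimal sketch that rebuilds Version 1 of the data (the empty EMPFLAG values are an assumption, since the printout shows nothing there) and runs the question's ifelse() call next to a plain if/else.

```r
# Version 1 of the data, rebuilt from the question (EMPFLAG assumed empty)
df <- data.frame(MSA = 40, SIC = c("----", "07--", "0700"),
                 EMPFLAG = "", EMP = c(43372, 192, 192),
                 stringsAsFactors = FALSE)

# ifelse() is vectorized: its result has the length of its test,
# which is a single TRUE here, so the data frame is not returned whole
res_ifelse <- ifelse("SIC" %in% colnames(df),
                     df[df$SIC == "----", c("MSA", "EMP")],
                     df[df$NAICS == "----", c("MSA", "EMP")])
str(res_ifelse)   # a plain list of length 1, not a data frame

# A plain if/else evaluates only the chosen branch and returns it as-is
res_if <- if ("SIC" %in% colnames(df)) {
  df[df$SIC == "----", c("MSA", "EMP")]
} else {
  df[df$NAICS == "----", c("MSA", "EMP")]
}
res_if
```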

df <- if (any(grepl("SIC", colnames(df)))) {
  df[df$SIC == "----", c("MSA", "EMP")]
} else {
  df[df$NAICS == "----", c("MSA", "EMP")]
}
df
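A quick way to check that this handles both versions is to wrap it in a function and call it on each one. The helper name subset_dashes and the rebuilt data frames v1 and v2 are hypothetical, and EMPFLAG is again assumed to be empty.

```r
# Hypothetical helper wrapping the if/else from the answer above
subset_dashes <- function(df) {
  if (any(grepl("SIC", colnames(df)))) {
    df[df$SIC == "----", c("MSA", "EMP")]
  } else {
    df[df$NAICS == "----", c("MSA", "EMP")]
  }
}

# Both versions rebuilt from the question (EMPFLAG assumed empty)
v1 <- data.frame(MSA = 40, SIC = c("----", "07--", "0700"),
                 EMPFLAG = "", EMP = c(43372, 192, 192),
                 stringsAsFactors = FALSE)
v2 <- data.frame(MSA = 40, NAICS = c("----", "07--", "0700"),
                 EMPFLAG = "", EMP = c(78945, 221, 221),
                 stringsAsFactors = FALSE)

r1 <- subset_dashes(v1)
r2 <- subset_dashes(v2)
r1
r2
```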

Note that you can also use %in%, which is simpler: it tests for an exact column name, whereas grepl() matches any name that merely contains "SIC".

df <- if ("SIC" %in% colnames(df)) {   # already a single TRUE/FALSE, no any() needed
  df[df$SIC == "----", c("MSA", "EMP")]
} else {
  df[df$NAICS == "----", c("MSA", "EMP")]
}
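A different approach, not from the original answer, is to look up whichever code column exists and index it by name with [[ ]], so there is only one subsetting expression. This is a sketch: the helper subset_by_code and its candidates argument are hypothetical, and EMPFLAG is assumed empty in the rebuilt data.

```r
# Hypothetical helper: find the code column by name instead of branching
subset_by_code <- function(df, candidates = c("SIC", "NAICS")) {
  # intersect() keeps the order of `candidates`, so SIC wins if both exist
  code_col <- intersect(candidates, names(df))
  if (length(code_col) == 0) stop("none of the candidate columns found")
  # [[ with a string selects the column by its exact name
  df[df[[code_col[1]]] == "----", c("MSA", "EMP")]
}

# Version 2 of the data, rebuilt from the question (EMPFLAG assumed empty)
v2 <- data.frame(MSA = 40, NAICS = c("----", "07--", "0700"),
                 EMPFLAG = "", EMP = c(78945, 221, 221),
                 stringsAsFactors = FALSE)
r2 <- subset_by_code(v2)
r2
```

Adding a new code scheme later then only means extending candidates.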

Finally, after reading the answer by William Ashford, I noticed that the following one-liner does exactly what you asked. It relies on the fact that the column in question is always the second one.

df <- df[df[, 2] == "----", -which(names(df) %in% c("SIC", "NAICS", "EMPFLAG"))]
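Here is a sketch of that one-liner on Version 2 (rebuilt from the question, with EMPFLAG assumed empty; drop_cols and no_match are illustrative names), plus one caveat about negative which() indexing: when none of the listed names are present, which() returns integer(0), and indexing with -integer(0) selects no columns rather than all of them.

```r
df <- data.frame(MSA = 40, NAICS = c("----", "07--", "0700"),
                 EMPFLAG = "", EMP = c(78945, 221, 221),
                 stringsAsFactors = FALSE)
drop_cols <- c("SIC", "NAICS", "EMPFLAG")

# Positional one-liner: rows where column 2 is "----", minus the code/flag columns
res <- df[df[, 2] == "----", -which(names(df) %in% drop_cols)]
res

# Caveat: with no matching names, -which() yields integer(0) and drops everything
no_match <- data.frame(MSA = 40, EMP = 1)
ncol(no_match[, -which(names(no_match) %in% drop_cols)])   # 0
# A logical index keeps every column in that case
ncol(no_match[, !(names(no_match) %in% drop_cols)])        # 2
```

In the question's data one of the listed columns always exists, so the one-liner is safe there; the logical form just removes the edge case.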

Credit for this goes to him.
