syre

Reputation: 982

Subset dataframe according to conditional column name

I'm trying to subset a dataframe according to the value of a column which can change name over different versions of the dataframe. The value I want to test for is "----" in a column named either "SIC" or "NAICS".

Version 1:

df
  MSA  SIC EMPFLAG   EMP
1  40 ----         43372
2  40 07--           192
3  40 0700           192

Version 2:

df
  MSA NAICS EMPFLAG   EMP
1  40  ----         78945
2  40  07--           221
3  40  0700           221

The expected result is:

Version 1:

df
  MSA   EMP
1  40 43372

Version 2:

df
  MSA   EMP
1  40 78945

The following code doesn't work:

df <- ifelse("SIC" %in% colnames(df), 
             df[df$SIC=="----", c("MSA", "EMP")], 
             df[df$NAICS=="----", c("MSA", "EMP")])

Upvotes: 0

Views: 211

Answers (2)

Rui Barradas

Reputation: 76402

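The problem with your code is the use of the vectorized ifelse when you don't need it. ifelse returns a result shaped like its test, so a length-one test yields only the first element of the chosen branch, not the whole data frame. A plain if/else returns the full value of whichever branch runs.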

df <- if(any(grepl("SIC", colnames(df)))) {
         df[df$SIC=="----", c("MSA", "EMP")]
      } else {
         df[df$NAICS=="----", c("MSA", "EMP")]
      }
df
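To make the difference concrete, here is a minimal sketch that rebuilds Version 1 from the question and runs both forms side by side. The names bad and good are only illustrative, and the data frame is typed in by hand from the question's printout.

```r
# Version 1 of the data frame, re-created from the question
df <- data.frame(MSA = c(40, 40, 40),
                 SIC = c("----", "07--", "0700"),
                 EMPFLAG = c("", "", ""),
                 EMP = c(43372, 192, 192),
                 stringsAsFactors = FALSE)

# ifelse() shapes its result like the test, which has length 1 here,
# so only the first element of the chosen branch survives
bad <- ifelse("SIC" %in% colnames(df),
              df[df$SIC == "----", c("MSA", "EMP")],
              df[df$NAICS == "----", c("MSA", "EMP")])

# if / else returns the whole value of the branch that runs
good <- if ("SIC" %in% colnames(df)) {
  df[df$SIC == "----", c("MSA", "EMP")]
} else {
  df[df$NAICS == "----", c("MSA", "EMP")]
}

is.data.frame(bad)   # FALSE
is.data.frame(good)  # TRUE
```

Printing good should show the single MSA/EMP row the question asks for, while bad is a one-element list.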

Note that you can also use %in%, which is probably simpler.

df <- if("SIC" %in% colnames(df)){
         df[df$SIC=="----", c("MSA", "EMP")]
      } else {
         df[df$NAICS=="----", c("MSA", "EMP")]
      }
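Since both branches differ only in the column name, another option is to look the name up first and index with [[ ]]. The following is a sketch, not part of the original answer: subset_by_code and its arguments are hypothetical names, and it assumes exactly one of the candidate columns is present.

```r
# Hypothetical helper: keep the rows whose industry-code column equals `code`
subset_by_code <- function(df, code = "----",
                           candidates = c("SIC", "NAICS"),
                           keep = c("MSA", "EMP")) {
  # Name of whichever candidate column exists in this version of the data
  code_col <- intersect(candidates, names(df))
  if (length(code_col) != 1L) {
    stop("expected exactly one of: ", paste(candidates, collapse = ", "))
  }
  # [[ ]] indexes by a column name held in a variable
  df[df[[code_col]] == code, keep, drop = FALSE]
}

# Version 2 of the data frame, re-created from the question
v2 <- data.frame(MSA = c(40, 40, 40),
                 NAICS = c("----", "07--", "0700"),
                 EMPFLAG = c("", "", ""),
                 EMP = c(78945, 221, 221),
                 stringsAsFactors = FALSE)

res <- subset_by_code(v2)
res
```

The same call works unchanged on Version 1, because the column is found by name and not by branch.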

Finally, after reading the answer by William Ashford, the following one-liner does what you asked. It relies on the fact that the column in question is always the second one.

df <- df[df[, 2] == "----",-which(names(df) %in% c('SIC','NAICS','EMPFLAG'))]

The credits for this go to him.

Upvotes: 1

Will

Reputation: 339

As seen in "How to drop columns by name in a data frame":

Subset your data frame like this:

df = df[,-which(names(df) %in% c('SIC','NAICS'))]
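One caveat with the -which() idiom, sketched below on a made-up two-column data frame: if none of the names match, which() returns an empty vector and every column is dropped. A logical mask avoids that. Note also that this line only removes columns; the row filter on "----" from the question still has to be applied separately.

```r
# Illustrative data frame that contains neither SIC nor NAICS
df <- data.frame(MSA = c(40, 40), EMP = c(43372, 192))

# No column matches, so which() returns integer(0); negating it is
# still integer(0), which selects no columns at all
dropped_all <- df[, -which(names(df) %in% c("SIC", "NAICS")), drop = FALSE]
ncol(dropped_all)  # 0

# A logical mask keeps every column when nothing matches
kept <- df[, !names(df) %in% c("SIC", "NAICS"), drop = FALSE]
ncol(kept)         # 2
```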

This was a very easy answer to find, so might I suggest you take a look through SO before posting questions.

Upvotes: 0
