Nick Knauer

Remove String in Column Name if condition

I have a dataframe that looks like this:

df

Col+djek    Col_test+deg    Col_+dege     Col_+test
       1               1            1             1

In the column names, how do I remove the '+' symbol and everything after it, but only when the column name does not contain the string 'test'?

This was my attempt, but it gave me an error:

colnames(df) = if(!grepl(df, "test")){ gsub("+.*","",colnames(df))}

Final output should be this:

     Col    Col_test+deg         Col_     Col_+test
       1               1            1             1
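For reference, the attempt above has a few problems. grepl() takes the pattern first and the text second. if() tests a single TRUE/FALSE rather than one value per column name. + is a regex quantifier, so it must be escaped as \\+ to match a literal plus sign. Below is a minimal vectorized sketch of the same idea. It uses a hypothetical vector cn in place of colnames(df):

```r
# Hypothetical stand-in for colnames(df), using the names from the question
cn <- c("Col+djek", "Col_test+deg", "Col_+dege", "Col_+test")

# One TRUE/FALSE per name: does it contain the literal string "test"?
has_test <- grepl("test", cn, fixed = TRUE)

# For names without "test", drop the literal "+" and everything after it
cn[!has_test] <- sub("\\+.*", "", cn[!has_test])

print(cn)
# [1] "Col"          "Col_test+deg" "Col_"         "Col_+test"
```

To apply it to the data frame, start from cn <- colnames(df) and finish with colnames(df) <- cn.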

Answers (1)

Wiktor Stribiżew

You may use

gsub("^(?!.*test)([^+]*)\\+.*","\\1", colnames(df), perl=TRUE)

Details

  • ^ - start of string
  • (?!.*test) - a negative lookahead (PCRE syntax, hence perl=TRUE) that fails the match if a test substring occurs after any 0+ chars other than line break chars, i.e. anywhere in the name
  • ([^+]*) - Capturing group #1: 0 or more chars other than +
  • \\+ - a literal + sign (the backslash is doubled because the pattern is an R string literal)
  • .* - the rest of the line to the end.
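To see which names the lookahead lets through, the same pattern can be passed to grepl(). This is a sketch: the extra name "Col_x" is a hypothetical example with no + sign. Only names that lack "test" and do contain a + match, so every other name comes out of gsub() unchanged:

```r
# Names from the question plus a hypothetical name with no "+" at all
nms <- c("Col+djek", "Col_test+deg", "Col_+dege", "Col_+test", "Col_x")

pattern <- "^(?!.*test)([^+]*)\\+.*"

# TRUE where the pattern matches, i.e. where gsub() will rewrite the name
matched <- grepl(pattern, nms, perl = TRUE)
print(matched)
# [1]  TRUE FALSE  TRUE FALSE FALSE

# Non-matching names pass through gsub() untouched
print(gsub(pattern, "\\1", nms, perl = TRUE))
```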

The \\1 backreference in the replacement argument puts the Group 1 value back into the resulting string, so only the + and everything after it are removed.

An R testing snippet:

> names <- c("Col+djek", "Col_test+deg", "Col_+dege", "Col_+test")
> gsub("^(?!.*test)([^+]*)\\+.*","\\1", names, perl=TRUE)
[1] "Col"          "Col_test+deg" "Col_"         "Col_+test"
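The snippet above works on a plain character vector. Applying it to the data frame itself takes one more assignment. Note that data.frame() turns + into . in column names unless check.names = FALSE is passed. The sketch below therefore sets that flag when it rebuilds the question's df; how the original df was created is an assumption:

```r
# Rebuild the question's one-row data frame; check.names = FALSE keeps the "+"
df <- data.frame(`Col+djek` = 1, `Col_test+deg` = 1, `Col_+dege` = 1,
                 `Col_+test` = 1, check.names = FALSE)

# Strip "+..." from every column name that does not contain "test"
colnames(df) <- gsub("^(?!.*test)([^+]*)\\+.*", "\\1", colnames(df), perl = TRUE)

print(df)
```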