Reputation: 4243
I have a dataframe that looks like this:
df
Col+djek Col_test+deg Col_+dege Col_+test
1 1 1 1
In the column name, how do I remove anything after the '+' symbol if the column name does not contain the string 'test'?
This was my attempt but it gave me an error:
colnames(df) = if(!grepl(df, "test")){ gsub("+.*","",colnames(df))}
Final output should be this:
Col Col_test+deg Col_ Col_+test
1 1 1 1
Upvotes: 3
Views: 116
Reputation: 626802
You may use
gsub("^(?!.*test)([^+]*)\\+.*","\\1", colnames(df), perl=TRUE)
See the regex demo.
Details
^ - start of string
(?!.*test) - a negative lookahead (supported in PCRE patterns via perl=TRUE) that fails the match if, after any 0+ chars other than line break chars, there is a test substring
([^+]*) - Capturing group #1: 0 or more chars other than +
\\+ - a + sign
.* - the rest of the line to the end
The \1 in the replacement argument restores the Group 1 value in the resulting string.
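If the lookahead is hard to read, the same logic can be sketched in two plain steps: find the names without test, then trim only those. This is an illustrative alternative, not a replacement for the pattern above; the vector name cols is made up for the example. Note that grepl() takes the pattern first and the character vector second, and that + is a regex quantifier, so it has to be escaped to match a literal plus.

```r
# Hypothetical vector holding the column names from the question
cols <- c("Col+djek", "Col_test+deg", "Col_+dege", "Col_+test")

# TRUE for names that do not contain the literal string "test"
no_test <- !grepl("test", cols, fixed = TRUE)

# Only for those names, drop everything from the first "+" onward
cols[no_test] <- sub("\\+.*", "", cols[no_test])

print(cols)
```

Indexing with the logical vector keeps the operation vectorized, which a plain if() around grepl() would not do, since if() expects a single TRUE/FALSE.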
An R testing snippet:
> names <- c("Col+djek", "Col_test+deg", "Col_+dege", "Col_+test")
> gsub("^(?!.*test)([^+]*)\\+.*","\\1", names, perl=TRUE)
[1] "Col" "Col_test+deg" "Col_" "Col_+test"
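To apply this to the data frame itself, assign the result back to colnames(df). The sketch below rebuilds a one-row data frame like the one in the question; the construction with check.names = FALSE is an assumption about how df was created, needed here so that data.frame() does not replace the + in the names with a dot.

```r
# Assumed reconstruction of the question's data frame;
# check.names = FALSE preserves "+" in the column names
df <- data.frame(`Col+djek` = 1, `Col_test+deg` = 1,
                 `Col_+dege` = 1, `Col_+test` = 1,
                 check.names = FALSE)

# Trim names without "test" at the first "+", leave the others untouched
colnames(df) <- gsub("^(?!.*test)([^+]*)\\+.*", "\\1", colnames(df), perl = TRUE)

print(colnames(df))
```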
Upvotes: 1