Reputation: 141
I have a fairly basic R question.
All the column names of my data frame follow the same pattern:
colnames <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX","MSCI 'COUNTRY NAME 2' - PRICE INDEX",
"MSCI 'COUNTRY NAME 3' - PRICE INDEX","MSCI 'COUNTRY NAME 4' - PRICE INDEX")
An example for one country: MSCI CANADA - PRICE INDEX.
I want to rename every column to just the country name (in this case 'Canada'). Is there a quick way to remove the 'MSCI' prefix and the '- PRICE INDEX' suffix, and to turn the all-caps name into normal capitalization?
Thanks!
Upvotes: 0
Views: 100
Reputation: 887148
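An option with str_extract from stringr. The lookbehind (?<=') starts the match right after the first single quote, and [^']+ takes everything up to the closing quote: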
library(stringr)
str_extract(v1, "(?<=')[^']+")
#[1] "COUNTRY NAME 1" "COUNTRY NAME 2" "COUNTRY NAME 3" "COUNTRY NAME 4"
Data:
v1 <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX", "MSCI 'COUNTRY NAME 2' - PRICE INDEX",
        "MSCI 'COUNTRY NAME 3' - PRICE INDEX", "MSCI 'COUNTRY NAME 4' - PRICE INDEX")
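The code above returns the names still in capitals and does not touch the data frame itself. Below is a sketch of the remaining steps with stringr, assuming the real column names use the unquoted form from the question (MSCI CANADA - PRICE INDEX). The data frame df and the second country are hypothetical. str_remove_all strips the fixed prefix and suffix, and str_to_title handles the capitalization. For the quoted variant, str_to_title(str_extract(v1, "(?<=')[^']+")) works the same way.

```r
library(stringr)

# Hypothetical data frame whose column names follow the unquoted
# "MSCI <COUNTRY> - PRICE INDEX" pattern from the question
df <- data.frame(
  `MSCI CANADA - PRICE INDEX` = c(100, 101),
  `MSCI UNITED KINGDOM - PRICE INDEX` = c(200, 202),
  check.names = FALSE  # keep the spaces and dashes in the names
)

# Drop the leading "MSCI " and the trailing " - PRICE INDEX"
country <- str_remove_all(names(df), "^MSCI | - PRICE INDEX$")

# "UNITED KINGDOM" -> "United Kingdom"
names(df) <- str_to_title(country)
names(df)
```

This gives "Canada" and "United Kingdom" as the new column names.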
Upvotes: 0
Reputation: 521289
Use sub for a base R option:
colnames <- sub("^MSCI '(.*?)'.*$", "\\1", colnames)
colnames
[1] "COUNTRY NAME 1" "COUNTRY NAME 2" "COUNTRY NAME 3" "COUNTRY NAME 4"
Data:
colnames <- c("MSCI 'COUNTRY NAME 1' - PRICE INDEX",
"MSCI 'COUNTRY NAME 2' - PRICE INDEX",
"MSCI 'COUNTRY NAME 3' - PRICE INDEX",
"MSCI 'COUNTRY NAME 4' - PRICE INDEX")
If the country names really don't have single quotes around them, then use this version:
name <- "MSCI CANADA - PRICE INDEX"
country <- sub("^MSCI (.*?) - PRICE INDEX$", "\\1", name)
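The same idea can be applied directly to a data frame's names in base R, including the capitalization step from the question. This is a sketch that assumes a hypothetical data frame df with unquoted names. The title casing is done by lower-casing everything and then upper-casing the first letter of each word with a Perl-style \\U replacement. Note that this also capitalizes small words such as "and".

```r
# Hypothetical data frame using the unquoted column-name pattern
df <- data.frame(
  `MSCI CANADA - PRICE INDEX` = c(100, 101),
  `MSCI UNITED KINGDOM - PRICE INDEX` = c(200, 202),
  check.names = FALSE  # keep the original names untouched
)

# Capture the country name between the fixed prefix and suffix
country <- sub("^MSCI (.*?) - PRICE INDEX$", "\\1", names(df))

# Lower-case everything, then upper-case the first letter of each word;
# the \\U case conversion in the replacement requires perl = TRUE
names(df) <- gsub("\\b(\\w)", "\\U\\1", tolower(country), perl = TRUE)
names(df)
```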
Upvotes: 1