Reputation: 1132
I am trying to delete the first word and the last word in the column CCGName, using only the tidyverse, in R. The CCGName column contains the word "NHS", then the city name, followed by "CCG". I want to get rid of the words "NHS" and "CCG". Is there a way to do this using only the tidyverse?
This is my sample of data:
structure(list(SiteType = c(111, 111, 111, 111, 111, 111, 111,
111, 111, 111), `Call Date` = c("18/03/2020", "18/03/2020", "18/03/2020",
"18/03/2020", "18/03/2020", "18/03/2020", "18/03/2020", "18/03/2020",
"18/03/2020", "18/03/2020"), Gender = c("Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female"
), AgeBand = c("0-18 years", "0-18 years", "0-18 years", "0-18 years",
"0-18 years", "0-18 years", "0-18 years", "0-18 years", "0-18 years",
"0-18 years"), CCGCode = c("E38000004", "E38000009", "E38000020",
"E38000023", "E38000029", "E38000010", "E38000030", "E38000035",
"E38000008", "E38000025"), CCGName = c("NHS Barking and Dagenham CCG",
"NHS Bath and North East Somerset CCG", "NHS Brent CCG", "NHS Bromley CCG",
"NHS Canterbury and Coastal CCG", "NHS Bedfordshire CCG", "NHS Castle Point and Rochford CCG",
"NHS City and Hackney CCG", "NHS Bassetlaw CCG", "NHS Calderdale CCG"
), `April20 mapped CCGCode` = c("E38000004", "E38000231", "E38000020",
"E38000244", "E38000237", "E38000010", "E38000030", "E38000035",
"E38000008", "E38000025"), `April20 mapped CCGName` = c("NHS Barking and Dagenham CCG",
"NHS Bath and North East Somerset, Swindon and Wiltshire CCG",
"NHS Brent CCG", "NHS South East London CCG", "NHS Kent and Medway CCG",
"NHS Bedfordshire CCG", "NHS Castle Point and Rochford CCG",
"NHS City and Hackney CCG", "NHS Bassetlaw CCG", "NHS Calderdale CCG"
), TriageCount = c(35, 9, 21, 11, 11, 27, 12, 12, 6, 9)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
Views: 290
Reputation: 39605
You can also try:
library(dplyr)
#Code
df <- df %>% mutate(CCGName=trimws(gsub('NHS|CCG','',CCGName)))
Output:
df$CCGName
[1] "Barking and Dagenham" "Bath and North East Somerset"
[3] "Brent" "Bromley"
[5] "Canterbury and Coastal" "Bedfordshire"
[7] "Castle Point and Rochford" "City and Hackney"
[9] "Bassetlaw" "Calderdale"
You can also get the same output with the following code (many thanks and credit to @BenBolker):
#Code 2
library(stringr)
# str_remove_all() strips both the leading "NHS " and the trailing " CCG"
df <- df %>% mutate(CCGName = str_remove_all(CCGName, "^NHS\\s+|\\s+CCG$"))
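To see how the anchored pattern behaves, here is a small sketch on a hypothetical vector `x` (not taken from the question's data), assuming stringr is installed. Note that `str_remove()` drops only the first match in each string, while `str_remove_all()` drops every match, so the latter is the one that strips both ends in a single call. The last line shows that the unanchored `gsub()` pattern also deletes an "NHS" that appears inside a name.

```r
library(stringr)

# Hypothetical example vector, not part of the question's data
x <- c("NHS Brent CCG", "NHS East NHS Trust CCG")

# Anchored pattern: "NHS" only at the start, "CCG" only at the end
pattern <- "^NHS\\s+|\\s+CCG$"

# Only the first match per string is removed (the leading "NHS ")
first_only <- str_remove(x, pattern)
# "Brent CCG" "East NHS Trust CCG"

# Every match is removed, so both ends are stripped
both_ends <- str_remove_all(x, pattern)
# "Brent" "East NHS Trust"

# Unanchored alternative: also removes the "NHS" in the middle
unanchored <- trimws(gsub("NHS|CCG", "", x))
# "Brent" "East  Trust"
```

If the names can never contain "NHS" or "CCG" in the middle, both approaches give the same result; the anchored pattern is simply the safer default.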
Upvotes: 3
Reputation: 887251
We can use str_replace to match the first word and the space after it, capture the text up to the last word as a group, and replace the whole match with the backreference to that captured group
library(dplyr)
library(stringr)
df2 <- df %>%
  mutate(CCGName = str_replace(CCGName, "^\\w+\\s+(.*)\\s+\\w+", '\\1'))
Or using trimws from base R
trimws(df$CCGName, whitespace = "\\s*(NHS|CCG)\\s*")
NOTE: The str_replace solution uses only tidyverse functions, as the OP requested in the post. It is also a general solution: it removes whatever words come first and last, not just "NHS" and "CCG"
-output
df2$CCGName
#[1] "Barking and Dagenham" "Bath and North East Somerset" "Brent" "Bromley"
#[5] "Canterbury and Coastal" "Bedfordshire" "Castle Point and Rochford" "City and Hackney"
#[9] "Bassetlaw" "Calderdale"
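As a rough illustration of why the capture-group pattern is general, the sketch below applies it to a hypothetical vector `y` (not from the question's data) whose first and last words vary. Strings with fewer than three words do not match the pattern and come back unchanged, which may or may not be what you want.

```r
library(stringr)

# Hypothetical names with different first and last words
y <- c("NHS Brent CCG", "Foo City and Hackney Bar", "NHS CCG", "Brent")

# Drop the first and last word, keep everything in between
middle <- str_replace(y, "^\\w+\\s+(.*)\\s+\\w+", "\\1")
# "Brent" "City and Hackney" "NHS CCG" "Brent"
```

Because `\\w+` matches any run of word characters, the pattern does not depend on the literal text "NHS" or "CCG".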
Upvotes: 4