GaB

Reputation: 1132

How to delete the first and last word in a column?

I am trying to delete the first and last word of each value in the column CCGName, using only the tidyverse, in R. Each value in the CCGName column consists of the word "NHS", then the city name, then "CCG". I want to get rid of the words "NHS" and "CCG". Is there a way to do this using only the tidyverse?

This is a sample of my data:

structure(list(SiteType = c(111, 111, 111, 111, 111, 111, 111, 
111, 111, 111), `Call Date` = c("18/03/2020", "18/03/2020", "18/03/2020", 
"18/03/2020", "18/03/2020", "18/03/2020", "18/03/2020", "18/03/2020", 
"18/03/2020", "18/03/2020"), Gender = c("Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female"
), AgeBand = c("0-18 years", "0-18 years", "0-18 years", "0-18 years", 
"0-18 years", "0-18 years", "0-18 years", "0-18 years", "0-18 years", 
"0-18 years"), CCGCode = c("E38000004", "E38000009", "E38000020", 
"E38000023", "E38000029", "E38000010", "E38000030", "E38000035", 
"E38000008", "E38000025"), CCGName = c("NHS Barking and Dagenham CCG", 
"NHS Bath and North East Somerset CCG", "NHS Brent CCG", "NHS Bromley CCG", 
"NHS Canterbury and Coastal CCG", "NHS Bedfordshire CCG", "NHS Castle Point and Rochford CCG", 
"NHS City and Hackney CCG", "NHS Bassetlaw CCG", "NHS Calderdale CCG"
), `April20 mapped CCGCode` = c("E38000004", "E38000231", "E38000020", 
"E38000244", "E38000237", "E38000010", "E38000030", "E38000035", 
"E38000008", "E38000025"), `April20 mapped CCGName` = c("NHS Barking and Dagenham CCG", 
"NHS Bath and North East Somerset, Swindon and Wiltshire CCG", 
"NHS Brent CCG", "NHS South East London CCG", "NHS Kent and Medway CCG", 
"NHS Bedfordshire CCG", "NHS Castle Point and Rochford CCG", 
"NHS City and Hackney CCG", "NHS Bassetlaw CCG", "NHS Calderdale CCG"
), TriageCount = c(35, 9, 21, 11, 11, 27, 12, 12, 6, 9)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 290

Answers (2)

Duck

Reputation: 39605

You can also try:

library(dplyr)
# Remove every occurrence of "NHS" or "CCG", then trim the leftover whitespace
df <- df %>% mutate(CCGName = trimws(gsub('NHS|CCG', '', CCGName)))

Output:

df$CCGName
 [1] "Barking and Dagenham"         "Bath and North East Somerset"
 [3] "Brent"                        "Bromley"                     
 [5] "Canterbury and Coastal"       "Bedfordshire"                
 [7] "Castle Point and Rochford"    "City and Hackney"            
 [9] "Bassetlaw"                    "Calderdale"  

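You can also reach the same output with the following code (many thanks and credit to @BenBolker):

library(stringr)
# Code 2: str_remove_all() is needed here, because str_remove() drops only the first match
df <- df %>% mutate(CCGName = str_remove_all(CCGName, "^NHS\\s+|\\s+CCG$"))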
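A minimal check of how this pattern behaves, comparing stringr's str_remove() with str_remove_all(). The vector `x` is an assumption made up for illustration, not part of the original data. Because the pattern has two alternatives that match at opposite ends of the string, the two functions give different results:

```r
library(stringr)

# Hypothetical sample values, modelled on the CCGName column
x <- c("NHS Brent CCG", "NHS City and Hackney CCG")
pattern <- "^NHS\\s+|\\s+CCG$"

# str_remove() deletes only the first match, so the trailing " CCG" survives
first_only <- str_remove(x, pattern)

# str_remove_all() deletes every match, so both ends are stripped
both_ends <- str_remove_all(x, pattern)

print(first_only)
print(both_ends)
```

Here `first_only` still ends in "CCG", while `both_ends` holds just the names, so str_remove_all() is the variant that matches the output shown above.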

Upvotes: 3

akrun

Reputation: 887251

We can use str_replace to match the first word and the whitespace after it, capture everything up to the whitespace before the last word as a group, and replace the whole match with a backreference to the captured group:

library(dplyr)
library(stringr)
df2 <- df %>% 
      mutate(CCGName = str_replace(CCGName, "^\\w+\\s+(.*)\\s+\\w+", '\\1'))

Or using trimws from base R (the whitespace argument requires R 3.6.0 or later):

trimws(df$CCGName, whitespace = "\\s*(NHS|CCG)\\s*")

NOTE: The str_replace approach uses only tidyverse functions, as the OP requested in the post. It is also a general solution: it removes whatever words come first and last, not only "NHS" and "CCG".

-output

df2$CCGName
#[1] "Barking and Dagenham"         "Bath and North East Somerset" "Brent"                        "Bromley"                     
#[5] "Canterbury and Coastal"       "Bedfordshire"                 "Castle Point and Rochford"    "City and Hackney"            
#[9] "Bassetlaw"                    "Calderdale"
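To illustrate the generality of the str_replace pattern, here is a small sketch on made-up values (the vector `x` is hypothetical and not from the OP's data). It also shows what happens when a value has fewer than three words: the pattern does not match, and str_replace returns the value unchanged.

```r
library(stringr)

# Hypothetical values: different first/last words, plus two short edge cases
x <- c("Health Board of Leeds Trust", "NHS Brent CCG", "NHS CCG", "Brent")

# Same pattern as above: first word, captured middle, last word
res <- str_replace(x, "^\\w+\\s+(.*)\\s+\\w+", "\\1")

print(res)
```

The first two values are reduced to "Board of Leeds" and "Brent". The last two stay as they are, which may or may not be what you want for such rows, so it is worth checking them separately.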

Upvotes: 4
