Reputation: 632
I have datafarme like this
dummy_data <- structure(list(Date = c("24/06/2002", "24/06/2002", "01/07/2002",
"01/07/2002", "08/07/2002",
"08/07/2002","15/07/2002","17/07/2002",
"22/07/2002", "22/07/2002", "29/07/2002"),
Temp_id= c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
"T300/500,XYZ", "T300/390,XYZ", "0000,M300", "1234,M678", "ABC")), class =
"data.frame",
row.names = c(NA,
-11L))
In some of the rows in column "temp_id" there is an additional text.
How can I remove the part before ',' and leave the rest of the string in the column?
Required output <- dummy_data <- structure(list(Date = c("24/06/2002", "24/06/2002", "01/07/2002", "01/07/2002", "08/07/2002", "08/07/2002","15/07/2002","17/07/2002",
"22/07/2002", "22/07/2002", "29/07/2002"),
Temp_id= c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
"XYZ", "XYZ", "M300", "M678", "ABC")), class= "data.frame", row.names = c(NA, -11L))
Upvotes: 1
Views: 132
Reputation: 21400
This too works:
lirary(dplyr)
library(stringr)
dummy_data %>%
mutate(Temp_id = str_extract(Temp_id, "[^,]+$"))
Date Temp_id
1 24/06/2002 ABC
2 24/06/2002 M567
3 01/07/2002 M567
4 01/07/2002 M567
5 08/07/2002 XYZ
6 08/07/2002 XYZ
7 15/07/2002 XYZ
8 17/07/2002 XYZ
9 22/07/2002 M300
10 22/07/2002 M678
11 29/07/2002 ABC
Here [^,]+$
matches any sequence of characters that are not (^
) a comma up until the end ($
) of the string, thus effectively removing any part before the comma (including the comma itself) if present.
Alternatively, we can do this in base R
like so:
sub(".*?([^,]+)$", "\\1", dummy_data$Temp_id)
where .*?
is a 'lazy' match of anything that is prior to any sequence of characters that are not (^
) a comma up until the end ($
) of the string and where \\1
is a backreference that refers back to that sequence captured by (...)
Upvotes: 1
Reputation: 12699
With dplyr
and stringr
...
library(dplyr)
library(stringr)
dummy_data |>
mutate(Temp_id = case_when(str_detect(Temp_id, ",") ~ str_extract(Temp_id, "(?<=,).*$"),
TRUE ~ Temp_id))
#or using `ifelse()`
dummy_data |>
mutate(Temp_id = ifelse(str_detect(Temp_id, ","),
str_extract(Temp_id, "(?<=,).*$"),
Temp_id))
#> Date Temp_id
#> 1 24/06/2002 ABC
#> 2 24/06/2002 M567
#> 3 01/07/2002 M567
#> 4 01/07/2002 M567
#> 5 08/07/2002 XYZ
#> 6 08/07/2002 XYZ
#> 7 15/07/2002 XYZ
#> 8 17/07/2002 XYZ
#> 9 22/07/2002 M300
#> 10 22/07/2002 M678
#> 11 29/07/2002 ABC
Created on 2022-10-13 with reprex v2.0.2
Upvotes: 1
Reputation: 1683
This is your colum Temp_id
:
Temp_id= c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
"T300/500,XYZ", "T300/390,XYZ", "0000,M300", "1234,M678", "ABC"))
Which:
[1] "ABC" "M567" "M567" "M567" "XYZ" "XYZ" "T300/500,XYZ"
[8] "T300/390,XYZ" "0000,M300" "1234,M678" "ABC"
An easy way is using gsub
function which replaces the regex pattern you indicate with other expression. In this case we are indicating that everying from the beggining of the line to the first comma - ^.*, - is replaced with nothing - '' .
gsub('^.*,','',Temp_id)
[1] "ABC" "M567" "M567" "M567" "XYZ" "XYZ" "XYZ" "XYZ" "M300" "M678" "ABC"
In case you don't understand the regex symbols:
^ -> beginning of line, . -> every character , * -> repeat previous ' . ' until next symbol matches, , -> stop in comma
Applying to the dataframe:
dummy_data$Temp_id = gsub('^.*,','',dummy_data$Temp_id)
> dummy_data
Date Temp_id
1 24/06/2002 ABC
2 24/06/2002 M567
3 01/07/2002 M567
4 01/07/2002 M567
5 08/07/2002 XYZ
6 08/07/2002 XYZ
7 15/07/2002 XYZ
8 17/07/2002 XYZ
9 22/07/2002 M300
10 22/07/2002 M678
11 29/07/2002 ABC
Upvotes: 2