Reputation: 3994

Remove all characters from text outside of punctuation

I have a dataset that looks something like the following:

ID    Type                     Count
1     **Radisson**             8
2     **Renaissance**          9
3     **Hilton** New York Only 8
4     **Radisson** East Cost   8

I want to get a dataset that looks like this:

ID    Type                     Count
1     **Radisson**             8
2     **Renaissance**          9
3     **Hilton**               8
4     **Radisson**             8

Or, if at all possible, with the asterisks removed as well.

Any solutions?

Upvotes: 3

Answers (3)

akrun

Reputation: 886948

Here is an option with str_extract from stringr, which returns the first match of a pattern in each string:

library(stringr)
library(dplyr)
df %>% 
   # keep the leading run of *, the text up to the next *, and the following run of *
   mutate(Type = str_extract(Type, "[*]*[^*]*[*]*"))
#              Type Count
#1    **Radisson**     8
#2 **Renaissance**     9
#3      **Hilton**     8
#4    **Radisson**     8
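If you would rather avoid the stringr dependency, base R's regexpr() plus regmatches() can extract the first match of the same pattern. A minimal sketch, assuming the Type values are a character vector (the sample strings are copied from the question; the names types and extracted are just for illustration):

```r
# Sample values from the question
types <- c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
           "**Radisson** East Cost")

# regexpr() locates the first match of the pattern in each string and
# regmatches() pulls the matched substrings out. Because the pattern can
# also match an empty string, every element has a match, so the result
# keeps the same length as the input.
extracted <- regmatches(types, regexpr("[*]*[^*]*[*]*", types))
print(extracted)
```

The result can be assigned back with df$Type <- regmatches(df$Type, regexpr("[*]*[^*]*[*]*", df$Type)).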

Upvotes: 0

MKR

Reputation: 20085

One solution is to split each string on ** with strsplit and keep the second element (the text between the markers), which also drops the asterisks:

# as.character() guards against Type being a factor; strsplit() rejects factors
df$Type <- sapply(strsplit(as.character(df$Type), split = "\\*{2}"), function(x) x[2])
df
#   ID        Type Count
# 1  1    Radisson     8
# 2  2 Renaissance     9
# 3  3      Hilton     8
# 4  4    Radisson     8
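The split-based approach assumes Type is character. In R versions before 4.0, data.frame() converted strings to factors by default, and strsplit() stops with "non-character argument" on a factor. A minimal sketch of a more defensive variant, assuming a factor column (the sample values come from the question; types, pieces and cleaned are illustrative names), using vapply() so the result is always a character vector of the same length:

```r
# Factor input, as data.frame() produced by default before R 4.0
types <- factor(c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
                  "**Radisson** East Cost"))

# strsplit() needs character input, so convert first
pieces <- strsplit(as.character(types), split = "\\*{2}")

# `[`(x, 2) takes the second piece, i.e. the text inside the first ** pair;
# vapply() enforces exactly one character value per element
cleaned <- vapply(pieces, `[`, character(1), 2)
print(cleaned)

# A string without any ** markers has no second piece, so it becomes NA
no_marker <- vapply(strsplit("Hilton", split = "\\*{2}"), `[`, character(1), 2)
print(no_marker)
```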

Upvotes: 0

erocoar

Reputation: 5893

You could capture the leading **...** block with a regular expression and replace the whole string with that capture, discarding everything after it.

df <- data.frame(Type = c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
                          "**Radisson** East Cost"),
                 Count = c(8, 9, 8, 8))

gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)

[1] "**Radisson**"    "**Renaissance**" "**Hilton**"      "**Radisson**"

So, to overwrite the column:

df$Type <- gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)
df

             Type Count
1    **Radisson**     8
2 **Renaissance**     9
3      **Hilton**     8
4    **Radisson**     8
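One caveat with the pattern above: the .* inside the capture group is greedy, so it runs to the last ** in the string, not the first. That is fine for the sample data, but a value with a second starred segment would be kept whole. A minimal sketch, assuming a hypothetical input string with two ** pairs, comparing the greedy pattern with a lazy .*? version and with a variant that captures only the inner text (dropping the asterisks). Because the pattern is anchored with ^, it can match at most once, so sub() is sufficient:

```r
# Hypothetical input with a second ** pair after the name
x <- "**Hilton** New York **Only**"

# Greedy: .* extends to the LAST ** in the string
greedy <- sub("^(\\*{2}.*\\*{2}).*", "\\1", x, perl = TRUE)

# Lazy: .*? stops at the FIRST closing **
lazy <- sub("^(\\*{2}.*?\\*{2}).*", "\\1", x, perl = TRUE)

# Capture only the text inside the markers to drop the asterisks as well
bare <- sub("^\\*{2}(.*?)\\*{2}.*", "\\1", x, perl = TRUE)

print(c(greedy = greedy, lazy = lazy, bare = bare))
```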

Upvotes: 3
