Ram23

Reputation: 91

Split a string using an array of delimiters in R

I am new to R. I have to split a sentence based on phrase delimiters. We can use strsplit to split a string on a single delimiter, but I want to split the string on several delimiters, such as [, . : ; ]. How can I do that in one step? Is there a regular expression that works for this?

For example:

my_string = "This is a sentence.  This is a question, right?  Yes!  It is."

Expected output:

"This is a sentence", "This is a question", "right", "Yes", "It is"

Upvotes: 3

Views: 232

Answers (2)

Jota

Reputation: 17621

You can use this:

strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence"  " This is a question" " right"             
#[4] " Yes"                " It is"

To get rid of those extra spaces, you could do this:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
         "\\. *|, |\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"             
#[4] "Yes"                "It is"

As thelatemail pointed out, this is even simpler:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
         "[,.:;?!]\\s*")  # \\s* matches zero or more whitespace characters

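You need to escape characters that would otherwise be interpreted as regex metacharacters, which is why there is a \\ in front of the . and the ? in the first two patterns. Inside a bracket expression such as [,.:;?!], those characters are literal and need no escaping. The | means "or" (alternation).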
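Since the question asks about an array of delimiters, here is a minimal sketch that builds the bracket expression from a character vector and applies it to the asker's my_string, including its double spaces. The names delims, pattern, and parts are hypothetical, chosen only for illustration, and the sketch assumes none of the delimiters is special inside brackets.

```r
# The asker's example string (note the double spaces after some delimiters).
my_string <- "This is a sentence.  This is a question, right?  Yes!  It is."

# Hypothetical vector of delimiters; none of these is special inside [...].
delims <- c(",", ".", ":", ";", "?", "!")

# Build a bracket expression equivalent to "[,.:;?!]\\s*" from the vector.
pattern <- paste0("[", paste(delims, collapse = ""), "]\\s*")

# strsplit() returns a list with one element per input string; take the first.
parts <- strsplit(my_string, pattern)[[1]]
print(parts)
```

If a delimiter such as ], ^, - or \ has to be included, it needs special placement or escaping inside the brackets, so this simple paste0() approach would need adjusting.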

Upvotes: 4

Joseph

Reputation: 1074

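You could use this pattern to get your output (the example is in C#, not R):

        // Requires: using System; using System.Text.RegularExpressions;
        string input = @"This is a sentence. This is a question, right? Yes! It is.";
        // A character class of the delimiters, followed by any trailing whitespace.
        // Spaces are kept out of the class so that words are not split apart.
        string pattern = @"[,.:;?!]\s*";

        foreach (string result in Regex.Split(input, pattern))
        {
            // Regex.Split returns an empty string after the final delimiter; skip it.
            if (result.Length > 0)
                Console.WriteLine("'{0}'", result);
        }

Check the console output to confirm that you get the expected result.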

Upvotes: 1
