Reputation: 91
I am new to R. I need to split a sentence on phrase delimiters. strsplit can split a string on a single delimiter, but I want to split on several delimiters at once, such as [, . : ; ]. How can I do this in one step? Is there a regular expression for this?
For example:
my_string = "This is a sentence. This is a question, right? Yes! It is."
Expected output:
"This is a sentence", "This is a question", "right", "Yes", "It is"
Upvotes: 3
Views: 232
Reputation: 17621
You can use this:
strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence" " This is a question" " right"
#[4] " Yes" " It is"
To get rid of those extra spaces, you could do this:
strsplit("This is a sentence. This is a question, right? Yes! It is.",
"\\. *|, |\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"
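Another way to drop the extra spaces is to split on the punctuation alone and trim each piece afterward. This is only a sketch, assuming R 3.2.0 or later (where base R provides trimws()); the variable names `parts` and `trimmed` are just illustrative.

```r
my_string <- "This is a sentence. This is a question, right? Yes! It is."

# Split on the punctuation only; the pieces keep their leading spaces.
parts <- strsplit(my_string, "[.,?!]")[[1]]

# trimws() strips leading and trailing whitespace from every element.
trimmed <- trimws(parts)
trimmed
# [1] "This is a sentence" "This is a question" "right"
# [4] "Yes"                "It is"
```

This keeps the pattern short, at the cost of a second pass over the result.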
As thelatemail pointed out, this is even simpler:
strsplit("This is a sentence. This is a question, right? Yes! It is.",
"[,.:;?!]\\s*") # \\s* matches zero or more whitespace characters
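You need to escape characters that would otherwise be interpreted as regex metacharacters. That's why you see the \\ in front of the . and the ? in the first two patterns. Inside square brackets those characters are literal, so the last pattern needs no escapes. The | means "or": the pattern matches whatever is on either side of it.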
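To make the escaping point concrete, here is a small sketch comparing the alternation form with the bracket form, plus what happens when the . is left unescaped. The variable names are only illustrative.

```r
my_string <- "This is a sentence. This is a question, right? Yes! It is."

# Alternation: . and ? are metacharacters here, so they need \\ in an R string.
by_alternation <- strsplit(my_string, "(\\.|,|\\?|!)\\s*")[[1]]

# Bracket expression: the same characters are literal inside [ ].
by_brackets <- strsplit(my_string, "[.,?!]\\s*")[[1]]

identical(by_alternation, by_brackets)
# [1] TRUE

# An unescaped "." matches any character, so every character is a delimiter.
unescaped <- strsplit("a.b", ".")[[1]]
unescaped
# [1] "" "" ""
```

If you only ever split on one literal string, strsplit(x, ".", fixed = TRUE) avoids regex interpretation altogether, but it cannot express several delimiters at once.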
Upvotes: 4
Reputation: 1074
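You could use this pattern to get your output (note that this is C#, not R, and needs using System.Text.RegularExpressions):
string input = @"This is a sentence. This is a question, right? Yes! It is.";
// No spaces inside the brackets, otherwise every space becomes a delimiter.
string pattern = @"[,.:;?!]\s*";
foreach (string result in Regex.Split(input, pattern))
{
    // Regex.Split returns an empty string after the final "."; skip it.
    if (result.Length > 0)
        Console.WriteLine("'{0}'", result);
}
Check the console output to confirm that the result is what you expect.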
Upvotes: 1