Reputation: 147
This is my character vector:
my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n
...
"
As you can see, the data is not laid out consistently. For instance, the colon does not appear at the same position in every line.
I tried the following regular expression in R:
y <- gsub("\\:(.)(.*?)\\n","\\1",my_string)
My desired outcome is:
the user first name
the user name
the name is
However, what I get is:
\n1. the user first name 2. the user name 3. the name is
I am not sure where I went wrong; can someone help me? I want two things. First, I want only the text before the colon, without the colon itself and without the numbering (1., 2., 3.).
Second, I want to remove the \n as well and convert my_string to a list.
Thank you
Upvotes: 3
Views: 42
Reputation: 520978
Here is one sub()-based approach that works:
my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n"
output <- gsub("(?<=\n)\\d\\.\\s*(.*?):.*?\n", "\\1", my_string, perl=TRUE)
output <- trimws(output)  # strip leading and trailing whitespace
output  # if you want a newline-separated string, stop here
[1] "the user first name\nthe user name\nthe name is"
lines <- strsplit(output, "\n")[[1]]
lines  # if you want a vector of lines, then use this
[1] "the user first name" "the user name" "the name is"
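If a single regex over the whole string feels fragile, an alternative is to split into lines first and clean each line separately. The following is only a sketch, assuming every item starts with a number followed by a dot and that the label is everything before the first colon; the names all_lines, items, labels and label_list are made up for illustration. It also covers the "convert to a list" part of the question with as.list().

```r
# Same input as above. The literal line breaks and the "\n" escapes both
# produce newline characters, so the items are separated by blank lines.
my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n"

# Split into individual lines, then keep only the numbered ones
# (this drops the empty lines between items).
all_lines <- strsplit(my_string, "\n", fixed = TRUE)[[1]]
items <- grep("^[[:space:]]*[0-9]+\\.", all_lines, value = TRUE)

# Remove the leading "N. " prefix, then everything from the first colon on.
labels <- sub("^[[:space:]]*[0-9]+\\.[[:space:]]*", "", items)
labels <- trimws(sub(":.*$", "", labels))

# as.list() turns the character vector into a list, if a list is needed.
label_list <- as.list(labels)
```

Working line by line means the position of the colon within each line does not matter, and lines that do not start with a number are simply ignored.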
Upvotes: 3