JamesNEW

Reputation: 147

Extract from multiple lines of characters in R

This is my character vector:

my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n
...
"

As you can see, the data is rather random and unsystematic. For instance, the colon symbols are not always in the same place each time.

I tried the following regular expression in R:

y <- gsub("\\:(.)(.*?)\\n","\\1",my_string)

My desired outcome is:

the user first name
the user name
the name is

However, what I have is:

\n1. the user first name 2. the user name 3. the name is

I am not sure where I went wrong; can someone help me? First, I want only the text before the colon, without the colon itself or the leading numbering (1., 2., 3.).

Second, I want to remove the \n characters as well and convert my_string to a list.

Thank you

Upvotes: 3

Views: 42

Answers (1)

Tim Biegeleisen

Reputation: 520978

Here is one working approach using gsub():

my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n"

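# For each line: drop the leading "N. " and everything from the colon to the
# end of the line, keeping only the label captured in group 1.
output <- gsub("(?<=\n)\\d\\.\\s*(.*?):.*?\n", "\\1", my_string, perl=TRUE)
# Trim both ends; sub() with "^\\s*|\\s*$" would only strip the first match.
output <- trimws(output)
output  # if you want a newline-separated string, stop here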

lines <- strsplit(output, "\n")[[1]]
lines   # if you want a vector of lines, then use this

[1] "the user first name\nthe user name\nthe name is"
[1] "the user first name" "the user name"       "the name is"
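
If you would rather avoid the lookbehind, here is an alternative sketch that splits the string into lines first and then cleans each line separately. It assumes every record sits on its own line and that the label you want is whatever comes between the "N." counter and the first colon. The variable name labels is just my choice for illustration. Since the question asks for a list, the last line wraps the vector with as.list().

```r
# Same sample input as above. Each literal "\n" is followed by a real line
# break, so every record ends with two newline characters.
my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n"

# Split on runs of newlines, then drop empty or whitespace-only pieces.
lines <- strsplit(my_string, "\n+")[[1]]
lines <- lines[nzchar(trimws(lines))]

# Remove the leading "N. " counter, then everything from the first colon on.
labels <- sub("^\\s*\\d+\\.\\s*", "", lines)
labels <- trimws(sub(":.*$", "", labels))

labels           # character vector with one label per line
as.list(labels)  # the same labels as a list
```

Because each line is handled on its own, this version does not depend on how many newlines separate the records, but it will keep any line that has no colon unchanged, so check your real data for such lines.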

Upvotes: 3
