Reputation: 6659
I have strings like this:
a <- "this string has an even number of words"
b <- "this string doesn't have an even number of words"
I want to replace every other space with a new line. So the output would look like this...
myfunc(a)
# "this string\nhas an\neven number\nof words"
myfunc(b)
# "this string\ndoesn't have\nan even\nnumber of\nwords"
I've accomplished this by doing a strsplit, paste-ing a newline on even numbered words, then paste(a, collapse=" ") them back together into one string. Is there a regular expression to use with gsub that can accomplish this?
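For reference, the split-and-paste approach described above might look roughly like the sketch below. The name split_paste is made up for illustration, and this is only one plausible way to write it, not necessarily the exact code I used.

```r
# Hypothetical helper: split on single spaces, then rejoin the words,
# using a newline after every even-numbered word and a space otherwise.
split_paste <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  n <- length(words)
  if (n < 2) return(x)  # nothing to join
  # Separator that follows word i, for i = 1 .. n-1
  seps <- ifelse(seq_len(n - 1) %% 2 == 0, "\n", " ")
  # rbind() interleaves word, separator, word, separator, ...
  paste0(c(rbind(words[-n], seps), words[n]), collapse = "")
}

a <- "this string has an even number of words"
b <- "this string doesn't have an even number of words"
split_paste(a)
split_paste(b)
```

It assumes words are separated by exactly one space, as in the two example strings.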
Upvotes: 3
Views: 2520
Reputation: 66819
@Jota suggested a simple and concise way:
myfunc = function(x) gsub("( \\S+) ", "\\1\n", x) # Jota's
myfunc2 = function(x) gsub("([^ ]+ [^ ]+) ", "\\1\n", x) # my idea
lapply(list(a,b), myfunc)
[[1]]
[1] "this string\nhas an\neven number\nof words"
[[2]]
[1] "this string\ndoesn't have\nan even\nnumber of\nwords"
How it works. The idea of the "([^ ]+ [^ ]+) " regex is (1) "find two sequences of words/nonspaces with a space between them and a space after them" and (2) "replace the trailing space with a newline".
@Jota's "( \\S+) " is trickier -- it finds any word with a space before and after it and then replaces the trailing space with a newline. This works because the first word that is caught by this is the second word of the string; and the next word caught by it is not the third (since we have already "consumed"/looked at the space in front of the third word when handling the second word), but rather the fourth; and so on.
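To see the "consumed space" effect directly, one option is to list the matches themselves with gregexpr and regmatches. This is just an illustrative check on the example string a from the question; the variable names hits, jota and pairs are my own.

```r
a <- "this string has an even number of words"

# All " word " matches, found left to right. Matches cannot overlap,
# so the trailing space of one match is not available as the leading
# space of the next; only every second word is caught.
hits <- regmatches(a, gregexpr(" \\S+ ", a))[[1]]
print(hits)

# Both patterns from this answer should give the same result here.
jota  <- gsub("( \\S+) ", "\\1\n", a)
pairs <- gsub("([^ ]+ [^ ]+) ", "\\1\n", a)
print(identical(jota, pairs))
```

The matches should be the second, fourth and sixth words, each with its surrounding spaces. The last word is never matched because no space follows it.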
Oh, and some basic regex stuff:
[^xyz] means any single char except the chars x, y, and z.
\\s is a whitespace character (space, tab, etc.), while \\S is anything but whitespace.
x+ means x one or more times.
(x) "captures" x, allowing for reference in the replacement, like \\1.
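If it helps, each of those pieces can be tried on its own with gsub. The inputs below are toy strings made up for illustration, not taken from the question.

```r
# [^xyz]: replace every char that is not x, y, or z
r1 <- gsub("[^xyz]", "-", "axbycz")   # "-x-y-z"

# \\s: whitespace (the tab counts too); \\S: non-whitespace
r2 <- gsub("\\s", "_", "a b\tc")      # "a_b_c"
r3 <- gsub("\\S", "#", "a b")         # "# #"

# x+: one or more x in a row collapse into a single replacement
r4 <- gsub("x+", "X", "axxbx")        # "aXbX"

# (x): capture x and reuse it in the replacement via \\1
r5 <- gsub("(x)", "[\\1]", "axb")     # "a[x]b"

print(c(r1, r2, r3, r4, r5))
```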
Upvotes: 8