chamaoskurumi

Reputation: 2503

Replace substring every >n characters (conditionally insert linebreaks for spaces)

I would like to replace spaces with linebreaks (\n) in a pretty long character vector in R. However, I don't want to replace every space, but only if the substring exceeds a certain number of characters (n).

Example:

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks" 

Now I want to insert linebreaks in mystring at every space on the condition that each substring has a length greater than 20 characters (nchar > 20).

Hence, the resulting string is supposed to look like this:

"this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"

The resulting lines are 25, 26 and 25 characters long, i.e. each linebreak (\n) replaces the first space after more than 20 characters.

How can I achieve this? Maybe something combining gsub and strsplit?

Upvotes: 6

Views: 1986

Answers (1)

Wiktor Stribiżew

Reputation: 627020

You may use the .{21,}?\s regex to match 21 or more characters (since nchar > 20), but as few as possible, up to the nearest whitespace:

> gsub("(.{21,}?)\\s", "\\1\n", mystring)
[1] "this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"

Details:

  • (.{21,}?) - Group 1 capturing any 21 chars or more, but as few as possible (as {21,}? is a lazy quantifier)
  • \\s - a whitespace
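To see why the lazy quantifier matters here, a quick comparison with the greedy form may help. This is only a sketch reusing mystring from the question; the variable names lazy and greedy are just for illustration.

```r
mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"

# Lazy {21,}?: once 21 characters are consumed, stop at the nearest whitespace.
lazy <- gsub("(.{21,}?)\\s", "\\1\n", mystring)

# Greedy {21,}: consume as much as possible, so the match runs to the last whitespace.
greedy <- gsub("(.{21,})\\s", "\\1\n", mystring)

writeLines(lazy)
writeLines(greedy)
```

With the greedy version the whole string up to the last space is taken as one match, so only a single linebreak ends up being inserted.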

The replacement contains the backreference to Group 1 to reinsert the text before the whitespace, and the newline char (feel free to add CR, too, if needed).
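If n needs to vary, the pattern can be built on the fly. Below is a minimal sketch, assuming a helper called wrap_after_n (a made-up name, not a base R function) that takes the threshold as its n argument:

```r
# Hypothetical helper: replace the first whitespace after more than n characters
# with a newline, repeatedly across the string.
wrap_after_n <- function(x, n = 20) {
  # For n = 20 this builds the same pattern as above: (.{21,}?)\s
  pattern <- sprintf("(.{%d,}?)\\s", as.integer(n) + 1L)
  gsub(pattern, "\\1\n", x)
}

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"
wrapped <- wrap_after_n(mystring, 20)
writeLines(wrapped)

# Length of each resulting line
nchar(strsplit(wrapped, "\n", fixed = TRUE)[[1]])
# [1] 25 26 25
```

Since gsub is vectorised over x, the same call should also work on a character vector with several strings.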

Upvotes: 17
