user1444433

Reputation: 111

Regular expression to split long strings in several lines

I'm not an expert in regular expressions, and today in my project I need to split a long string into several lines in order to check whether the text fits the page height.

I need a C# regular expression that splits long strings into several lines on "\n" and "\r\n" while keeping each line to a maximum of 150 characters. If character 150 falls in the middle of a word, the entire word should be moved to the next line.

Can anyone help me?

Upvotes: 8

Views: 5265

Answers (5)

jessehouwing

Reputation: 114571

It's actually quite a simple problem. Look for up to 150 characters followed by whitespace (or the end of the string). Since regex quantifiers are greedy by default, this will do exactly what you want. Replace each match with the match itself plus a newline:

.{0,150}(\s+|$)

Replace with

$0\r\n

See also: http://regexhero.net/tester/?id=75645133-1de2-4d8d-a29d-90fff8b2bab5
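
For illustration, here is a minimal C# sketch of the same idea. It is a variant under stated assumptions rather than the exact pattern above: the class `RegexWrapSketch` and method `Wrap` are hypothetical names, the line text is captured in a group so the trailing whitespace is dropped, `[^\r\n]` is used instead of `.` so a "\r" never ends up inside a line, and `{1,...}` is used so no empty match is produced at the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexWrapSketch
{
    // Hypothetical helper, not part of the answer above.
    public static string Wrap(string text, int maxLength)
    {
        // Group 1: up to maxLength characters that are not line breaks.
        // It must be followed by whitespace (which is dropped) or the end of input.
        string pattern = @"([^\r\n]{1," + maxLength + @"})(?:\s+|$)";

        // Greedy matching takes as many characters as possible, then
        // backtracks to the last whitespace that fits within maxLength.
        string wrapped = Regex.Replace(text, pattern, "$1\r\n");

        // Every match appends a line break; drop the ones after the last line.
        return wrapped.TrimEnd('\r', '\n');
    }
}
```

Calling `RegexWrapSketch.Wrap(text, 150)` returns the text with "\r\n" between lines. Two caveats of this sketch: a single word longer than `maxLength` is left on an over-long line, and runs of consecutive line breaks are collapsed into one.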

Upvotes: 7

Jon Senchyna

Reputation: 8037

This code should help you. It checks the length of the current string. If it is greater than your maxLength (150 in this case), it starts at index 150 and, going backwards, finds the first whitespace character (the OP describes a word as a sequence of non-space characters). It then stores the string up to that character and starts over with the remaining string, repeating until the remainder is no longer than maxLength characters. Finally, it joins all the pieces back together into one string.

string line = "This is a really long run-on sentence that should go for longer than 150 characters and will need to be split into two lines, but only at a word boundary.";

int maxLength = 150;
string delimiter = "\r\n";

List<string> lines = new List<string>();
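// As long as we still have more than 'maxLength' characters, keep splitting
while (line.Length > maxLength)
{
    // Starting at index 'maxLength' and going backwards, look for a
    // whitespace character to break the line at.
    int splitIndex = -1;
    for (int charIndex = maxLength; charIndex > 0; charIndex--)
    {
        if (char.IsWhiteSpace(line[charIndex]))
        {
            splitIndex = charIndex;
            break;
        }
    }

    if (splitIndex > 0)
    {
        // Split the line at the whitespace (dropping it)
        // and continue on with the remainder
        lines.Add(line.Substring(0, splitIndex));
        line = line.Substring(splitIndex + 1);
    }
    else
    {
        // No whitespace found (a single word longer than 'maxLength'):
        // hard-split so the loop always makes progress
        lines.Add(line.Substring(0, maxLength));
        line = line.Substring(maxLength);
    }
}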
lines.Add(line);
// Join the list back together with delimiter ("\r\n") between each line
string final = string.Join(delimiter , lines);

// Check the results
Console.WriteLine(final);

Note: If you run this code in a console application, you may want to change "maxLength" to a smaller number so that the console doesn't wrap on you.

Note: This code does not take tab characters into account. If tabs are also included, your situation gets a bit more complicated.

Update: I fixed a bug where new lines were starting with a space.

Upvotes: 0

paul

Reputation: 22001

If you just want to split a long string into lines of at most 150 characters, then I'm not sure why you'd need a regular expression:

    private string stringSplitter(string inString)
    {
        int lineLength = 150;

        StringBuilder sb = new StringBuilder();

        while (inString.Length > 0)
        {
            // The remainder fits on one line: emit it as-is and stop
            if (inString.Length <= lineLength)
            {
                sb.AppendLine(inString);
                break;
            }

            // Last space or newline within the first lineLength characters
            var lastGap = inString.Substring(0, lineLength).LastIndexOfAny(new char[] {' ', '\n'});

            if (lastGap == -1)
            {
                // No gap: the word is longer than a line, so hard-split it
                sb.AppendLine(inString.Substring(0, lineLength));
                inString = inString.Substring(lineLength);
            }
            else
            {
                sb.AppendLine(inString.Substring(0, lastGap));
                inString = inString.Substring(lastGap + 1);
            }
        }

        return sb.ToString();
    }

Edited to account for word breaks.

Upvotes: 0

Roman Sokk

Reputation: 175

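// Multiline only changes how ^ and $ match, so it is not needed here.
// {1,150} instead of {0,150} avoids empty matches that add blank lines.
var regex = new Regex(@".{1,150}");
var result = regex.Replace(sourceString, "$0\r\n");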

Upvotes: 1

Jirka Hanika

Reputation: 13529

Here you go:

^.{1,150}\n

This will match an initial line of at most 150 characters, up to and including its newline.

Upvotes: 0
