Paul O

Reputation: 13

Regex to truncate string on closest expression/space with length limit

I want to get these results (from -> to):

# use string length limit = 3
1 {2 3}       -> 1 # the string between the {} must be whole
1 2 3         -> 1 2
1 23          -> 1
{1}           -> {1} 
{1 2}         -> empty 
123456        -> 123 # if there are no spaces, cut the string by characters (except {*} expressions). Not required, but it would be nice

# One more example, with string length limit = 5
{1} 2           -> {1} 2
123 45          -> 123
123 4           -> 123 4
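The expected mappings above can be collected into a small table of test cases, which makes it easy to check any candidate regex against all of them at once. This is only a sketch; `TRUNCATE_CASES` and `failingCases` are hypothetical names, not part of the question.

```php
<?php
// The examples above as [input, limit, expected] triples.
const TRUNCATE_CASES = [
    ['1 {2 3}', 3, '1'],
    ['1 2 3',   3, '1 2'],
    ['1 23',    3, '1'],
    ['{1}',     3, '{1}'],
    ['{1 2}',   3, ''],
    ['123456',  3, '123'],
    ['{1} 2',   5, '{1} 2'],
    ['123 45',  5, '123'],
    ['123 4',   5, '123 4'],
];

// Hypothetical helper: runs a candidate truncate function over every case and
// returns the ones it gets wrong, as input => actual output.
function failingCases(callable $truncate): array
{
    $failures = [];
    foreach (TRUNCATE_CASES as [$input, $limit, $expected]) {
        $actual = $truncate($input, $limit);
        if ($actual !== $expected) {
            $failures[$input] = $actual;
        }
    }
    return $failures;
}
```

For example, a naive `substr($s, 0, $n)` passes the cases where no word or `{...}` group is cut, and fails the rest.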

Is there a way to do this in PHP with a single regex?

The length limit may be dynamic.

Similar question: Get first 100 characters from string, respecting full words (but my question also requires {*} expressions to be kept whole)

I tried: ^(.{1,3})({.*}|\s|$)

Upvotes: 0

Views: 688

Answers (3)

user3942918

Reputation: 26405

The idea here is to define your atomic bits, match each one, and use a negative lookbehind to limit the character length. It also drops trailing whitespace; I'm not sure whether that's needed, but I figured I'd include it.

The only other thing is to use a conditional expression to check whether the string is a single uninterrupted run of characters and, if so, split it naively (for your 123456 -> 123 example).

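function truncate($string, $length)
{
    // Atoms: a run of non-space, non-brace chars, a balanced {...} group, or one
    // whitespace char. The possessive ++ on chars stops the engine from giving
    // back part of a word: since PHP 7.3 (PCRE2) it can backtrack into subroutine
    // calls, which would otherwise turn "1 23" into "1 2".
    // If the string contains whitespace or braces, take as many whole atoms as
    // fit without ending in whitespace; otherwise cut naively with .{0,N}.
    $regex = <<<REGEX
        /
        (?(DEFINE)
            (?<chars> [^\s{}]++ )
            (?<group> { (?&atom)* } )
            (?<atom> (?&chars) | (?&group) | \s )
        )
        \A
        (?(?=.*[\s{}])
            (?&atom)*(?<! \s | .{{$length}}. ) |
            .{0,$length}
        )
        /x
REGEX;

    preg_match($regex, $string, $matches);
    return $matches[0];
}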

$samples = <<<'DATA'
1 {2 3}
1 2 3
1 23
{1} 
{1 2} 
123456
DATA;

foreach (explode("\n", $samples) as $sample) {
    var_dump(truncate($sample, 3));
}

Output:

string(1) "1"
string(3) "1 2"
string(1) "1"
string(3) "{1}"
string(0) ""
string(3) "123"

And:

$samples = <<<'DATA'
{1} 2
123 45
123 4
DATA;

foreach (explode("\n", $samples) as $sample) {
    var_dump(truncate($sample, 5));
}

Output:

string(5) "{1} 2"
string(3) "123"
string(5) "123 4"

Upvotes: 1

MohaMad

Reputation: 2855

Try this one:

/^([\w ]{1,3}(?= )|\w{1,3}|\{\w\})/gm

It works with the given samples (limit 3): https://regex101.com/r/iF2tSp/3

1 {2 3}
1 2 3
1 23
{1}
{1 2}
123456

Match 1
Full match  0-1 `1`
Group 1.    n/a `1`
Match 2
Full match  8-11    `1 2`
Group 1.    n/a `1 2`
Match 3
Full match  14-15   `1`
Group 1.    n/a `1`
Match 4
Full match  19-22   `{1}`
Group 1.    n/a `{1}`
Match 5
Full match  29-32   `123`
Group 1.    n/a `123`
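PHP has no `g` modifier, so to run the pattern above in PHP the `g` is dropped and `preg_match_all` does the "find every match" part. A minimal sketch, assuming the samples are joined into one newline-separated string:

```php
<?php
// The samples from the question, one per line.
$samples = "1 {2 3}\n1 2 3\n1 23\n{1}\n{1 2}\n123456";

// /m lets ^ match at the start of every line; preg_match_all finds all matches.
preg_match_all('/^([\w ]{1,3}(?= )|\w{1,3}|\{\w\})/m', $samples, $matches);

// $matches[0] holds the full matches; "{1 2}" has no match, so it is simply absent.
print_r($matches[0]);
```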

Upvotes: 0

RomanPerekhrest

Reputation: 92884

A solution using preg_match_all with a specific regex pattern:

$str = '1 {2 3}  
1 2 3  
1 23 
{1}   
{1 2} 
123456 ';

$re = '/^(\S \S{1}(?=\s)|\S(?= \S{2})|\{\S\}|\w{3}(?=\w))/m';
preg_match_all($re, $str, $matches);

// $matches[0] holds the truncated items (you can implode() them to get a single string)
print_r($matches[0]);

The output:

Array
(
    [0] => 1
    [1] => 1 2
    [2] => 1
    [3] => {1}
    [4] => 123
)

Regex demo (see the "Explanation" section on the right side)
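To truncate one string at a time, as in the question, rather than a multi-line block, the same pattern can be used with `preg_match` and without the `m` modifier. A minimal sketch; `truncateTo3` is a hypothetical name, and like the pattern itself it is hard-wired to a limit of 3.

```php
<?php
// Hypothetical helper: applies the pattern above to a single string.
// Returns '' when no alternative matches (e.g. for "{1 2}").
function truncateTo3(string $s): string
{
    $re = '/^(\S \S{1}(?=\s)|\S(?= \S{2})|\{\S\}|\w{3}(?=\w))/';
    return preg_match($re, $s, $m) ? $m[0] : '';
}

foreach (['1 {2 3}', '1 2 3', '1 23', '{1}', '{1 2}', '123456'] as $sample) {
    var_dump(truncateTo3($sample));
}
```

Because of the lookaheads, a string that ends exactly at the limit, such as "1 2", gets no match and comes back as ''.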

Upvotes: 1
