Patrick Beardmore

Reputation: 1032

Sanitize and standardize a string that contains an indeterminate sequence of delimiting characters and whitespaces

I have a PHP variable, submitted from a form, that needs tidying up.

The variable contains a list of items (possibly two or three word items with a space in between words).

I want to convert it to a comma-separated list with no superfluous white space. I want the divisions to fall only at commas, semi-colons or new-lines. A blank cannot be an item.

Here's a comprehensive example (with a deliberately messy input):

Input string:

$input = 'dog, cat         ,car,tea pot,,  ,,, ;;
fly,     cake';

Desired result string:

dog,cat,car,tea pot,fly,cake

Upvotes: 0

Views: 325

Answers (5)

mickmackusa

Reputation: 47761

I do not recommend producing an interim array. Instead, find each run of one or more consecutive delimiting characters, any of which might be surrounded by zero or more whitespace characters, and replace the whole run with a single comma.

$str_in = "dog, cat         ,car,tea pot,,  ,,, ;;
fly,     cake";

var_export(
    preg_replace(
        '#\s*(?:[;,\n]+\s*)+#',
        ',',
        $str_in
    )
);

Output:

'dog,cat,car,tea pot,fly,cake'
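One case the sample input does not exercise is a delimiter at the very start or end of the string, which a plain replacement would turn into a leading or trailing comma. A minimal sketch of one way to cover that, assuming a hypothetical helper named normalizeList (not part of the answer above) and assuming that new-lines count as delimiters, as the question asks:

```php
<?php
// Hypothetical helper (not from the answer above): collapse delimiter runs,
// then strip any comma or whitespace left at either end of the string.
function normalizeList(string $input): string
{
    // A run of commas, semi-colons or new-lines, with optional whitespace around it.
    $collapsed = preg_replace('/\s*(?:[;,\n]+\s*)+/', ',', $input);

    // trim() with a character list removes leftover commas and whitespace at the edges.
    return trim($collapsed, ", \t\n\r\0\x0B");
}

$input = "dog, cat         ,car,tea pot,,  ,,, ;;\nfly,     cake";
echo normalizeList($input), "\n";          // dog,cat,car,tea pot,fly,cake
echo normalizeList(" ;dog\ncat,, "), "\n"; // dog,cat
```

The second call shows the edge case: the replacement alone would give ',dog,cat,', and the final trim() removes the outer commas.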

Upvotes: 0

Pascal MARTIN

Reputation: 400892

You can start by splitting the string into "useful" parts with preg_split, and then implode those parts back together:

$str_in = "dog, cat         ,car,tea pot,,  ,,, ;;
fly,     cake";

$parts = preg_split('/[,;\s]/', $str_in, -1, PREG_SPLIT_NO_EMPTY);

$str_out = implode(',', $parts);

var_dump($parts, $str_out);

(Here, the regex will split on ',', ';', and '\s', which means any whitespace character -- and we only keep non-empty parts)

This will get you, for $parts:

array
  0 => string 'dog' (length=3)
  1 => string 'cat' (length=3)
  2 => string 'car' (length=3)
  3 => string 'tea' (length=3)
  4 => string 'pot' (length=3)
  5 => string 'fly' (length=3)
  6 => string 'cake' (length=4)

And, for $str_out:

string 'dog,cat,car,tea,pot,fly,cake' (length=28)



Edit after the comment: sorry, I didn't notice that items can contain spaces ^^

In that case, you can't split on whitespace :-( I would probably split on ',' or ';', iterate over the parts, use trim to remove whitespace characters at the beginning and end of each item, and only keep the items that are not empty:

$useful_parts = array();
$parts = preg_split('/[,;]/', $str_in, -1, PREG_SPLIT_NO_EMPTY);
foreach ($parts as $part) {
    $part = trim($part);
    // Compare with '' rather than using empty(), which would also drop an item like "0"
    if ($part !== '') {
        $useful_parts[] = $part;
    }
}
var_dump($useful_parts);


Executing this portion of code gets me:

array
  0 => string 'dog' (length=3)
  1 => string 'cat' (length=3)
  2 => string 'car' (length=3)
  3 => string 'tea pot' (length=7)
  4 => string 'fly' (length=3)
  5 => string 'cake' (length=4)


And, imploding everything back together, this time I get:

string 'dog,cat,car,tea pot,fly,cake' (length=28)

Which is better ;-)
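The split / trim / filter / implode steps can also be folded into one function. This is only a sketch: the name cleanList is hypothetical, and adding line breaks to the character class is an assumption taken from the question's requirement that new-lines also separate items.

```php
<?php
// Hypothetical wrapper around the split / trim / filter / implode steps.
function cleanList(string $input): string
{
    // Split only on commas, semi-colons and line breaks, never on plain spaces.
    $parts = preg_split('/[,;\r\n]/', $input, -1, PREG_SPLIT_NO_EMPTY);

    // Trim each item, then keep the non-blank ones. Comparing with '' instead
    // of calling empty() keeps an item such as "0".
    $parts = array_filter(array_map('trim', $parts), function ($part) {
        return $part !== '';
    });

    return implode(',', $parts);
}

var_dump(cleanList("dog, cat         ,car,tea pot,,  ,,, ;;\nfly,     cake"));
// string(28) "dog,cat,car,tea pot,fly,cake"
```

array_filter keeps the original keys, which does not matter here because implode ignores them.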

Upvotes: 9

Aaron

Reputation: 4614

Explode the entire string on the comma, then walk through that array: first remove all characters that are not a-zA-Z0-9 (or space), then trim any remaining leading/trailing spaces. If an item ends up empty, drop it from the array. Implode back to a string.

Ideally, this allows for more messy characters than just ,;\s\n etc. Note that only the comma separates items here, so two items separated solely by a semi-colon or a new-line would be merged into one.

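$strIn = "dog, cat         ,car,tea pot,,  ,,, ;;\nfly,     cake";
$firstArray = explode(",", $strIn);

$searchPattern = "/[^A-Za-z0-9 ]+/";

// Strip the disallowed characters, then trim. The result has to be returned,
// because preg_replace() and trim() do not modify their arguments.
$removeViolators = function ($item) use ($searchPattern) {
    return trim(preg_replace($searchPattern, "", $item));
};

$cleaned = array_map($removeViolators, $firstArray);

// Drop the items that ended up empty.
$cleaned = array_filter($cleaned, 'strlen');
$strOut = implode(",", $cleaned);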

Upvotes: 1

Matteo Riva

Reputation: 25060

Split, then grep; this seems to give the expected output:

$array = preg_split('/\s*[;,\n]\s*/', $string);
$array = preg_grep('/^\s*$/', $array, PREG_GREP_INVERT);
$string = implode(',', $array);

EDIT: actually grep isn't necessary:

$array = preg_split('/\s*[;,\n]\s*/', $string, -1, PREG_SPLIT_NO_EMPTY);
$string = implode(',', $array);
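The pattern only absorbs whitespace that sits next to a delimiter, so whitespace at the very start or end of the whole string would stay attached to the first or last item. A small sketch of one way around that, assuming a hypothetical helper named splitList that trims the input before splitting:

```php
<?php
// Hypothetical helper: same split pattern, with the input trimmed first so
// whitespace at the very start or end of the string cannot stick to an item.
function splitList(string $string): array
{
    return preg_split('/\s*[;,\n]\s*/', trim($string), -1, PREG_SPLIT_NO_EMPTY);
}

echo implode(',', splitList("  dog, cat ;\n tea pot  ")), "\n"; // dog,cat,tea pot
```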

Upvotes: 1

mjdth

Reputation: 6536

You could use explode, trim and str_replace: split the string into an array, manually remove specific characters from each item, and then turn the array back into a string.

function getCleanerStringFromString($stringIn) {
    // turn the string into an array with a comma as the delimiter
    $myarray = explode(",", $stringIn);

    for ($ii = 0; $ii < count($myarray); $ii++) {
        // remove semi-colons, etc.
        // use this line as many times as you'd like to take out characters
        $myarray[$ii] = str_replace(";", "", $myarray[$ii]);

        // remove surrounding white space (this also strips new lines at the edges)
        $myarray[$ii] = trim($myarray[$ii]);
    }

    // drop the blank items, then turn the array back into a string:
    $backstring = implode(",", array_filter($myarray, 'strlen'));

    return $backstring;
}

Upvotes: 1
