envysea

Reputation: 1031

Split/parse a PHP string on certain separator words

I've searched the PHP manual, Stack Overflow, and some forums, but I'm stumped on some PHP logic. Maybe I'm just tired, but I'd really appreciate any help or direction on this.

I have a PHP string, say:

 $string = 'cats cat1 cat2 cat3 dogs dog1 dog2 monkey creatures monkey_creature1 monkey_creature2 monkey_creature3';

Eventually, I would ideally like my final output to look something like this, but just getting the array for now is more than fine:

 <h2>cats</h2>
 <ul>
     <li>cat1</li>
     <li>cat2</li>
     <li>cat3</li>
 </ul>

 <h2>dogs</h2>
 <ul>
     <li>dog1</li>
     <li>dog2</li>
 </ul>

 <h2>monkey creatures</h2>
 <ul>
     <li>monkey_creature1</li>
     <li>monkey_creature2</li>
     <li>monkey_creature3</li>
 </ul>

There's a catch, though: sometimes the string will be slightly different:

 $string = 'cats cat1 cat2 cat3 cat4 cat5 cats6 dogs dogs1 dogs2 monkey creatures monkey_creature1 lemurs lemur1 lemur2 lemur3';

Anyway, this is my first question on Stack Overflow, and thanks in advance for all the help, guys!

Edit: I'm working under certain limitations, and I can't change any code before the string. I know all the parents in advance ('cats', 'dogs', 'lemurs', 'monkey creatures' (with a space)).

Upvotes: 4

Views: 7810

Answers (5)

Dan Applegate

Reputation: 346

You'll probably want to use the preg_match_all function and use a regular expression. That way, you don't have to use any loops:

$matches = array();
$string = 'cats cat1 cat2 cat3 dogs dog1 dog2 monkey creatures monkey_creature1 monkey_creature2 monkey_creature3';
preg_match_all('/((?:[a-z]+ )*?[a-z]+s) ((?:[a-z_]+[0-9] ?)+)*/i', $string, $matches);

// $matches now contains a multidimensional array with 3 elements; indices
// 1 and 2 contain the animal names and the lists of those animals, respectively
$animals = array_combine($matches[1], $matches[2]);
$animals = array_map(function($value) {
    return explode(' ', trim($value));
}, $animals);
print_r($animals);

Output:

Array
(
    [cats] => Array
        (
            [0] => cat1
            [1] => cat2
            [2] => cat3
        )

    [dogs] => Array
        (
            [0] => dog1
            [1] => dog2
        )

    [monkey creatures] => Array
        (
            [0] => monkey_creature1
            [1] => monkey_creature2
            [2] => monkey_creature3
        )

)
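Since the question's edit says the parent names are known in advance, a variation on the regex idea is to build the pattern from that list instead of relying on plural endings and numeric suffixes. The following is only a rough sketch under that assumption; the function name `groupByParents` is made up for illustration and is not part of the answer above.

```php
<?php
// Hypothetical helper: splits $string on the known parent names and
// returns an array of parent => children.
function groupByParents($string, array $parents)
{
    // Escape each parent so it is matched literally inside the pattern
    $alternatives = array_map(function ($parent) {
        return preg_quote($parent, '/');
    }, $parents);

    // The lookarounds require whitespace (or the string boundary) on both
    // sides, so "cats" does not match inside "cats6"
    $pattern = '/(?<!\S)(' . implode('|', $alternatives) . ')(?!\S)/';

    // PREG_SPLIT_DELIM_CAPTURE keeps the matched parent names in the result:
    // index 0 is the text before the first parent, then name, children, ...
    $pieces = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);

    $groups = array();
    for ($i = 1; $i + 1 < count($pieces); $i += 2) {
        $children = trim($pieces[$i + 1]);
        $groups[$pieces[$i]] = ($children === '') ? array() : explode(' ', $children);
    }
    return $groups;
}

$parents = array('cats', 'dogs', 'lemurs', 'monkey creatures');
$string  = 'cats cat1 cat2 cat3 dogs dog1 dog2 monkey creatures monkey_creature1 monkey_creature2 monkey_creature3';
print_r(groupByParents($string, $parents));
```

With this approach the children do not need to follow any naming scheme, but a child that is spelled exactly like a parent would be treated as a new group.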

Upvotes: 2

Varol

Reputation: 1858

I figured I wouldn't be able to submit the best answer, so I decided to go for the fewest lines. (Kidding, sorry for the extremely dirty code.)

$string = 'cats cat1 cat2 cat3 cat4 cat5 cats6 dogs dogs1 dogs2 monkey creatures monkey_creature1 lemurs lemur1 lemur2 lemur3';
$categories = array( 'cats', 'dogs', 'monkey creatures', 'lemurs' );

for( $i=0; $i<count( $categories ); $i++ ) $parts[] = @explode( ' ', strstr( $string, $categories[$i] ) );
for( $i=0; $i<count( $parts ); $i++ ) $groups[] = ($i<count($parts)-1) ? array_diff( $parts[$i], $parts[$i+1] ) : $parts[$i];
for( $i=0; $i<count( $groups ); $i++ ) for( $j=0; $j<count( $groups[$i] ); $j++ ) if( ! is_numeric( substr( $groups[$i][$j], -1 ) ) ) unset($groups[$i][$j]);

print_r( $groups );

You may notice that my method depends on the elements having a numeric suffix. This is actually nonsense, but so is the input we're dealing with.

My output is:

Array
(
    [0] => Array
        (
            [1] => cat1
            [2] => cat2
            [3] => cat3
            [4] => cat4
            [5] => cat5
            [6] => cats6
        )

    [1] => Array
        (
            [1] => dogs1
            [2] => dogs2
        )

    [2] => Array
        (
            [2] => monkey_creature1
        )

    [3] => Array
        (
            [1] => lemur1
            [2] => lemur2
            [3] => lemur3
        )

)

Upvotes: 1

CodeMonkey

Reputation: 114

Here is my $0.50

<?php
$parents = array('cats', 'dogs', 'lemurs', 'monkey creatures');

// Convert all spaces to underscores in parents
$cleaned_parents = array();
foreach ($parents as $parent)
{
    $cleaned_parents[] = str_replace(' ', '_', $parent);
}

$input = 'cats cat1 cat2 cat3 dogs dog1 dog2 monkey creatures monkey_creature1 monkey_creature2 monkey_creature3';

// Change all parents to the "cleaned" versions with underscores
$input = str_replace($parents, $cleaned_parents, $input);

// Make an array of all tokens in the input string
$tokens = explode(' ', $input);
$result = array();

// Loop through all the tokens
$currentParent = null; // Keep track of current parent
foreach ($tokens as $token)
{
    // Is this a parent?
    if (in_array($token, $cleaned_parents))
    {
        // Create the parent in the $result array
        $currentParent = $token;
        $result[$currentParent] = array();
    }
    elseif ($currentParent != null)
    {
        // Add as child to the current parent
        $result[$currentParent][] = $token;
    }
}

print_r($result);

Output:

Array
(
    [cats] => Array
        (
            [0] => cat1
            [1] => cat2
            [2] => cat3
        )

    [dogs] => Array
        (
            [0] => dog1
            [1] => dog2
        )

    [monkey_creatures] => Array
        (
            [0] => monkey_creature1
            [1] => monkey_creature2
            [2] => monkey_creature3
        )

)
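Note that the keys come out in their "cleaned" form (`monkey_creatures`). If the original names with spaces are wanted as headings, one possible follow-up step is to map the keys back. This is just a sketch; `restoreParentNames` is a hypothetical helper, not something from the code above.

```php
<?php
// Hypothetical helper: swaps underscore-joined keys back to the original
// parent names, leaving any unknown keys untouched.
function restoreParentNames(array $result, array $parents)
{
    // str_replace() accepts an array subject and returns an array
    $cleaned = str_replace(' ', '_', $parents);

    // Lookup table: cleaned name => original name
    $lookup = array_combine($cleaned, $parents);

    $restored = array();
    foreach ($result as $key => $children) {
        $name = isset($lookup[$key]) ? $lookup[$key] : $key;
        $restored[$name] = $children;
    }
    return $restored;
}

$parents = array('cats', 'dogs', 'lemurs', 'monkey creatures');
$result  = array(
    'cats' => array('cat1', 'cat2', 'cat3'),
    'monkey_creatures' => array('monkey_creature1', 'monkey_creature2'),
);
print_r(restoreParentNames($result, $parents));
```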

Upvotes: 1

galymzhan

Reputation: 5558

Your second example as a string:

<?php

$parents = array('cats', 'dogs', 'monkey creatures', 'lemurs');
$result = array();

$dataString = 'cats cat1 cat2 cat3 cat4 cat5 cats6 dogs dogs1 dogs2 monkey creatures monkey_creature1 lemurs lemur1 lemur2 lemur3';
foreach ($parents as $parent) {
  // Consider group only if it is present in the data string
  if (strpos($dataString, $parent) !== false) {
    $result[$parent] = array();
  }
}
$parts = explode(' ', $dataString);
foreach (array_keys($result) as $group) {
  $normalizedGroup = str_replace(' ', '_', $group);
  foreach ($parts as $part) {
    if (preg_match("/^$normalizedGroup?\d+$/", $part)) {
      $result[$group][] = $part;
    }
  }
}
print_r($result);

Output:

Array
(
    [cats] => Array
        (
            [0] => cat1
            [1] => cat2
            [2] => cat3
            [3] => cat4
            [4] => cat5
            [5] => cats6
        )

    [dogs] => Array
        (
            [0] => dogs1
            [1] => dogs2
        )

    [monkey creatures] => Array
        (
            [0] => monkey_creature1
        )

    [lemurs] => Array
        (
            [0] => lemur1
            [1] => lemur2
            [2] => lemur3
        )

)

Upvotes: 1

Bryan

Reputation: 6752

I've designed an answer that will work regardless of whether or not there are spaces between "keywords", so long as the first keyword is not plural :)

Below is the code. Feel free to check it out; it's really beautiful what you can do with text :)

<?php
$string = 'cats cat1 cat2 cat3 dogs dog1 dog2 monkey creatures monkey_creature1 monkey_creature2 monkey_creature3';

$current_prefix = '';
$potential_prefix_elements = array();

$word_mapping = array();

foreach(explode(" ", $string) as $substring) { // split() was removed in PHP 7
    if(strlen($current_prefix)) {
        // Check to see if the current substring, starts with the prefix
        if(strpos($substring, $current_prefix) === 0)
            $word_mapping[$current_prefix . 's'][] = $substring;
        else
            $current_prefix = '';
    }

    if(!strlen($current_prefix)) {
        if(preg_match("/(?P<new_prefix>.+)s$/", $substring, $matches)) {
            $potential_prefix_elements[] = $matches['new_prefix'];

            // Join any pending words with the new singular prefix (the 's' is re-added for the key below)
            $current_prefix = join("_", $potential_prefix_elements);

            // Initialize an array for the current word mapping
            $word_mapping[$current_prefix . 's'] = array();

            // Clear the potential prefix elements
            $potential_prefix_elements = array();
        } else {
            $potential_prefix_elements[] = $substring;
        }
    }
}

print_r($word_mapping);

Here's the output. I've given it to you as an array so you can easily construct a ul / li hierarchy :)

Array
(
    [cats] => Array
        (
            [0] => cat1
            [1] => cat2
            [2] => cat3
        )

    [dogs] => Array
        (
            [0] => dog1
            [1] => dog2
        )

    [monkey_creatures] => Array
        (
            [0] => monkey_creature1
            [1] => monkey_creature2
            [2] => monkey_creature3
        )

)
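For the last step, turning an array like this into the h2 / ul markup from the question, something along these lines might do. It is a minimal sketch: `renderGroups` is a hypothetical name, and it assumes that underscores in the keys only stand in for spaces in the parent names.

```php
<?php
// Hypothetical helper: renders parent => children as <h2> headings
// followed by <ul> lists, matching the layout asked for in the question.
function renderGroups(array $groups)
{
    $html = '';
    foreach ($groups as $heading => $items) {
        // Assumption: underscores in a key replace spaces in the parent name
        $title = str_replace('_', ' ', $heading);
        $html .= '<h2>' . htmlspecialchars($title) . "</h2>\n<ul>\n";
        foreach ($items as $item) {
            $html .= '    <li>' . htmlspecialchars($item) . "</li>\n";
        }
        $html .= "</ul>\n\n";
    }
    return $html;
}

$word_mapping = array(
    'cats' => array('cat1', 'cat2', 'cat3'),
    'dogs' => array('dog1', 'dog2'),
    'monkey_creatures' => array('monkey_creature1', 'monkey_creature2', 'monkey_creature3'),
);
echo renderGroups($word_mapping);
```

The values go through `htmlspecialchars()` in case the input string ever contains characters that are special in HTML.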

Upvotes: 4
