chaimp

Reputation: 17847

PHP Tokens From a String

Let's say you have a string that looks like this: token1 token2 tok3

And you want to get all of the tokens (specifically, the strings between the spaces), and also their position (offset) and length.

So I would want a result that looks something like this:

array(
    array(
        'value'=>'token1',
        'offset'=>0,
        'length'=>6
    ),
    array(
        'value'=>'token2',
        'offset'=>7,
        'length'=>6
    ),
    array(
        'value'=>'tok3',
        'offset'=>14,
        'length'=>4
    ),
)

I know that this can be done by simply looping through the characters of the string and I can simply write a function to do this.

I am wondering, does PHP have anything built-in that will do this efficiently or at least help with part of this?

I am looking for suggestions and appreciate any help given. Thanks

Upvotes: 3

Views: 2649

Answers (4)

chaimp

Reputation: 17847

I like the first answer the most - to use PREG_OFFSET_CAPTURE. In case anyone else is interested, I ended up writing something that does this as well, although I am going to accept the first answer.

Thank you everybody for helping!

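function get_words($string) {
    // Nothing to do for an empty string.
    if ($string === '') return array();

    $string_chars = str_split($string);

    $words = array();
    $curr_offset = 0;
    $length = 0;
    $value_array = array();

    foreach($string_chars as $offset=>$char) {
        if ($char == ' ') {
            // A space ends the current word, if there is one.
            if ($length) $words[] = array('offset'=>$curr_offset,'length'=>$length,'value'=>implode($value_array));

            // The next word can start no earlier than the character after this space.
            $curr_offset = $offset + 1;
            $length = 0;
            $value_array = array();
        }
        else {
            $length++;
            $value_array[] = $char;
        }

    }

    // Store the last word, which is not followed by a space.
    if ($length) $words[] = array('offset'=>$curr_offset,'length'=>$length,'value'=>implode($value_array));

    return $words;
}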

Upvotes: 0

Gumbo

Reputation: 655239

You can use preg_match_all with the PREG_OFFSET_CAPTURE flag:

$str = 'token1 token2 tok3';
preg_match_all('/\S+/', $str, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

Then you just need to replace the items in $matches[0] like this:

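function update($match) {
    // $match[0] is the matched text, $match[1] is its offset in $str
    return array( 'value' => $match[0], 'offset' => $match[1], 'length' => strlen($match[0]));
}
// array_map() returns a new array, so assign the result back
$matches[0] = array_map('update', $matches[0]);
var_dump($matches[0]);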
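
If you want both steps in a single call, something along these lines should work. This is only a sketch: the function name get_tokens is made up for illustration, and the offsets and lengths are counted in bytes, since that is what preg_match_all and strlen report.

```php
<?php
// Hypothetical wrapper around preg_match_all() with PREG_OFFSET_CAPTURE.
function get_tokens($str) {
    preg_match_all('/\S+/', $str, $matches, PREG_OFFSET_CAPTURE);

    // Each entry of $matches[0] is array(matched text, byte offset).
    return array_map(function ($match) {
        return array(
            'value'  => $match[0],
            'offset' => $match[1],
            'length' => strlen($match[0]),
        );
    }, $matches[0]);
}

var_dump(get_tokens('token1 token2 tok3'));
```

For the string from the question this should give the three entries with offsets 0, 7 and 14.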

Upvotes: 4

Surreal Dreams

Reputation: 26380

There's a simpler way, in most respects. You'll have a more basic result, but with much less work put in.

Assuming you have tokena tokenb tokenc stored in $data

$tokens = explode(' ', $data);

Now you have an array of tokens separated by spaces. They will be in order, so $tokens[0] = tokena, $tokens[1] = tokenb, etc. You can very easily get the length of any given item with strlen($tokens[$index]). If you need to know how many tokens you were passed, use $token_count = count($tokens);

Not as sophisticated, but next to no work to get it.
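
Put together, the pieces above might look like the sketch below. The sample value of $data is the one assumed earlier, and $lengths is just an illustrative name. Note that explode(' ', ...) returns empty strings when the input has two spaces in a row, so this assumes single spaces between tokens.

```php
<?php
// Assumed sample input: single spaces between tokens.
$data = 'tokena tokenb tokenc';

// Split on single spaces; tokens come back in their original order.
$tokens = explode(' ', $data);

// How many tokens were passed.
$token_count = count($tokens);

// Length of each token, in bytes.
$lengths = array_map('strlen', $tokens);

print_r($tokens);
print_r($lengths);
```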

Upvotes: 4

Bojangles

Reputation: 101493

You could use explode(), which will give you an array of tokens from the string, and strlen() to count the number of characters in each token. As far as I know, there isn't a PHP function to tell you where an element is in an array.

To get around the last problem, you could loop through the explode()ed array (with foreach() or for()), keep a counter variable, and give each sub-array in the new data its position.
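
A rough sketch of that idea, in case it helps. The function name tokens_with_offsets is made up, and I am assuming the position you want is the offset within the original string, so the counter advances by each token's length plus one for the space after it.

```php
<?php
// Hypothetical helper: explode on single spaces and keep a running offset.
function tokens_with_offsets($string) {
    $result = array();
    $offset = 0;

    foreach (explode(' ', $string) as $token) {
        $length = strlen($token);

        // Consecutive spaces produce empty pieces; skip them here,
        // but still advance the offset below.
        if ($length > 0) {
            $result[] = array('value' => $token, 'offset' => $offset, 'length' => $length);
        }

        // Move past this piece and the single space that followed it.
        $offset += $length + 1;
    }

    return $result;
}

print_r(tokens_with_offsets('token1 token2 tok3'));
```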

Someone please correct me if I'm wrong.

James

Upvotes: 1
