AAA

Reputation: 3168

Get first N words of a string

How do I only get the first 10 words from a string?

Upvotes: 51

Views: 97063

Answers (13)

mickmackusa

Reputation: 48073

Instead of generating an array of N words, then truncating the array, then re-imploding the words, just truncate the input string after the Nth word.

echo preg_replace('/(?:\s*\S+){10}\K.*/', '', $string);

The pattern matches N sequences of zero or more whitespace characters followed by one or more non-whitespace characters. \K then restarts the full-string match (effectively "releasing" the characters matched so far), and .* matches the rest of the string (only the rest of the line if the string contains newlines, since . does not match a newline without the s modifier). Whatever is matched is replaced with an empty string.

This solution ensures that the output string has no more than N words. If the string has fewer than N words, be aware that no truncation takes place, so any trailing whitespace in the string may be left in place.
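The edge cases described above can be checked with a small wrapper. This is only a sketch: the name truncate_words and the parameter $n are illustrative and not part of the answer. It also differs from the pattern above in two ways that are my own assumptions: it uses a possessive \S++ so that backtracking cannot split one long word into several "words" when the string has fewer than N words, and it adds the s modifier so that .* also consumes newlines.

```php
<?php
// Hypothetical helper (name and signature are illustrative).
function truncate_words(string $string, int $n): string
{
    // \S++ is possessive: once a run of non-whitespace is consumed, the
    // engine will not give characters back, so one word is never counted twice.
    // The s modifier lets .* also consume newlines after the Nth word.
    $pattern = '/(?:\s*\S++){' . $n . '}\K.*/s';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $string) ?? $string;
}

echo truncate_words('one two three four five', 3), "\n"; // one two three
var_export(truncate_words('hello world  ', 3));          // 'hello world  ' (unchanged)
```

With fewer than N words the input comes back untouched, trailing whitespace included, so trim() it separately if that matters.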

To ensure that leading and trailing whitespace is removed, adjust the pattern to capture zero to N words delimited by whitespace.

$string = '    I would like to know   ';

var_export(
    preg_replace('/\s*(\S*(?:\s+\S+){0,9}).*/', '$1', $string)
);

Upvotes: 0

Rowlingso

Reputation: 21

This can easily be done using str_word_count()

$first10words = implode(' ', array_slice(str_word_count($sentence,1), 0, 10));

Upvotes: 2

Milad Rahimi

Reputation: 3844

To select the first 10 words of the given text, you can implement the following function:

function first_words($text, $count = 10)
{
    $words = explode(' ', $text);

    // Keep at most $count words and rejoin them with single spaces
    // (appending the words directly would run them together).
    return implode(' ', array_slice($words, 0, $count));
}

Upvotes: 2

Amr

Reputation: 5159

    function get_first_num_of_words($string, $num_of_words)
    {
        $string = preg_replace('/\s+/', ' ', trim($string));
        $words = explode(" ", $string); // an array

        // if number of words you want to get is greater than number of words in the string
        if ($num_of_words > count($words)) {
            // then use number of words in the string
            $num_of_words = count($words);
        }

        $new_string = "";
        for ($i = 0; $i < $num_of_words; $i++) {
            $new_string .= $words[$i] . " ";
        }

        return trim($new_string);
    }

Use it like this:

echo get_first_num_of_words("Lorem ipsum dolor sit amet consectetur adipisicing elit. Aliquid, illo?", 5);

Output: Lorem ipsum dolor sit amet

This function also works with Unicode text, such as Arabic:

echo get_first_num_of_words("نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.", 100);

Output: نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.

Upvotes: 1

saleem ahmed

Reputation: 337

Try this

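$str = 'Lorem ipsum dolor sit amet,consectetur adipiscing elit. Mauris ornare luctus diam sit amet mollis.';
// Insert a space after each comma so that "amet,consectetur" splits into two words.
$arr = explode(" ", str_replace(",", ", ", $str));
// Stop at 10 words, or earlier if the string has fewer.
for ($index = 0; $index < min(10, count($arr)); $index++) {
    echo $arr[$index] . " ";
}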

I know this is a late answer, but let newcomers choose their own answers.

Upvotes: 1

rowmoin

Reputation: 708

This might help you: a function that returns the first N words (10 in the call below).

function num_of_word($text, $numb) {
    $wordsArray = explode(" ", $text);
    // Split the words into chunks of $numb; the first chunk is the result.
    $parts = array_chunk($wordsArray, $numb);

    $final = implode(" ", $parts[0]);

    // Append an ellipsis if words were cut off.
    if (isset($parts[1])) {
        $final = $final . " ...";
    }
    return $final;
}
echo num_of_word($text, 10);

Upvotes: 0

Kelly

Reputation: 41591

implode(' ', array_slice(explode(' ', $sentence), 0, 10));

To add support for other word breaks like commas and dashes, preg_match gives a quick way and doesn't require splitting the string:

function get_words($sentence, $count = 10) {
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
  return $matches[0];
}

As Pebbl mentions, PHP doesn't handle UTF-8 or Unicode all that well, so if that is a concern, you can replace \w with [^\s,\.;\?\!] and \W with [\s,\.;\?\!].
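A minimal sketch of that substitution, under some assumptions of mine: the name get_words_explicit is made up, the input is assumed to be valid UTF-8 (hence the u modifier), and leading and trailing whitespace is trimmed, which the original function does not do.

```php
<?php
// Hypothetical variant of get_words() with explicit character classes.
function get_words_explicit(string $sentence, int $count = 10): string
{
    // A "word" is a run of anything except whitespace and , . ; ? !
    // A "gap" is a run of those separators, or the end of the string.
    // The u modifier makes PCRE read the subject as UTF-8 rather than bytes.
    $pattern = '/(?:[^\s,.;?!]+(?:[\s,.;?!]+|$)){0,' . $count . '}/u';

    // Leading whitespace would make the match empty, so strip it first.
    // preg_match() returns false on invalid UTF-8; treat that as "no words".
    if (!preg_match($pattern, ltrim($sentence), $matches)) {
        return '';
    }

    // The match includes the gap after the last word; drop trailing whitespace.
    return rtrim($matches[0]);
}

echo get_words_explicit('Это не те дроиды, которые вы ищете', 5);
```

Inside a character class the punctuation does not need escaping, so [^\s,.;?!] is equivalent to the escaped form quoted above.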

Upvotes: 143

Pebbl

Reputation: 36075

Simply splitting on spaces will function incorrectly if there is an unexpected character in place of a space in the sentence structure, or if the sentence contains multiple conjoined spaces.
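A small illustration of that failure mode, using a made-up input string with a double space and a tab. The preg_split() call is only one possible remedy, shown here for contrast:

```php
<?php
$sentence = "one  two\tthree four"; // double space and a tab (illustrative input)

// explode() treats every single space as a boundary, so the double space
// yields an empty "word" and the tab is not treated as a separator at all.
$naive = array_slice(explode(' ', $sentence), 0, 3);
// $naive is ['one', '', "two\tthree"]

// Splitting on runs of any whitespace avoids both problems.
$words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
$firstThree = implode(' ', array_slice($words, 0, 3));
// $firstThree is 'one two three'
```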

The following version will work no matter what kind of "space" you use between words and can be easily extended to handle other characters... it currently supports any white space character plus , . ; ? !

function get_snippet( $str, $wordCount = 10 ) {
  return implode( 
    '', 
    array_slice( 
      preg_split(
        '/([\s,\.;\?\!]+)/', 
        $str, 
        $wordCount*2+1, 
        PREG_SPLIT_DELIM_CAPTURE
      ),
      0,
      $wordCount*2-1
    )
  );
}

Regular expressions are perfect for this issue, because you can easily make the code as flexible or strict as you like. You do have to be careful however. I specifically approached the above targeting the gaps between words — rather than the words themselves — because it is rather difficult to state unequivocally what will define a word.

Take the \w word-character class, or its inverse \W. I rarely rely on these, mainly because, depending on the software you are using (like certain versions of PHP), they don't always include UTF-8 or Unicode characters.

In regular expressions it is better to be specific, at all times. So that your expressions can handle things like the following, no matter where they are rendered:

echo get_snippet('Это не те дроиды, которые вы ищете', 5);

// outputs: Это не те дроиды, которые

Avoiding splitting could be worthwhile, however, in terms of performance. So you could use Kelly's updated approach but switch \w+ for [^\s,\.;\?\!]+ and \W+ for [\s,\.;\?\!]+. Personally, though, I like the simplicity of the splitting expression used above; it is easier to read and therefore to modify. The stack of PHP functions, however, is a bit ugly :)

Upvotes: 54

Vaci

Reputation: 171

I do it this way:

function trim_by_words($string, $word_count = 10) {
    $string = explode(' ', $string);
    if (empty($string) == false) {
        $string = array_chunk($string, $word_count);
        $string = $string[0];
    }
    $string = implode(' ', $string);
    return $string;
}

It's UTF-8 compatible.

Upvotes: 0

jawira

Reputation: 4618

I suggest using str_word_count:

<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
print_r(str_word_count($str, 1));
?>

The above example will output:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
)

Then use a loop to get the words you want.
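One way such a loop might look, as a sketch only: the variable $limit is a hypothetical name for the number of words to keep. Note that, as the output above shows, str_word_count() returns the words without attached punctuation ("amet" rather than "amet,").

```php
<?php
$str = "Lorem ipsum       dolor sit    amet,
        consectetur        adipiscing elit";
$limit = 5; // hypothetical: how many words to keep

$first = [];
foreach (str_word_count($str, 1) as $i => $word) {
    if ($i >= $limit) {
        break; // stop once $limit words have been collected
    }
    $first[] = $word;
}

echo implode(' ', $first); // Lorem ipsum dolor sit amet
```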

Source: http://php.net/str_word_count

Upvotes: 3

Rizwan Gill

Reputation: 2253

This does exactly what is being asked for. Just cut and paste it into your program and run it.

/*  Returns the first $wordsreturned words of $string. If the string
    contains fewer words than $wordsreturned, the entire string
    is returned.
*/
function shorten_string($string, $wordsreturned)
{
    $retval = $string;      //  Just in case of a problem

    $array = explode(" ", $string);
    if (count($array) <= $wordsreturned) {
        /*  Already short enough, return the whole thing */
        $retval = $string;
    } else {
        /*  Need to chop off some words */
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array) . " ...";
    }
    return $retval;
}

Then call the function in your code like this:

$data_itr = shorten_string($Itinerary,25);

Upvotes: 0

Ankur Rastogi

Reputation: 99

This might help you: a function that returns the first N words.

// Written as a standalone function; add `public` when using it as a class method.
function getNWordsFromString($text, $numberOfWords = 6)
{
    if ($text != null)
    {
        $textArray = explode(" ", $text);
        if (count($textArray) > $numberOfWords)
        {
            return implode(" ", array_slice($textArray, 0, $numberOfWords)) . "...";
        }
        return $text;
    }
    return "";
}

Upvotes: 1

Spyros

Reputation: 48706

http://snipplr.com/view/8480/a-php-function-to-return-the-first-n-words-from-a-string/

function shorten_string($string, $wordsreturned)
{
    $retval = $string;  //  Just in case of a problem
    $array = explode(" ", $string);
    /*  Already short enough, return the whole thing*/
    if (count($array)<=$wordsreturned)
    {
        $retval = $string;
    }
    /*  Need to chop off some words*/
    else
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array)." ...";
    }
    return $retval;
}

Upvotes: 7
