rockstardev

Reputation: 13527

How big is a string in PHP?

If I download a file from a website using:

$html = file_get_html($url); 

Then how can I know the size, in kilobytes, of the HTML string? I want to know because I want to skip files over 100 KB.

Upvotes: 2

Views: 338

Answers (5)

ghoti

Reputation: 46816

To skip fetching large files, you want to use the cURL library.

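<?php

// Send a HEAD request and return the Content-Length in bytes, or FALSE if absent.
function get_content_length($url) {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_HEADER, 1);
  curl_setopt($ch, CURLOPT_NOBODY, 1);
  // curl_exec returns FALSE on failure; the cast turns that into an empty string.
  $hraw = explode("\r\n", (string) curl_exec($ch));
  curl_close($ch);

  $hdrs = array();
  foreach ($hraw as $hdr) {
    // The status line and blank lines have no "name: value" pair, so skip them.
    $a = explode(":", $hdr, 2);
    if (count($a) < 2) {
      continue;
    }
    // Header names are case-insensitive, so store them in lower case.
    $hdrs[strtolower(trim($a[0]))] = trim($a[1]);
  }

  return isset($hdrs['content-length']) ? (int) $hdrs['content-length'] : FALSE;
}

$url = "http://www.example.com/";

$length = get_content_length($url);

// Fetch when the length is unknown (FALSE) or under the limit.
if ($length === FALSE || $length < 100000) {
  $html = file_get_contents($url);
  print "Yes.\n";
} else {
  print "No.\n";
}

?>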

There may be a more elegant way to pull this information out of curl, but this is what came to mind fastest. YMMV.

Note that setting CURLOPT_NOBODY makes curl send a "HEAD" rather than a "GET" request, so we're not actually downloading the body of this URL twice.
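
On the "more elegant way" mentioned above: curl can report the length itself through curl_getinfo, which avoids parsing headers by hand. This is only a minimal sketch, assuming the cURL extension is loaded. The names remote_content_length and should_fetch are hypothetical, and treating an unknown length as "fetch anyway" is an assumption, not something the original answer specifies.

```php
<?php

// Hypothetical helper: ask the server for the body size with a HEAD request.
// Returns the length in bytes, or -1 when curl does not know it.
function remote_content_length(string $url): int {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request, no body
  curl_exec($ch);
  // curl reports -1 here when no Content-Length was received.
  $length = (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
  curl_close($ch);
  return $length;
}

// Hypothetical policy (an assumption): fetch when the length is unknown
// (negative) or below the limit in bytes.
function should_fetch(int $length, int $limit = 100000): bool {
  return $length < 0 || $length < $limit;
}
```

Usage would look like: if (should_fetch(remote_content_length($url))) { $html = file_get_contents($url); }. Servers that stream their response may not send a Content-Length at all, which is why the unknown case needs an explicit decision.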

Upvotes: 3

Steven

Reputation: 13975

You can call mb_strlen with the '8bit' encoding, so that 1 character = 1 byte.
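
To illustrate the '8bit' suggestion, here is a small sketch. byte_length is a hypothetical helper name, and the fallback to strlen is an assumption for installations without the mbstring extension. With the '8bit' encoding, mb_strlen counts bytes rather than characters.

```php
<?php

// Hypothetical helper: length of a string in bytes.
// Uses mb_strlen with the '8bit' encoding when mbstring is available,
// and falls back to strlen otherwise.
function byte_length(string $s): int {
  return function_exists('mb_strlen') ? mb_strlen($s, '8bit') : strlen($s);
}

// "Hällo", written with an escape so the bytes are UTF-8 regardless of
// how this source file is saved.
$word = "H\u{00E4}llo";

echo byte_length($word), "\n"; // 6: the umlaut takes two bytes in UTF-8
```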

Upvotes: 0

deceze

Reputation: 521995

file_get_html returns an object, so the information about how big the string was is lost at that point. Get the string first and build the object afterwards:

$html = file_get_contents($url);
echo strlen($html); // size in bytes
$html = str_get_html($html);
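
Putting the size check between those two steps could look like the sketch below. parse_if_small and its $parser argument are hypothetical names. With Simple HTML DOM loaded, you would presumably pass 'str_get_html' as the parser. The default limit of 100 * 1024 bytes is an assumption taken from the question.

```php
<?php

// Hypothetical helper: parse the HTML only when the raw string is small enough.
// $parser is any callable that turns an HTML string into a parsed object.
// Returns null when the string is larger than $limitBytes.
function parse_if_small(string $html, callable $parser, int $limitBytes = 102400) {
  if (strlen($html) > $limitBytes) {
    return null; // too big: skip parsing entirely
  }
  return $parser($html);
}
```

For example: $dom = parse_if_small(file_get_contents($url), 'str_get_html');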

Upvotes: 0

Eugen Rieck

Reputation: 65244

PHP's definition of a string differs from the intuitive one:

"Hällo" (mind the umlaut) looks like a 5-character string, but to PHP it is really a 6-byte array (assuming UTF-8). PHP doesn't have a notion of a string representing text; it just sees a sequence of bytes (the PHP euphemism is "binary safe").

So strlen("Hällo") will be 6 (UTF-8).

That said, if you want to skip anything above 100 KB, you probably won't mind if 99.5k characters translate to 100k bytes.
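
The byte/character difference can be seen directly with a short sketch. utf8_char_count is a hypothetical helper; it counts code points using PCRE's /u modifier, which is one of several possible ways to do this and not something the answer above prescribes.

```php
<?php

// Hypothetical helper: count UTF-8 characters (code points) with PCRE's /u flag.
// preg_match_all returns false when the input is not valid UTF-8.
function utf8_char_count(string $s) {
  return preg_match_all('/./su', $s);
}

// "Hällo", written with an escape so the bytes are UTF-8 regardless of
// how this source file is saved.
$word = "H\u{00E4}llo";

printf("%d bytes, %d characters\n", strlen($word), utf8_char_count($word));
// prints "6 bytes, 5 characters"
```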

Upvotes: 2

Borealid

Reputation: 98459

If you do file_get_contents, you've already gotten the whole file.

If you mean "skip processing", rather than "skip retrieval", you can just get the length of the string: strlen($html). For kilobytes, divide that by 1024.

This is imprecise as a character count, because the string may contain UTF-8 characters more than one byte long, and very small files will actually occupy a whole filesystem block on disk instead of their byte length, but it's probably good enough for the arbitrary-threshold cutoff you're looking for.
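
As a rough sketch of that cutoff: size_in_kb is a hypothetical helper, and the 150 KB string below is just a stand-in for a downloaded page.

```php
<?php

// Hypothetical helper: size of a string in kilobytes (1 KB = 1024 bytes).
function size_in_kb(string $data): float {
  return strlen($data) / 1024;
}

$html = str_repeat('x', 150 * 1024); // stand-in for a downloaded page

if (size_in_kb($html) > 100) {
  echo "Skipping: ", size_in_kb($html), " KB\n"; // prints "Skipping: 150 KB"
}
```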

Upvotes: 4
