user3552670

Reputation: 325

How to add www. to urls in text file

I've got a text file containing a lot of URLs. Some of the URLs start with www. or http://, and some of them start with neither.

I want to add www. in front of every line in the text file where the URL does not start with www. or http://.

$lines = file("sites.txt");

foreach($lines as $line) {
    if(substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://" ) {

    }
}

That's the code I have right now. I know it's not much, but I have no clue how to add www. in front of every unmatched line.

Upvotes: 1

Views: 135

Answers (4)

Martijn

Reputation: 16103

This adds www. if it is not present, and it keeps https when the line already uses it.

$url = preg_replace("#^(?:http(s)?://)?(?:www\.)?#", "http\\1://www.", $url);

This regex works on the following:

domain.ext -> http://www.domain.ext
www.domain.ext -> http://www.domain.ext
http://www.domain.ext -> http://www.domain.ext
https://domain.ext -> https://www.domain.ext (note the httpS)
https://www.domain.ext -> https://www.domain.ext (note the httpS)


Regex explained:
^ -> Anchor at the start of the string, so the replacement happens exactly once.
(?:http(s)?://)? -> The whole scheme might not be there. If it is, the s of https might not be there either; capture it in case it is.
(?:www\.)? -> The www. might not be there. Don't capture it (?:), since we add it anyway.

Then we use \\1 in the replacement value so that the s of https is kept when present; when the group did not match, \\1 is replaced by an empty string.
Also, substr checks with a hard-coded length of 7 for http:// fail on https://, because it is one character longer.
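To make the cases above concrete, here is a minimal sketch that wraps such a pattern in a function and runs it over a few sample lines. It assumes the scheme is optional and the pattern is anchored at the start of the line; the function name add_www and the sample values are hypothetical, not part of the question.

```php
<?php
// Hypothetical helper: make sure a URL has a scheme and a leading "www.".
function add_www(string $url): string
{
    // ^                 anchor at the start of the string
    // (?:http(s)?://)?  optional scheme; group 1 captures the "s" of https
    // (?:www\.)?        optional "www.", not captured because it is re-added
    return preg_replace('#^(?:http(s)?://)?(?:www\.)?#', 'http$1://www.', trim($url));
}

// Sample lines as they might appear in the text file (assumed values).
$samples = ['domain.ext', 'www.domain.ext', 'https://domain.ext'];
foreach ($samples as $sample) {
    echo add_www($sample), PHP_EOL;
}
```

The trim() call is there because lines read with file() keep their trailing newline unless FILE_IGNORE_NEW_LINES is passed.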

Upvotes: 3

Baba

Reputation: 850

Use this; the transformation itself takes only three lines. It normalizes every line to http:// without www.:

<?php
    $g0 = file_get_contents("site");
    // strip a leading http:// from every line (m flag: ^ matches at each line start)
    $g1 = preg_replace("#^http://#m", "", $g0);
    // strip a leading www. from every line
    $g2 = preg_replace("/^www\./m", "", $g1);
    // prepend http:// to every line
    $g3 = preg_replace("/^/m", "http://", $g2);
    file_put_contents("site2", $g3);
?>

Input file:

1.com
www.d.som
http://ss.com
http://www.ss.com

output file:

http://1.com
http://d.som
http://ss.com
http://ss.com

Upvotes: 0

alreadycoded.com

Reputation: 326

$file = "/var/www/vhosts/mon.totalinternetgroup.nl/public/sites/sites.txt";
// FILE_IGNORE_NEW_LINES strips the newline from each line, so implode() below does not double it
$lines = file($file, FILE_IGNORE_NEW_LINES);
$new_lines = array();
foreach ($lines as $line) {
    // prepend www. only when the line starts with neither www nor http://
    if (substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://") {
        $new_lines[] = "www." . $line;
    } else {
        $new_lines[] = $line;
    }
}

$content = implode("\n", $new_lines);
file_put_contents($file, $content);

Upvotes: 0

Daniel W.

Reputation: 32290

The trick is to iterate over $lines by reference so that you can alter each line in place:

foreach ($lines as &$line) { // note the '&'
    $line = trim($line); // file() keeps the trailing newline

    if (stripos($line, 'http://www.') === 0) {
        // nothing is missing
    } elseif (stripos($line, 'http://') === 0) {
        // only www. is missing:
        $line = 'http://www.' . substr($line, 7);
    } elseif (stripos($line, 'www.') === 0) {
        // only http:// is missing:
        $line = 'http://' . $line;
    } else {
        // http:// and www. are both missing:
        $line = 'http://www.' . $line;
    }
}
unset($line); // break the reference left over from the loop

Note:

Simply adding www. to a non-www domain can be wrong, because www.example.com and example.com can have completely different content, different servers, different destinations, and different DNS mappings. It is fine to add http://, but not to add www.

To write the new array back to the file, pass the filename first and the data second:

file_put_contents('sites.txt', implode(PHP_EOL, $lines));
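Putting the read, modify-by-reference, and write-back steps together, and following the note above by adding only the scheme, a rough sketch could look like this. The function name add_scheme_to_file and the temporary demo file are assumptions for illustration; swap in the path to sites.txt for real use.

```php
<?php
// Hypothetical helper: prefix http:// on every line that has no scheme yet.
function add_scheme_to_file(string $path): void
{
    // FILE_IGNORE_NEW_LINES strips the newline from each element;
    // FILE_SKIP_EMPTY_LINES drops blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    foreach ($lines as &$line) { // by reference, so assignments stick
        if (!preg_match('#^https?://#i', $line)) {
            $line = 'http://' . $line;
        }
    }
    unset($line); // break the reference left over from the loop

    // Filename first, data second.
    file_put_contents($path, implode(PHP_EOL, $lines) . PHP_EOL);
}

// Demo on a temporary file with assumed sample content.
$path = tempnam(sys_get_temp_dir(), 'sites');
file_put_contents($path, "example.com\nhttps://example.org\nwww.example.net\n");
add_scheme_to_file($path);
echo file_get_contents($path);
```

Lines that already start with http:// or https:// are left untouched, and www. is never added, so the hostnames in the file stay exactly as they were entered.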

Upvotes: 2
