Reputation: 325
I've got a text file containing a lot of URLs. Some of the URLs start with www., some start with http://, and some of them start with nothing.
I want to add www. in front of every line in the text file where the URL does not start with www. or http://.
$lines = file("sites.txt");
foreach($lines as $line) {
    if(substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://" ) {
    }
}
That's the code I have right now. I know it's not much, but I have no clue how to add www. in front of every unmatched line.
Upvotes: 1
Views: 135
Reputation: 16103
This adds www. if it is not present, and it works whether the line uses http:// or https://.
$url = preg_replace("#^(?:http(s)?://)?(?:www\.)?#", "http\\1://www.", $url);
This regex will work on the following:
domain.ext -> http://www.domain.ext
www.domain.ext -> http://www.domain.ext
http://www.domain.ext -> http://www.domain.ext
https://domain.ext -> https://www.domain.ext (note the httpS)
https://www.domain.ext -> https://www.domain.ext (note the httpS)
Regex explained:
http(s)?:// -> the S of https might not be there; capture it in case it is.
(?:www\.)? -> the www. might not be there. Don't capture it (that's what (?: ... ) does), because we're going to add it anyway.
Then we use \\1 in the replacement value so the S of https is kept when it is present.
Also, the substr() checks in the question will fail on https:// URLs, because https:// is one character longer than http://.
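A minimal sketch of the regex approach applied to every line, assuming a variant of the pattern that is anchored with ^ and makes the scheme optional, so that bare domain.ext and www.domain.ext lines also match. The helper name normalize_url and the sample lines are hypothetical.

```php
<?php
// Hypothetical helper: force a URL to start with http(s)://www.
// ^ anchors the match at the start of the string; the optional
// (?:http(s)?://)? group lets lines without a scheme match too, and
// \1 re-inserts the "s" of https (it is empty when the group didn't match).
function normalize_url(string $url): string
{
    return preg_replace('#^(?:http(s)?://)?(?:www\.)?#', 'http\1://www.', $url);
}

$lines = ['domain.ext', 'www.domain.ext', 'https://domain.ext'];
print_r(array_map('normalize_url', $lines));
```

If the lines come from file(), read them with the FILE_IGNORE_NEW_LINES flag so the trailing newlines don't end up in the results.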
Upvotes: 3
Reputation: 850
Use this; it takes only three lines. Note that it removes www. rather than adding it, and puts http:// in front of every line:
<?php
$g0 = file_get_contents("site");
#--------------------------------------------------
$g1 = preg_replace("#^http://#m", "", $g0);   // strip a leading http:// from each line
$g2 = preg_replace("/^www\./m", "", $g1);     // strip a leading www. from each line
$g3 = preg_replace("/^/m", "http://", $g2);   // prefix every line with http://
#--------------------------------------------------
file_put_contents("site2", $g3);
?>
Input file:
1.com
www.d.som
http://ss.com
http://www.ss.com
Output file:
http://1.com
http://d.som
http://ss.com
http://ss.com
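The same three steps can be wrapped in a function that takes the file contents as a string, which makes it easy to check against the input and output above. This is a sketch; the function name add_http_prefix is hypothetical, and like the answer it only recognizes http://, not https://.

```php
<?php
// Hypothetical wrapper around the answer's three preg_replace calls.
// The /m modifier makes ^ match at the start of every line, not just
// at the start of the whole string.
function add_http_prefix(string $text): string
{
    $text = preg_replace('#^http://#m', '', $text); // 1. strip a leading http://
    $text = preg_replace('/^www\./m', '', $text);   // 2. strip a leading www.
    return preg_replace('/^/m', 'http://', $text);  // 3. prefix every line with http://
}

$input = "1.com\nwww.d.som\nhttp://ss.com\nhttp://www.ss.com";
echo add_http_prefix($input), "\n";
```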
Upvotes: 0
Reputation: 326
$file = "/var/www/vhosts/mon.totalinternetgroup.nl/public/sites/sites.txt";
// FILE_IGNORE_NEW_LINES strips each line's trailing newline,
// so implode() below doesn't produce blank lines
$lines = file($file, FILE_IGNORE_NEW_LINES);
$new_lines = array();
foreach($lines as $line) {
    // && (not ||): prefix only when the line starts with neither www nor http://
    if(substr($line, 0, 3) != "www" && substr($line, 0, 7) != "http://") {
        $new_lines[] = "www." . $line;
    } else {
        $new_lines[] = $line;
    }
}
$content = implode("\n", $new_lines);
file_put_contents($file, $content);
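The substr() checks above miss https:// lines. A minimal sketch of the same per-line rule that also leaves https:// alone, assuming PHP 8.0 or later for str_starts_with(). The helper name add_www is hypothetical, and it checks for "www." with the dot rather than just "www".

```php
<?php
// Hypothetical helper (requires PHP 8.0+ for str_starts_with):
// prefix "www." only when the line starts with none of the known prefixes.
function add_www(string $line): string
{
    foreach (['www.', 'http://', 'https://'] as $prefix) {
        if (str_starts_with($line, $prefix)) {
            return $line; // already has a known prefix: leave it unchanged
        }
    }
    return 'www.' . $line;
}

$lines = ['example.com', 'www.example.com', 'http://example.com', 'https://example.com'];
print_r(array_map('add_www', $lines));
```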
Upvotes: 0
Reputation: 32290
The trick is to iterate over $lines by reference, so you can modify each $line in place:
foreach($lines as &$line) { // note the '&'
    $hasHttp = stripos($line, 'http://') === 0;
    $hasWww  = stripos($line, 'www.') === 0 || stripos($line, 'http://www.') === 0;
    // http:// and www. are both missing:
    if(!$hasHttp && !$hasWww) {
        $line = 'http://www.' . $line;
    // only http:// is missing (the line starts with www.):
    } elseif(!$hasHttp) {
        $line = 'http://' . $line;
    // only www. is missing (the line starts with http://):
    } elseif(!$hasWww) {
        $line = 'http://www.' . substr($line, strlen('http://'));
    }
    // otherwise nothing is missing
}
unset($line); // break the reference left over from the loop
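To see why the '&' matters, here is a small self-contained comparison of iterating by value and by reference. The sample arrays are made up for illustration.

```php
<?php
// By value: $line is a copy of each element, so the array is unchanged.
$byValue = ['a.com', 'b.com'];
foreach ($byValue as $line) {
    $line = 'http://' . $line;
}

// By reference: $line aliases each element, so assignments write through.
$byRef = ['a.com', 'b.com'];
foreach ($byRef as &$line) {
    $line = 'http://' . $line;
}
unset($line); // drop the alias so a later assignment to $line can't overwrite the last element

print_r($byValue); // still a.com, b.com
print_r($byRef);   // http://a.com, http://b.com
```

The unset() after the loop is the usual safeguard: without it, $line stays bound to the last array element.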
Note:
Simply adding www. to a non-www domain can be wrong, because www.example.com and example.com CAN have completely different contents, servers, destinations, and DNS mappings. It's fine to add http://, but not www.
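Following that advice, a minimal sketch that only adds a scheme when none is present and never invents a www. subdomain. The helper name add_scheme is hypothetical, and treating a line as already having a scheme when it starts with http:// or https:// (case-insensitively) is an assumption.

```php
<?php
// Hypothetical helper: prepend http:// only when the line has no
// http:// or https:// scheme; the host part is left untouched.
function add_scheme(string $line): string
{
    if (preg_match('#^https?://#i', $line) === 1) {
        return $line; // already has a scheme
    }
    return 'http://' . $line;
}

echo add_scheme('example.com'), "\n";
echo add_scheme('www.example.com'), "\n";
```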
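To write the new array back to the file, you'd use the following (the filename comes first; this assumes $lines was read with file('sites.txt', FILE_IGNORE_NEW_LINES), so the lines have no trailing newlines):
file_put_contents('sites.txt', implode(PHP_EOL, $lines));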
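A self-contained sketch of the whole read, modify, and write round trip. It works on a temporary file rather than a real sites.txt, and the per-line rule inside the loop (add http:// when no scheme is present) is only a placeholder; any of the rules above could go there.

```php
<?php
// Work on a temporary file so the sketch doesn't touch a real sites.txt.
$path = tempnam(sys_get_temp_dir(), 'sites');
file_put_contents($path, "example.com\nwww.example.com\nhttp://example.com\n");

// FILE_IGNORE_NEW_LINES strips each line's trailing newline;
// FILE_SKIP_EMPTY_LINES (which needs the first flag) drops empty lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as &$line) {
    if (preg_match('#^https?://#i', $line) !== 1) {
        $line = 'http://' . $line; // placeholder rule: add a scheme when missing
    }
}
unset($line);

// Argument order: filename first, then data.
file_put_contents($path, implode(PHP_EOL, $lines) . PHP_EOL);

$written = file_get_contents($path);
unlink($path); // clean up the temporary file
echo $written;
```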
Upvotes: 2