Reputation: 143
From this:
<head>
<link rel="stylesheet" href="styles.css">
</head>
<body>
<img src="img.jpg" alt="" width="500" height="600">
I want to get this:
<head>
<link rel="stylesheet" href="http://bbc.com/styles.css">
</head>
<body>
<img src="http://bbc.com/img.jpg" alt="" width="500" height="600">
When I download a page there are relative links to css, images, etc. How to convert an HTML page while downloading to have all links in it as absolute not relative? I use this answer to download a page (How to get webpage content into a string using Go):
func main() {
s := OnPage("http://bbc.com/")
fmt.Printf(s)
}
func OnPage(link string) string {
res, err := http.Get(link)
if err != nil {
log.Fatal(err)
}
content, err := ioutil.ReadAll(res.Body)
res.Body.Close()
if err != nil {
log.Fatal(err)
}
return string(content)
}
Upvotes: 0
Views: 1573
Reputation: 119
I built a package for downloading content from any URL, including images, CSS, JS, and video.
Check it out: https://github.com/Riaz-Mahmud/Websitebackup
Installation
composer require backdoor/websitebackup
Usage
use Backdoor\WebsiteBackup\WebsiteBackup;
function siteBackup(){
$url = 'link to your website page to backup';
$path = 'path to save backup file';
$websiteBackup = new WebsiteBackup();
$backup = $websiteBackup->backup($url, $path);
}
Upvotes: 1
Reputation: 59
You have to use Regular Expressions to replace the needed portions of the html string. Here is how you can do it (I suppose all links on the page are relative, if not, you should adjust the code):
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
"regexp"
)
func main() {
s := OnPage("http://bbc.com/")
fmt.Printf(s)
}
func OnPage(link string) string {
res, err := http.Get(link)
if err != nil {
log.Fatal(err)
}
content, err := ioutil.ReadAll(res.Body)
res.Body.Close()
if err != nil {
log.Fatal(err)
}
html := string(content)
var re = regexp.MustCompile(`(<img[^>]+src)="([^"]+)"`)
updatedHTML := re.ReplaceAllString(html, `$1="`+link+`$2"`)
re = regexp.MustCompile(`(<link[^>]+href)="([^"]+)"`)
updatedHTML = re.ReplaceAllString(html, `$1="`+link+`$2"`)
return updatedHTML
}
Upvotes: 1