Reputation: 51
I am working on scraping articles from a news website. I have successfully scraped the articles, and the data is reaching the front end (it logs correctly with console.log). My issue is that I cannot get the data to render onto the page using a button; the data only populates the page when I refresh.
I know that the issue is related to Handlebars, because if I try to render the page with jQuery, it works.
I believe this has something to do with my routes. I am sending the data to the page via the /articles route, but as you can see, I'm not specifically using a res.render or res.redirect. I think this is why it doesn't work? However, I am not sure how to fix it. I'm a bit shaky on routes and callbacks. I am new to coding, but I assure you I have researched and tried numerous fixes to no avail. Any help or guidance is appreciated. Thank you.
app.get("/", function(req, res) {
db.Article
.find({})
.then(function(dbArticle) {
// res.render("index");
res.render("index", { articles : dbArticle });
});
});
app.get("/scrape", function(req, res) {
axios.get("https://www.nytimes.com/section/technology?action=click&pgtype=Homepage&region=TopBar&module=HPMiniNav&contentCollection=Tech&WT.nav=page")
.then(function(response) {
var $ = cheerio.load(response.data);
$("a.story-link").each(function(i, element) {
var results = {};
results.link = $(this).attr("href");
// console.log("This is my link " + results.link)
results.blurb = $(this).children().find(".summary").text();
// console.log("This is my blurb " + results.blurb)
results.headline = $(this).children().find(".headline").text();
// console.log("This is my headline " + results.headline)
db.Article
.create(results)
.then(function(dbArticle) {
res.json(dbArticle);
// res.end();
// console.log("YES", dbArticle);
})
.catch(function(err) {
res.json(err);
})
})
})
})
app.get("/articles", function(req, res) {
db.Article
.find({})
.then(function(dbArticle) {
res.json(dbArticle);
// console.log(dbArticle, "scraped")
})
.catch(function(err) {
res.json(err);
});
});
Upvotes: 3
Views: 1875
Reputation: 29092
Somewhere you've got a conceptual disconnect. I'm not sure where exactly so I'm going to try to cover some of the basics in the hope that I might fill in the relevant gap somewhere along the way.
Let's assume you have your Express server running at localhost:3000. It doesn't matter that it's on localhost; everything would work much the same if the code were running on a computer halfway around the world. The key thing is that the browser cannot see your code. It has some innate knowledge of HTML, CSS, JavaScript, etc. but knows nothing about your application.
When you type the URL http://localhost:3000/ into the browser address bar it issues an HTTP GET request for the path / on the server running at localhost:3000. Requests are just a sequence of bytes sent 'over the wire' (when I say 'wire' think network cable, even if there isn't actually a physical network cable involved). Using your browser's dev tools take a look in the Network section and track down this particular request. Click on it and take a look at the request details.
The browser is completely ignorant of what is going on in the server. Remember, this server is 'halfway around the world' as far as the browser is concerned.
When this request hits your server it'll wend its way to the route you've registered at app.get('/', .... The call to res.render will run the Handlebars template and generate a string of HTML markup. This string is then converted into bytes and those bytes are sent back 'over the wire' as the response body. The response also has some headers to describe what's in the body but ultimately it's all just a lot of bytes sent in a sequence.
This response is what comes back to the browser. It has no idea how it was generated. It could just as easily have been served up from a static file as far as the browser is concerned. It doesn't care. It takes this HTML markup (i.e. a big chunk of text) and parses it to create a DOM tree. This is a proper data structure and it's important to appreciate that it's distinct from (albeit closely related to) the corresponding markup. A brief note on terminology: what we call a tag in markup becomes an element node in the DOM. The browser then takes this tree of DOM nodes and uses them to generate an image of the page, which is what actually gets shown to the user in the browser viewport.
Why do I keep talking about sending 'bytes over the wire'? Isn't everything just bytes? Well, yes and no. If you've got a data structure like an array or an object you can't just send that 'over the wire'. It might be represented using bytes in your computer's memory but to send it as part of an HTTP request/response it needs to be translated into a format that represents it as a linear sequence of bytes, one after the other. This process is typically referred to as serialization. A string of HTML markup is a serialized representation of HTML DOM nodes, just as JSON is often used as a way to serialize JavaScript data structures such as objects and arrays.
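As a small illustration of that serialization round trip, here is a sketch for Node.js. The article object and its values are made up for the example; JSON.stringify, JSON.parse and Buffer are the only real APIs involved.

```javascript
// A plain object in memory (hypothetical article data, for illustration only).
const article = { headline: "Example headline", link: "/example-link" };

// Serialize: object -> JSON text -> a linear sequence of bytes.
const json = JSON.stringify(article);
const bytes = Buffer.from(json, "utf8");

// Deserialize, as the receiving side would: bytes -> text -> a brand new object.
const received = JSON.parse(bytes.toString("utf8"));

console.log(json);
console.log(bytes.length + " bytes");
console.log(received.headline);
```

The object that comes out the other side has the same contents but is a separate copy, which is why the browser never shares live data structures with the server.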
So, back to our / request: if you take a look in the Network section of your browser's dev tools you'll be able to see the request and response. It presents it in a nicely formatted way so you don't have to try to read the raw byte-sequence yourself. To reiterate the key point, the browser only knows what the HTTP responses tell it. It doesn't know that you used Handlebars to create that HTML markup.
Of course the markup may contain URLs for other resources such as images, CSS, JavaScript and so forth. When the browser is parsing the markup into DOM nodes it'll come across these other URLs and make separate HTTP requests for each one as and when it needs them. Each of these requests behaves in much the same way as the original / request and once again the browser doesn't know how the server generates the response: it could be returning the contents of a static file or it could be generating the whole response on-the-fly.
So now we come to the topic of updating the page in response to user interactions.
Perhaps the easiest interaction to understand is clicking a link, i.e. an anchor tag like <a href="/other-url">click</a>. This just updates the URL in the browser address bar and throws away the previous page.
An HTML form is similar but it can perform various other tricks, such as POST requests. Forms are built using <form> elements (perhaps more accurately I should say form elements, without the < and >, but I think <form> is easier to understand so long as we're clear on the difference between tags in markup and DOM elements). For the purposes of this description they're quite similar to anchor links. The browser URL gets updated and a new page gets loaded in its place. (I'm not going to cover it here but there are ways to submit a form so that the current page doesn't change.)
Both of these approaches throw away the page and reload it from scratch from the server. For many years this is how dynamic websites were built. Then AJAX came along.
Using AJAX you can perform an HTTP request from within JavaScript without changing the current page location. The browser doesn't really attempt to understand the response coming back from an AJAX request, it just does a bit of basic parsing and then hands it off to whatever callbacks you've registered.
This brings us back to your original example. You have an AJAX request calling out to the server and pulling back some JSON data. You then use jQuery to use that JSON to update the DOM nodes of your page accordingly. This is a perfectly good approach but let's touch on a few of the alternatives.
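To make that concrete, here is a rough sketch of the JSON-plus-jQuery approach. The element ids (#show-articles, #articles) and the list markup are assumptions, since the question doesn't show the page; the headline, blurb and link fields are the ones saved by the /scrape route.

```javascript
// Escape text before placing it into markup, so article content cannot inject HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the JSON array returned by /articles into markup for the list.
function buildArticleListHtml(articles) {
  return articles
    .map(function (article) {
      return (
        '<li><a href="' + escapeHtml(article.link) + '">' +
        escapeHtml(article.headline) + "</a><p>" +
        escapeHtml(article.blurb) + "</p></li>"
      );
    })
    .join("");
}

// Browser-only wiring; skipped when this file runs outside a page with jQuery.
// #show-articles and #articles are hypothetical ids.
if (typeof window !== "undefined" && window.jQuery) {
  window.jQuery("#show-articles").on("click", function () {
    window.jQuery.getJSON("/articles").done(function (articles) {
      window.jQuery("#articles").html(buildArticleListHtml(articles));
    });
  });
}
```

Note that buildArticleListHtml has to produce the same structure the Handlebars template would, which is the duplication discussed next.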
The objective is to update the page to reflect the new data, just the same as if we'd refreshed the page in its entirety. To do that we need to build up exactly the same structure of DOM nodes using jQuery that would have been in the markup generated by Handlebars. This is an obvious violation of the DRY principle as we've got that same structure repeated in two places.
One way to remove this duplication is to always use AJAX to load that section of the page. The initial rendering of the page using Handlebars (on the server) would leave that portion of the page blank and then some client-side JavaScript would be used to kick off an AJAX request as soon as the page loads. This can slightly delay the initial load time of the page and can also cause problems for SEO but for a lot of SPAs these aren't significant concerns so this is the approach that's used.
At the opposite end of the spectrum we could just reload the page. We might do this by performing the AJAX request and then separately reloading the page once that request has succeeded (e.g. using window.location.reload()). We could also cut out the AJAX request altogether. Instead we could submit a form to the relevant update URL. The server would perform the relevant change to the data and then return the newly rendered page, most likely using a call to res.render just like it does for the / route.
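A minimal sketch of the no-AJAX variant might look like the following. It uses a redirect back to / rather than calling res.render directly, which ends up in the same place. The scrapeAndSave function is hypothetical: it stands for the axios, cheerio and db.Article.create work from the question, wrapped so that it returns a single promise that settles once everything has been saved.

```javascript
// Build an Express-style handler for GET /scrape.
// scrapeAndSave is a hypothetical function returning a promise that
// resolves when all scraped articles have been stored.
function makeScrapeHandler(scrapeAndSave) {
  return function (req, res) {
    return scrapeAndSave()
      .then(function () {
        // Send the browser back to "/", whose route calls res.render("index", ...).
        res.redirect("/");
      })
      .catch(function (err) {
        res.status(500).send("Scrape failed: " + err.message);
      });
  };
}

// Wiring in the real app would look something like:
//   app.get("/scrape", makeScrapeHandler(scrapeAndSave));
// with a plain link on the page: <a href="/scrape">Scrape</a>
```

Because the button is now an ordinary link, the browser throws away the old page and loads the freshly rendered one, so no client-side JavaScript is needed for the update.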
Then we've got a couple of solutions that sit somewhere between the two.
We could still use an AJAX request and change the response so that rather than returning JSON it returns the HTML markup to inject into the relevant section of the page. We'd need to split up the Handlebars template on the server using suitable partials so that we could just generate the small section of HTML that we need and not the HTML for the whole page.
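Here is a sketch of what the HTML-fragment route could look like. Several things are assumptions: the template name partials/articles, the /articles/html path, the #articles id, and the layout: false option, which is how express-handlebars skips the surrounding page layout (other Handlebars integrations may differ). findArticles is a hypothetical wrapper around db.Article.find({}).

```javascript
// Build an Express-style handler that returns only the article list markup.
// findArticles is a hypothetical function returning a promise of the articles.
function makeArticlesFragmentHandler(findArticles) {
  return function (req, res) {
    return findArticles()
      .then(function (articles) {
        // layout: false is an assumption about the view engine (express-handlebars).
        res.render("partials/articles", { layout: false, articles: articles });
      })
      .catch(function () {
        res.status(500).send("Could not load articles");
      });
  };
}

// Server wiring (hypothetical path):
//   app.get("/articles/html", makeArticlesFragmentHandler(function () {
//     return db.Article.find({});
//   }));
// Browser side, with jQuery, injecting the returned markup into the page:
//   $("#articles").load("/articles/html");
```

The main index template would include the same partial, so the markup for the list is only written once.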
The other middle-ground solution is to have Handlebars at both ends. The data would still be loaded using JSON but it would be run through a Handlebars template in the browser. To get this working you'd need to load the relevant Handlebars JS file into the browser. You'd also need to find a way to get the relevant template into the browser, either loading it using an AJAX request or more likely injecting it during the initial rendering of the page on the server.
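As a rough sketch of the client-side half, assuming the Handlebars browser build and jQuery are both loaded on the page: the ids (#articles-template, #show-articles, #articles) are hypothetical, and the template is assumed to have been injected by the server inside a script element with type text/x-handlebars-template.

```javascript
// Compile a template once and return a function that renders data into markup.
// handlebars is the Handlebars object (the global Handlebars in the browser);
// templateSource is the template text.
function makeArticleRenderer(handlebars, templateSource) {
  const template = handlebars.compile(templateSource);
  return function (articles) {
    return template({ articles: articles });
  };
}

// Browser-only wiring; skipped outside a page that has loaded Handlebars and jQuery.
if (typeof window !== "undefined" && window.Handlebars && window.jQuery) {
  const $ = window.jQuery;
  const render = makeArticleRenderer(window.Handlebars, $("#articles-template").html());
  $("#show-articles").on("click", function () {
    $.getJSON("/articles").done(function (articles) {
      $("#articles").html(render(articles));
    });
  });
}
```

One thing to watch for: if the page itself is rendered by Handlebars on the server, the template text has to reach the browser unrendered, for example by escaping its expressions with a backslash (\{{) in the server-side template.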
So that's four different approaches and we're far from exhausting the possibilities. Given this is a learning exercise and not a real-world problem it isn't really possible to make a call on which approach is best; there are pros and cons to each. I think the single most important thing is for you to be clear what code runs on the server and what code runs in the client and how the two interact via HTTP requests.
Upvotes: 1