Reputation: 163
Given this asynchronous recursive function (works on node if you copy and paste it with axios installed):
const axios = require('axios');

fetchAllSync('https://api.github.com/repos/octokit/rest.js/issues/comments', { per_page: 100 })
  .then(function(data) {
    console.log(data.length);
  });

async function fetchAllSync(requestUrl, parameters) {
  // Construct request url with given parameters
  requestUrl = parameters ? `${requestUrl}?` : requestUrl;
  for (let parameter in parameters) {
    requestUrl = requestUrl.concat(`${parameter}=${parameters[parameter]}&`);
  }
  let res = await axios.get(requestUrl);
  // Return results immediately if there is only 1 page of results.
  if (!res.headers.link) {
    return Promise.resolve(res.data);
  }
  // Get page relation links from header of response and make request for next page of comments
  let linkRelations = res.headers.link.split(',').map(function(item) {
    return item.trim();
  });
  for (let linkRel of linkRelations) {
    let [link, rel] = linkRel.split(';').map(function(item) {
      return item.trim();
    });
    link = link.substring(1, link.length - 1);
    if (rel == 'rel="next"') {
      // Make recursive call to same method to get next page of comments
      return res.data.concat(await fetchAllSync(link));
    }
  }
  return res.data;
}
The above code works. It gets the request URL for the next page of data from the link relations in the response headers and retrieves the next set of data using recursion.
However, if I change async function fetchAllSync(requestUrl, parameters) to async function fetchAllSync(requestUrl, parameters = {}), my code breaks. The first page of data is retrieved properly, but the subsequent recursive calls retrieve the same data over and over again.
When I console.log the requestUrl parameter, it is indeed the URL for the second page of data, and it is the same requestUrl for all subsequent calls. Why is this happening? Even the response from the axios call says the second page was requested, so why aren't the link relations in the headers reflecting that when I give the parameter a default value?
Upvotes: 0
Views: 224
Reputation: 135197
tangled wires
You have a problem of mixing many concerns into one large function. I would recommend you break the tasks into separate parts and compose your "big" solution by combining many "small" solutions.
Write parseLink to extract relevant links from the response header -
const parseLink = (link = "", rel = "") =>
{ const re = /<([^>]+)>; rel="([^"]+)"/g
  let m
  while (m = re.exec(link))
    if (rel === m[2])
      return m[1]
  return undefined
}

const header =
  `<https://api.github.com/repositories/711976/issues/comments?page=2>; rel="next", <https://api.github.com/repositories/711976/issues/comments?page=169>; rel="last"`

console.log(parseLink(header, "next"))
// https://api.github.com/repositories/711976/issues/comments?page=2

console.log(parseLink(header, "foo"))
// undefined
Use URLSearchParams to modify URL search params -
const href =
  `https://api.github.com/repositories/711976/issues/comments?foo=bar`

const u = new URL(href)
u.searchParams.set("per_page", 100)
console.log(u.toString())
// https://api.github.com/repositories/711976/issues/comments?foo=bar&per_page=100

u.searchParams.set("foo", "hello")
console.log(u.toString())
// https://api.github.com/repositories/711976/issues/comments?foo=hello&per_page=100
But don't go tangling URL parameter code with fetching code. Use a separate function, such as urlWithParams -
const urlWithParams = (href = "", params = {}) =>
{ const u = new URL(href)
  for (const [k, v] of Object.entries(params))
    u.searchParams.set(k, v)
  return u.toString()
}

const href =
  `https://api.github.com?foo=bar`

console.log(urlWithParams(href, { per_page: 100 }))
// https://api.github.com/?foo=bar&per_page=100

console.log(urlWithParams(href, { per_page: 100, page: 3 }))
// https://api.github.com/?foo=bar&per_page=100&page=3

console.log(urlWithParams(href, { per_page: 100, foo: "hello" }))
// https://api.github.com/?foo=hello&per_page=100
We can now write a generic fetchPage which fetches a single page -
const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
  const next = parseLink(headers.link, "next")
  return { data, next }
}

const href =
  `https://api.github.com/repositories/711976/issues/comments?foo=bar`

const result =
  fetchPage(urlWithParams(href, { per_page: 100 }))

result.then(console.log)
// { data: [ { ... }, { ... }, ... ]
// , next: `https://api.github.com/repositories/711976/issues/comments?foo=bar&per_page=100&page=2`
// }
And the generic fetchPages which fetches all linked pages -
const fetchPages = async (href = "") =>
  asyncUnfold
    ( async (then, done, { data, next }) =>
        next
          ? then(data, await fetchPage(next))
          : done(data)
    , await fetchPage(href)
    )
    .then(pages => pages.flat(1)) // merge pages into single array
And finally, fetchComments is a specialization of fetchPages and urlWithParams -
const commentsAPI =
  `https://api.github.com/repositories/711976/issues/comments`

const fetchComments = (parameters = {}) =>
  fetchPages(urlWithParams(commentsAPI, parameters))

const result =
  fetchComments({ per_page: 100 })

result.then(console.log, console.error)
// [ { ... }, { ... }, ... ]
The generic asyncUnfold is explained and implemented in other Q&As. Here is an implementation for reference; note the done continuation must return the final value wrapped in an array, otherwise the last page would be dropped -
const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async (value) => [ value ]
    , initState
    )
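To see its shape in isolation, here is a small demo (not part of the original answer) that unfolds a counter into an array -
asyncUnfold
  ( async (then, done, n) =>
      n < 3
        ? then(n, n + 1) // emit n, continue with n + 1
        : done(n)        // emit the final value and stop
  , 0
  )
  .then(console.log)
// [ 0, 1, 2, 3 ]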
Verify it all works by expanding and running the snippet below in your browser. NB: the repo in your original question (711976) has over 50 pages of 100 comments each. For the purposes of this demo, I am using a smaller repo (261279710) which only has 18 pages. Warning: if you run this demo multiple times, you will probably get rate-limited by the server because it's not using an API token -
const axios =
  { async get (href = "")
    { const r = await fetch(href)
      return Object.assign
        ( { headers: { link: r.headers.get("link") } }
        , { data: await r.json() }
        )
    }
  }

const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async (value) => [ value ]
    , initState
    )

const parseLink = (link = "", rel = "") =>
{ const re = /<([^>]+)>; rel="([^"]+)"/g
  let m
  while (m = re.exec(link))
    if (rel === m[2])
      return m[1]
  return undefined
}

const urlWithParams = (href = "", params = {}) =>
{ const u = new URL(href)
  for (const [k, v] of Object.entries(params))
    u.searchParams.set(k, v)
  return u.toString()
}

const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
  const next = parseLink(headers.link, "next")
  return { data, next }
}

const fetchPages = async (href = "") =>
  asyncUnfold
    ( async (then, done, { data, next }) =>
        next
          ? then(data, await fetchPage(next))
          : done(data)
    , await fetchPage(href)
    )
    .then(pages => pages.flat(1))

const commentsAPI =
  `https://api.github.com/repositories/261279710/issues/comments`

const fetchComments = (parameters = {}) =>
  fetchPages(urlWithParams(commentsAPI, parameters))

const comments =
  fetchComments({ per_page: 100 })

comments.then(console.log, console.error)
comments.then(r => console.log("total: %d", r.length))
<p>fetching ~20 pages of comments. please wait...</p>
<p><b>Warning:</b> running this demo multiple times may cause the remote server to rate-limit your client.</p>
// [ { ... }, { ... }, { ... }, ... ]
// total: 1840
make it fast!
Notice the program above fetches the pages in serial order. Did you also notice that the server returns a rel="last" link to tell you which page is last? With that in mind, we can write something that fetches pages in parallel.
Let's imagine a page function that returns the page param from a URL -
const page = (href = "") =>
  // ...

console.log(page("/path/to/repo")) // 0
console.log(page("/path/to/repo?page=123")) // 123
And a next function which takes a URL and gives us the URL for the next page -
const next = (href = "") =>
  // ...

console.log(next("/path/to/repo")) // "/path/to/repo?page=1"
console.log(next("/path/to/repo?page=123")) // "/path/to/repo?page=124"
Using URL and URLSearchParams, those should be easy for you to write! (A reference sketch follows the adjusted fetchPage below.) Then adjust fetchPage so it also returns the last link from the response -
const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
  const next = parseLink(headers.link, "next")
  const last = parseLink(headers.link, "last")
  return { data, next, last }
}
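For reference, here is one possible sketch of page and next, assuming the absolute URLs found in the GitHub link header -
const page = (href = "") =>
  Number(new URL(href).searchParams.get("page") ?? 0)

const next = (href = "") =>
{ const u = new URL(href)
  // bump the page param by one, keeping all other params intact
  u.searchParams.set("page", page(href) + 1)
  return u.toString()
}

console.log(page("https://api.github.com/repositories/711976/issues/comments?page=123"))
// 123
console.log(next("https://api.github.com/repositories/711976/issues/comments?page=123"))
// https://api.github.com/repositories/711976/issues/comments?page=124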
Now let's adjust fetchPages to fetch in parallel -
const fetchPages = async (href = "") =>
{ const first =
    await fetchPage(href)

  const pages =
    unfold
      ( (then, done, href) =>
          page(href) < page(first.last)
            ? then(href, next(href))
            : done(href)
      , first.next
      )

  const allData =
    // parallel!
    await Promise.all(pages.map(p => fetchPage(p).then(r => r.data)))

  return first.data.concat(...allData)
}
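Note unfold is the synchronous counterpart of asyncUnfold; it wasn't defined above, so here is a minimal sketch with the same shape -
const unfold = (run, initState) =>
  run
    ( (value, nextState) => [ value, ...unfold(run, nextState) ]
    , (value) => [ value ]
    , initState
    )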
Now your only limitation is concurrent connections 😅
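If you do hit those limits, one option (not part of the original answer; fetchInBatches is a hypothetical helper) is to fetch a fixed number of pages at a time -
const fetchInBatches = async (hrefs = [], size = 5) =>
{ const results = []
  for (let i = 0; i < hrefs.length; i += size)
  { const batch = hrefs.slice(i, i + size)
    // wait for each batch to finish before starting the next one
    results.push(...await Promise.all(batch.map(h => fetchPage(h).then(r => r.data))))
  }
  return results.flat(1)
}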
Upvotes: 0
Reputation: 1412
This is probably due to the line requestUrl = parameters ? `${requestUrl}?` : requestUrl;
const alice = undefined ? 'foo' : 'bar'
const bob = {} ? 'foo' : 'bar'

console.log(alice); // 'bar'
console.log(bob); // 'foo'
This is because Boolean({}) returns true, and an implicit coercion happens in the condition of the ?: operator. So once parameters defaults to {}, the recursive call fetchAllSync(link) takes the truthy branch and appends a stray ? to the already-complete next-page URL.
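A minimal sketch of what happens to the URL on the recursive call (the link value is illustrative) -
const link = 'https://api.github.com/repositories/711976/issues/comments?page=2'
const parameters = {} // the default value
const requestUrl = parameters ? `${link}?` : link
console.log(requestUrl)
// https://api.github.com/repositories/711976/issues/comments?page=2?
// The trailing "?" corrupts the page parameter, so the server likely falls
// back to the first page, whose headers again link to page 2, forever.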
Upvotes: 5