frownyface

Reputation: 163

Why does this default parameter break my recursive function?

Given this asynchronous recursive function (it works in Node if you copy and paste it with axios installed):

const axios = require('axios');

fetchAllSync('https://api.github.com/repos/octokit/rest.js/issues/comments', {per_page:100})
    .then(function(data){
        console.log(data.length);
    }); 

async function fetchAllSync(requestUrl, parameters) {
    // Construct request url with given parameters
    requestUrl = parameters ? `${requestUrl}?`: requestUrl;
    for(let parameter in parameters){
        requestUrl = requestUrl.concat(`${parameter}=${parameters[parameter]}&`);
    }
    let res = await axios.get(requestUrl);
    // Return results immediately if there is only 1 page of results.
    if(!res.headers.link){
        return Promise.resolve(res.data);
    }
    // Get page relation links from header of response and make request for next page of comments
    let linkRelations = res.headers.link.split(',').map(function(item) {
        return item.trim();
    });
    for(let linkRel of linkRelations){
        let [link, rel] = linkRel.split(';').map(function(item) {
            return item.trim();
        });
        link = link.substring(1, link.length - 1);
        if(rel == 'rel="next"'){
            // Make recursive call to same method to get next page of comments
            return res.data.concat(await fetchAllSync(link));
        }
    }
    return res.data;
}

The above code works. It gets the request URL for the next page of data from the link relations in the response headers and retrieves the next set of data using recursion.

However, if I change async function fetchAllSync(requestUrl, parameters) to async function fetchAllSync(requestUrl, parameters={}) my code breaks. What happens is that the first page of data is properly retrieved but the subsequent recursive calls retrieve the same data over and over again.

When I console.log the requestUrl parameter, it is indeed the URL for the second page of data. It is also the same requestUrl for all subsequent calls. Why is this happening? Even the response from the axios call says the second page was requested, so why aren't the link relations in the headers reflecting that when I give the parameter a default value?

Upvotes: 0

Views: 224

Answers (2)

Mulan

Reputation: 135197

tangled wires

You are mixing many concerns into one large function. I would recommend breaking the task into separate parts and composing your "big" solution from many "small" solutions.


Write parseLink to extract relevant links from the response header -

const parseLink = (link = "", rel = "") =>
{ const re = /<([^>]+)>; rel="([^"]+)"/g
  let m
  while (m = re.exec(link))
    if (rel === m[2])
      return m[1]
  return undefined
}

const header = 
  `<https://api.github.com/repositories/711976/issues/comments?page=2>; rel="next", <https://api.github.com/repositories/711976/issues/comments?page=169>; rel="last"`

console.log(parseLink(header, "next"))
// https://api.github.com/repositories/711976/issues/comments?page=2

console.log(parseLink(header, "foo"))
// undefined


Use URLSearchParams to modify URL search params -

const href = 
  `https://api.github.com/repositories/711976/issues/comments?foo=bar`

const u = new URL(href)
u.searchParams.set("per_page", 100)


console.log(u.toString())
// https://api.github.com/repositories/711976/issues/comments?foo=bar&per_page=100

u.searchParams.set("foo", "hello")

console.log(u.toString())
// https://api.github.com/repositories/711976/issues/comments?foo=hello&per_page=100


But don't go tangling URL parameter code with fetching code. Use a separate function, such as urlWithParams -

const urlWithParams = (href = "", params = {}) =>
{ const u = new URL(href)
  for (const [k, v] of Object.entries(params))
    u.searchParams.set(k, v)
  return u.toString()
}

const href =
  `https://api.github.com?foo=bar`
  
console.log(urlWithParams(href, { per_page: 100 }))
// https://api.github.com/?foo=bar&per_page=100

console.log(urlWithParams(href, { per_page: 100, page: 3 }))
// https://api.github.com/?foo=bar&per_page=100&page=3

console.log(urlWithParams(href, { per_page: 100, foo: "hello" }))
// https://api.github.com/?foo=hello&per_page=100


We can now write a generic fetchPage which fetches a single page -

const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
  const next = parseLink(headers.link, "next")
  return { data, next }
}

const href = 
  `https://api.github.com/repositories/711976/issues/comments?foo=bar`

const result =
  fetchPage(urlWithParams(href, { per_page: 100 }))

result.then(console.log)
// { data: [ { ... }, { ... }, ... ]
// , next: `https://api.github.com/repositories/711976/issues/comments?foo=bar&per_page=100&page=2`
// }


And the generic fetchPages which fetches all linked pages -

const fetchPages = async (href = "") =>
  asyncUnfold
    ( async (then, done, { data, next }) =>
        next
          ? then(data, await fetchPage(next))
          : done(data)
    , await fetchPage(href)
    )
    .then(pages => pages.flat(1)) // merge pages into single array

And finally, fetchComments is a specialization of fetchPages and urlWithParams -

const commentsAPI =
  `https://api.github.com/repositories/711976/issues/comments`

const fetchComments = (parameters = {}) =>
  fetchPages(urlWithParams(commentsAPI, parameters))

const result =
  fetchComments({ per_page: 100 })

result.then(console.log, console.error)
// [ { ... }, { ... }, ... ]

The generic asyncUnfold is explained and implemented here and here. Here is an implementation for reference -

const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async () => []
    , initState
    )
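As a quick sanity check, here is a small hypothetical example (not from the linked posts) showing how asyncUnfold builds a list one state at a time -

```javascript
// asyncUnfold as given above, repeated so this example is self-contained
const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async () => []
    , initState
    )

// collect the numbers 1..n, one step at a time, like pages of a result
const countTo = (n) =>
  asyncUnfold
    ( async (then, done, i) =>
        i <= n
          ? then(i, i + 1)  // emit i, continue with state i + 1
          : done()          // no more values, stop
    , 1
    )

countTo(5).then(console.log)
// [ 1, 2, 3, 4, 5 ]
```

In fetchPages, the state is the `{ data, next }` result of the previous fetchPage call instead of a counter, but the shape of the recursion is identical.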

Verify it all works by expanding and running the snippet below in your browser. NB, the repo in your original question (711976) has over 50 pages of 100 comments each. For the purposes of this demo, I am using a smaller repo (261279710) which only has 18 pages. Warning: if you run this demo multiple times, you will probably get rate-limited by the server because the demo doesn't use an API token -

const axios =
  { async get (href = "")
    { const r = await fetch(href)
      return Object.assign
        ( { headers: { link: r.headers.get("link") } }
        , { data: await r.json() }
        )
    }
  }

const asyncUnfold = async (run, initState) =>
  run
    ( async (value, nextState) => [ value, ...await asyncUnfold(run, nextState) ]
    , async () => []
    , initState
    )

const parseLink = (link = "", rel = "") =>
{ const re = /<([^>]+)>; rel="([^"]+)"/g
  let m
  while (m = re.exec(link))
    if (rel === m[2])
      return m[1]
  return undefined
}

const urlWithParams = (href = "", params = {}) =>
{ const u = new URL(href)
  for (const [k, v] of Object.entries(params))
    u.searchParams.set(k, v)
  return u.toString()
}
    
const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
  const next = parseLink(headers.link, "next")
  return { data, next }
}

const fetchPages = async (href = "") =>
  asyncUnfold
    ( async (then, done, { data, next }) =>
        next
          ? then(data, await fetchPage(next))
          : done(data)
    , await fetchPage(href)
    )
    .then(pages => pages.flat(1))
    
const commentsAPI =
  `https://api.github.com/repositories/261279710/issues/comments`

const fetchComments = (parameters = {}) =>
  fetchPages(urlWithParams(commentsAPI, parameters))

const comments =
  fetchComments({ per_page: 100 })

comments.then(console.log, console.error)
comments.then(r => console.log("total: %d", r.length))
<p>fetching ~20 pages of comments. please wait...</p>

<p><b>Warning:</b> running this demo multiple times may cause the remote server to rate-limit your client.</p>

// [ { ... }, { ... }, { ... }, ... ]
// total: 1840

make it fast!

Notice the program above fetches the pages serially. Did you also notice that the server returns a rel="last" link to tell you which page is last? With that in mind, we can write something that fetches pages in parallel.

Let's imagine a page function that returns the page param from a URL -

const page = (href = "") =>
  // ...

console.log(page("/path/to/repo"))          // 0
console.log(page("/path/to/repo?page=123")) // 123

And a next function which takes a URL and gives us the URL for the next page -

const next = (href = "") => 
  // ...

console.log(next("/path/to/repo"))          // "/path/to/repo?page=1"
console.log(next("/path/to/repo?page=123")) // "/path/to/repo?page=124"
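In case it helps, here is one possible sketch of these helpers (my assumption, not part of the answer; it uses a dummy base so the URL constructor accepts relative paths) -

```javascript
// parse the page number out of the query string, defaulting to 0 when
// absent; the dummy base is only there so relative paths parse
const page = (href = "") =>
  Number(new URL(href, "http://example.com").searchParams.get("page") || 0)

// return the same path with the page parameter incremented by one
const next = (href = "") =>
{ const u = new URL(href, "http://example.com")
  u.searchParams.set("page", page(href) + 1)
  return u.pathname + u.search
}

console.log(page("/path/to/repo"))          // 0
console.log(next("/path/to/repo?page=123")) // /path/to/repo?page=124
```

Because next goes through URLSearchParams, any other query parameters on the URL are preserved.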

Using URL and URLSearchParams, those should be easy for you to write! Then adjust fetchPage so it also returns the last link from the response -

const fetchPage = async (href = "") =>
{ const { headers, data } = await axios.get(href)
  const next = parseLink(headers.link, "next")
  const last = parseLink(headers.link, "last")
  return { data, next, last }
}

Now let's adjust fetchPages to fetch in parallel -

const fetchPages = async (href = "") =>
{ const first =
    await fetchPage(href)

  // unfold is the synchronous counterpart of asyncUnfold: build the list
  // of remaining page URLs, from first.next up to first.last
  const pages =
    unfold
      ( (then, done, href) =>
          page(href) < page(first.last)
            ? then(href, next(href))
            : done(href)
      , first.next
      )

  const allData =
    // parallel!
    await Promise.all(pages.map(p => fetchPage(p).then(r => r.data)))

  return first.data.concat(...allData)
}

Now your only limitation is concurrent connections 😅

Upvotes: 0

Maxime Helen

Reputation: 1412

This is probably due to the requestUrl = parameters ? `${requestUrl}?` : requestUrl; line:

    const alice = undefined ? 'foo' : 'bar' 
    const bob = {} ? 'foo' : 'bar'

    console.log(alice); // 'bar'
    console.log(bob); // 'foo'

This is because Boolean({}) returns true: the condition of the ?: operator is implicitly coerced to a boolean, so the default empty object counts as truthy.
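To see how this breaks the recursion, consider what the truthy default does to the recursive call (a sketch; the URL is just the pattern from the question) -

```javascript
// with the default parameters = {}, the ternary takes the truthy branch
// and appends "?" to a link that already carries a query string
const link = "https://api.github.com/repositories/711976/issues/comments?page=2"
const parameters = {}
const requestUrl = parameters ? `${link}?` : link

console.log(requestUrl)
// https://api.github.com/repositories/711976/issues/comments?page=2?
```

The page parameter's value is now 2? instead of 2, so the server no longer sees a clean page number, and the pagination links it returns never advance past the same page.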

Upvotes: 5
