RemcoE33

Reputation: 1620

Array of filtered axios results from paginated API is empty

In my code below, console.log(response) logs an empty array, but the console.log(filterdIds) inside the getIds function shows my desired data. I think my resolve is not right.

Note that I run the do..while only once for testing. The API is paginated: if the records are from yesterday it keeps going; if not, the do..while stops.

Can somebody point me in the right direction?

const axios = require("axios");

function getToken() {
    // Get the token
}

function getIds(jwt) {
    return new Promise((resolve) => {
        let pageNumber = 1;
        const filterdIds = [];

        const config = {
            //Config stuff
        };

        do {
            axios(config)
                .then((response) => {
                    response.forEach(element => {
                        //Some logic, if true then:
                        filterdIds.push(element.id);
                        console.log(filterdIds);
                    });
                })
                .catch(error => {
                    console.log(error);
                });
        } while (pageNumber != 1)
        resolve(filterdIds);
    });
}


getToken()
    .then(token => {
        return token;
    })
    .then(jwt => {
        return getIds(jwt);
    })
    .then(response => {
        console.log(response);
    })
    .catch(error => {
        console.log(error);
    });

I'm also not sure where to put the reject inside the getIds function because of the do..while.

Upvotes: 2

Views: 948

Answers (2)

ggorlen

Reputation: 57155

The fundamental problem is that resolve(filterdIds); runs synchronously before the requests fire, so it's guaranteed to be empty.
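To see the ordering problem in isolation, here's a stripped-down sketch of the same bug, with setTimeout standing in for the axios call:

function getIds() {
  return new Promise(resolve => {
    const filterdIds = [];
    setTimeout(() => filterdIds.push(42), 0); // plays the role of the .then callback
    resolve(filterdIds); // runs immediately, before the "response" arrives
  });
}

getIds().then(ids => console.log(ids)); // logs []

The .then callback fires before the timeout does, so the array is still empty when it's logged.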

Promise.all or Promise.allSettled can help if you know how many pages you want up front (or if you're using a chunk size to make multiple requests--more on that later). These approaches fire the requests in parallel. Here's a runnable proof-of-concept example:

const pages = 10; // some page value you're using to run your loop

axios
  .get("https://httpbin.org") // some initial request like getToken
  .then(response => // response has the token, ignored for simplicity
    Promise.all(
      Array(pages).fill().map((_, i) => // make an array of request promises
        axios.get(`https://jsonplaceholder.typicode.com/comments?postId=${i + 1}`)
      )
    )
  )
  .then(responses => {
    // perform your filter/reduce on the response data
    const results = responses.flatMap(response =>
      response.data
        .filter(e => e.id % 2 === 0) // some silly filter
        .map(({id, name}) => ({id, name}))
    );
    
    // use the results
    console.log(results);
  })
  .catch(err => console.error(err))
;
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>

The network tab shows the requests happening in parallel:

parallel request waterfall

If the number of pages is unknown and you intend to fire requests one at a time until your API informs you of the end of the pages, a sequential loop is slow but can be used. Async/await is cleaner for this strategy:

(async () => {
  // like getToken; should handle err
  const tokenStub = await axios.get("https://httpbin.org");
  
  const results = [];
  
  // page += 10 to make the snippet run faster; you'd probably use page++
  for (let page = 1;; page += 10) {
    try {
      const url = `https://jsonplaceholder.typicode.com/comments?postId=${page}`;
      const response = await axios.get(url);
      
      // check whatever condition your API sends to tell you no more pages
      if (response.data.length === 0) { 
        break;
      }
      
      for (const comment of response.data) {
        if (comment.id % 2 === 0) { // some silly filter
          const {name, id} = comment;
          results.push({name, id});
        }
      }
    }
    catch (err) { // hit the end of the pages or some other error
      break;
    }
  }
  
  // use the results
  console.log(results);
})();
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>

Here's the sequential request waterfall:

sequential request waterfall

A task queue or chunked loop can be used if you want to increase parallelization. p-limit is a useful library for limiting your Promise.all work. A chunked loop would combine the two techniques to request n records at a time and check each result in the chunk for the termination condition. Here's a simple example that strips out the filtering operation, which is sort of incidental to the asynchronous request issue and can be done synchronously after the responses arrive:

(async () => {  
  const results = [];
  const chunk = 5;
  
  for (let page = 1;; page += chunk) {
    try {
      const responses = await Promise.all(
        Array(chunk).fill().map((_, i) => 
          axios.get(`https://jsonplaceholder.typicode.com/comments?postId=${page + i}`)
        )
      );
      
      for (const response of responses) {      
        for (const comment of response.data) {
          const {name, id} = comment;
          results.push({name, id});
        }
      }
      
      // check end condition
      if (responses.some(e => e.data.length === 0)) { 
        break;
      }
    }
    catch (err) {
      break;
    }
  }
  
  // use the results
  console.log(results);
})();
<script src="https://unpkg.com/axios/dist/axios.min.js"></script>

chunked request waterfall

(above image is an excerpt of the 100 requests, but the chunk size of 5 at once is visible)
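Since the chunked example above strips out p-limit, here's a minimal sketch of the task-queue approach. It assumes p-limit's CommonJS build (p-limit@2) and a known page count, with the same placeholder API as above:

const pLimit = require("p-limit"); // p-limit@2 supports require()
const axios = require("axios");

const limit = pLimit(5); // at most 5 requests in flight at once

(async () => {
  const pages = 20; // assumes a known page count for simplicity
  const responses = await Promise.all(
    Array(pages).fill().map((_, i) =>
      limit(() => // each request waits for a free slot in the queue
        axios.get(`https://jsonplaceholder.typicode.com/comments?postId=${i + 1}`)
      )
    )
  );
  console.log(responses.flatMap(response => response.data).length);
})();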

Note that these snippets are proofs-of-concept and could stand to be less indiscriminate with catching errors, ensure all throws are caught, etc. When breaking it into sub-functions, make sure to .then and await all promises in the caller--don't try to turn it into synchronous code.
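For instance, here's a sketch of what that looks like with a hypothetical getPage helper:

// hypothetical helper; the caller must await (or .then) its promise
async function getPage(page) {
  const response = await axios.get(
    `https://jsonplaceholder.typicode.com/comments?postId=${page}`
  );
  return response.data;
}

(async () => {
  const data = await getPage(1); // awaited in the caller, not fired and forgotten
  console.log(data.length);
})();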


Upvotes: 3

knappsacks

Reputation: 106

To take a step back and think about why you ran into this issue, we have to think about how synchronous and asynchronous JavaScript code work together. Your synchronous getIds function runs to completion, stepping through each line until it gets to the end.

The axios invocation returns a Promise, which is an object representing some future fulfillment or rejection value. That Promise isn't going to resolve until the next cycle of the event loop (at the earliest), and your code tells it what to do once that pending value arrives (the callback passed to .then()).

But your main getIds function isn't going to wait around: it invokes the axios function, gives the returned Promise something to do in the future, and keeps going, moving past the do/while loop and on to the resolve call, which settles the Promise you created at the beginning of the function. The axios Promise hasn't resolved by that point, so filterdIds hasn't been populated yet.

When you moved the resolve call for the promise you're creating into the .then callback of the axios Promise, it started working, because now your Promise waits for axios to resolve before resolving itself.
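As a minimal sketch of that fix, reusing the names and conventions from the question and handling only a single page:

function getIds(jwt) {
    return new Promise((resolve, reject) => {
        const config = { /* config stuff, using jwt */ };
        axios(config)
            .then(response => {
                const filterdIds = [];
                response.forEach(element => {
                    // some logic, if true then:
                    filterdIds.push(element.id);
                });
                resolve(filterdIds); // resolve only after the data has arrived
            })
            .catch(reject); // surface request errors to the caller
    });
}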

Hopefully that sheds some light on what you can do to get your multi-page goal to work.


I couldn't help thinking there was a cleaner way to allow you to fetch multiple pages at once, and then recursively keep fetching if the last page indicated there were additional pages to fetch. You may still need to add some additional logic to filter out any pages that you batch fetch that don't meet whatever criteria you're looking for, but this should get you most of the way:

async function getIds(startingPage, pages) {
    const pagePromises = Array(pages).fill(null).map((_, index) => {
        const page = startingPage + index;
        // set the page however you do it with axios query params;
        // spread to avoid mutating the shared config from the question
        return axios({...config, page});
    });

    // get the last page you attempted, and if it doesn't meet whatever
    // criteria you have to finish the query, submit another batch query
    // (`done` stands in for however your API signals the final page)
    const lastPage = await pagePromises[pagePromises.length - 1];

    // the result from getIds is an array of ids, so we recursively get the rest of the pages here
    // and have a single level array of ids (or an empty array if there were no more pages to fetch)
    const additionalIds = lastPage.done ? [] : await getIds(startingPage + pages, pages);

    // now we wait for all page queries to resolve and extract the ids
    const resolvedPages = await Promise.all(pagePromises);
    const resolvedIds = [].concat(...resolvedPages).map(elem => elem.id);

    // and finally merge the ids fetched in this method's invocation, with any fetched recursively
    return [...resolvedIds, ...additionalIds];
}
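
A call might then look like this (the done flag on the last response is an assumption about your API, per the comments in the code):

getIds(1, 5)
    .then(ids => console.log(ids))
    .catch(error => console.log(error));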

Upvotes: 1
