Vadym Chornyi
Vadym Chornyi

Reputation: 33

How to optimize the search in an array with more than 50 thousand elements?

I have an array of strings with approximately 50,000 elements.

export const companies = [
  "000014",
  "000016",
  "000017",
  "000019",
  "000020",
  "000021",
  "000023",
  "000025",
  ...
]

These are the names of companies for which I display certain pages. I made a middleware in which I run a loop and walk through this large array.

import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";

export async function middleware(req: NextRequest) {
  const { pathname } = req.nextUrl;

  // cycle for comparing current URL with companies
  await for (let i = 0; i < companies.length; i++) {
    if (pathname.startsWith(`/${companies[i]}`))
        return NextResponse.redirect(new URL("/", req.url)); // redirect to main page if companies compare with current pathname
  }
}

It takes some time, how can it be optimized? There was an idea to divide the array into chunks, but this is also not a very good option.

Upvotes: 0

Views: 241

Answers (2)

SimonL
SimonL

Reputation: 1838

If for some reason you can't just do map / object lookups, like @Aurast suggested (for example, you need to do free text searching to find closest matching name), then one very simple approach is to group companies so you can divide the list into smaller chunks, and search only one chunk. For example, divide by initial letter:

export const companies = {
    "a": ["aardvarks, inc", "AppendeciteCorp", ],
    "b": ["boulevard bricklayers", "bandits and bakers", "brompton fan club"],
}

Then you can look up by just first letter, in order to get a much smaller array. This would be very simple to implement.


The other approach is to take this to its logical conclusion, and recursively group by first letter in a data structure called a trie or prefix tree. This allows you to very quickly search for stuff even if it's not an exact match.

a prefix tree structure showing "tea", "ted", "ten" and "inn"

Upvotes: 0

Aurast
Aurast

Reputation: 3678

Assuming pathname looks something like /000014/abc/xyz then you can get rid of the array iteration entirely. Something like this:

import { NextResponse, type NextRequest } from "next/server";
import { companies } from "./assets/companies";

const companiesSet = new Set(companies);

export async function middleware(req: NextRequest) {
  const { pathname } = req.nextUrl;
  
  // Ideally you could get this from req.params.companyId instead, but whether that exists or not would depend on the routing code, which you haven't shown.
  const companyId = req.pathname.match(/\/([0-9]+)\//)?.[1];

  if (companiesSet.has(companyId)) {
    return NextResponse.redirect(new URL("/", req.url)); // redirect to main page if companies compare with current pathname
  }
}

That being said, 50,000 elements isn't really that large, all things considered, and that code shouldn't have been particularly slow. The unnecessary await and the string building inside of the loop may have been an issue.

Upvotes: 2

Related Questions