asdasd

Reputation: 7200

How to match regex where the pattern order is not known

I have an HTML file containing divs with class names. I need to find the divs whose classes are exactly a given set, but I don't know the order in which the classes are written inside each div. For example:

Given class list: 1, 2, 3

HTML (simplified)

<div id="test" class="1 2 3"></div> #A match! since it contains classes 1 2 3
<div id="test" class="3 2 1"></div> #A match! since it contains classes 1 2 3
<div id="test" class="2 1 3"></div> #A match! since it contains classes 1 2 3
<div id="test" class="1 2 3 4"></div> #not a match since it contains other classes than 1, 2, 3
<div id="test" class="1 2 3 5 6"></div> #not a match since it contains other classes than 1, 2, 3

Any suggestion is greatly appreciated!

Edit: The HTML text, as in my example, doesn't need to be valid. It can contain multiple divs with the same id, or unclosed divs.

Upvotes: 0

Views: 55

Answers (2)

ic3b3rg

Reputation: 14927

A regex wouldn't be the right approach. You want to match divs by their set of classes, i.e. the set of classes on the div must equal the given input set (same number of classes, and every class on the div is in the input):

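const findDivsWithClasses = classes => {
  // Split on whitespace and drop empty tokens so extra spaces don't count as classes
  const classesSet = new Set(classes.split(/\s+/).filter(Boolean));
  // Keep divs with the same number of classes, all of which are in the input set
  return [...document.getElementsByTagName('div')].filter(div =>
    div.classList.length === classesSet.size && [...div.classList].every(className => classesSet.has(className)));
};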

console.log(findDivsWithClasses('1 2 3'));
<div id="test" class="1 2 3"></div> #A match! since it contains classes 1 2 3
<div id="test" class="3 2 1"></div> #A match! since it contains classes 1 2 3
<div id="test" class="2 1 3"></div> #A match! since it contains classes 1 2 3
<div id="test" class="1 2 3 4"></div> #not a match since it contains other classes than 1, 2, 3
<div id="test" class="1 2 3 5 6"></div> #not a match since it contains other classes than 1, 2, 3
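The set comparison itself doesn't depend on the DOM. As a rough sketch, assuming you already have the raw value of a class attribute as a string, the same check could look like this (hasExactClasses is a hypothetical helper name, not part of the snippet above):

```javascript
// Hypothetical helper (not from the answer above): compares the classes in a
// raw class attribute string against a list of wanted classes, ignoring order.
function hasExactClasses(classAttr, wanted) {
  // Split on any whitespace, drop empty tokens, and dedupe via Set.
  const actual = new Set(classAttr.split(/\s+/).filter(Boolean));
  const expected = new Set(wanted);
  // Two sets are equal when they have the same size and every member of one
  // is also a member of the other.
  return actual.size === expected.size &&
    [...actual].every(name => expected.has(name));
}

console.log(hasExactClasses('3 2 1', ['1', '2', '3']));   // true
console.log(hasExactClasses('1 2 3 4', ['1', '2', '3'])); // false
```

Comparing sizes first is what rules out divs that carry extra classes; the every() check alone would only tell you that the div's classes are a subset of the input.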

Upvotes: 0

CertainPerformance

Reputation: 370679

Note that duplicate IDs in a single document are invalid HTML. If you have control over the source, it's best to use classes instead.

Rather than regular expressions (which are often clumsy and inelegant when dealing with HTML parsing), consider using DOMParser and then check the classList of each element:

const input = `
<div id="test" class="1 2 3">m1</div> #A match! since it contains classes 1 2 3
<div id="test" class="3 2 1">m2</div> #A match! since it contains classes 1 2 3
<div id="test" class="2 1 3">m3</div> #A match! since it contains classes 1 2 3
<div id="test" class="1 2 3 4">m4</div> #not a match since it contains other classes than 1, 2, 3
<div id="test" class="1 2 3 5 6">m5</div> #not a match since it contains other classes than 1, 2, 3
`;
const doc = new DOMParser().parseFromString(input, 'text/html');
const matchingDivs = Array.prototype.filter.call(
  doc.querySelectorAll('div'),
  ({ classList }) => (
    classList.contains('1') &&
    classList.contains('2') &&
    classList.contains('3') &&
    classList.length === 3
  )
);
console.log(matchingDivs);
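
The snippet above hardcodes the three class names. If the class list is only known at runtime, one possible generalization is to build the filter callback from an array. This is only a sketch: exactClassFilter is a hypothetical name, and it assumes nothing beyond classList.contains() and classList.length, which the code above already uses.

```javascript
// Hypothetical helper: builds a filter predicate for any list of class names.
// It relies only on classList.contains() and classList.length.
const exactClassFilter = classNames => {
  const required = [...new Set(classNames)]; // ignore duplicate names in the input
  return ({ classList }) =>
    classList.length === required.length &&
    required.every(name => classList.contains(name));
};

// Browser-only usage; skipped where DOMParser is unavailable (e.g. Node.js).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString(
    '<div class="3 2 1">m2</div><div class="1 2 3 4">m4</div>',
    'text/html'
  );
  const matchingDivs = Array.prototype.filter.call(
    doc.querySelectorAll('div'),
    exactClassFilter(['1', '2', '3'])
  );
  console.log(matchingDivs.map(div => div.textContent));
}
```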

Upvotes: 1
