Reputation: 11
I know how to count a series of keywords in Excel. I use this formula:
=SUMPRODUCT(--ISNUMBER(SEARCH($CE$2:$CE$43,(G2:AP2))))
However, what would be the Excel formula if I want to count the number of keywords that exist only within +/-3 words around "risk" in the selected rows?
Consider this sentence: "Political uncertainty generates economic risk which stagnates economic activities." If my keywords are political
, uncertainty
, stagnates
, and economic
, the total count of keywords within +/- 3 words around "risk" will be 4, i.e., uncertainty
, stagnates
, and economic
. "economic" appears twice in the sentence. political
will be excluded since it is out of range.
Upvotes: 1
Views: 107
Reputation: 75840
You could try:
Formula in E1
:
=SUM(--ISNUMBER(MATCH(FILTERXML("<t><s>"&SUBSTITUTE("a a a "&A1," ","</s><s>")&"</s></t>","//s[following::s[4]='risk']/following::*[position()<8]")&"*",C1:C4,0)))
This would in order:
"<t><s>"&SUBSTITUTE("a a a "&A1," ","</s><s>")&"</s></t>"
- Create a valid xml-string to process with xpath;"//s[following::s[4]='risk']/following::*[position()<8]"
- A valid xpath 1.0 expression to retrieve only upto seven nodes with risk
in the middle. In more detail: //s[following::s[4]='risk']
will get the node where the following sibling on 4 indices to it's right will equal risk
(case-sensitive), we then take all following siblings from that position with /following::*
where we limit the returned nodes to a max of seven with [position()<8]
. Now it would also make sense why we concatenate a a a
with the string from A1
since risk
could appear at the start of the string or less then three words in;MATCH()
- Will then check if any of the returned nodes start with any of the words in C1:C4
using a wildcard. This is to prevent possible punctuation to avoid matching;SUM()
and ISNUMBER()
prepended with double unary will lastly make a summation of hits.Note: The answer is not 3 but 4! Since 'economic' is to be counted twice.
Upvotes: 1
Reputation: 36840
You may try the following formula if you have Microsoft-365 with most recent release.
=LET(x,TOCOL(TEXTSPLIT(A1," ")),y,MATCH("risk",x,0),z,INDEX(x,SEQUENCE(y+2,1,y-3)),COUNT(XMATCH(D1:D4,z,0)))
Upvotes: 0