AKG

Reputation: 3286

Regular expression matching (JavaScript)

I've got the following string: |Africa||Africans||African Society||Go Africa Go||Mafricano||Go Mafricano Go||West Africa|.

I am trying to write a regular expression that matches only the terms that include the word Africa or any derivative of it (that is, every term above except |Mafricano| and |Go Mafricano Go|). Each term is enclosed between two | characters.

Right now I've come up with: /\|[^\|]*africa[^\|]*\|/gi, which says:


  1. \| Match |

  2. [^\|]* Match zero to unlimited instances of any character except |

  3. africa Match africa literally

  4. [^\|]* Match zero to unlimited instances of any character except |

  5. \| Match |
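As a quick check of the breakdown above, the following sketch runs the pattern against the sample string. The variable names are illustrative only. Because [^\|]* accepts any prefix, the "M" in Mafricano is consumed and all seven terms match.

```javascript
const str = "|Africa||Africans||African Society||Go Africa Go||Mafricano||Go Mafricano Go||West Africa|";

// The pattern from the question, unchanged.
const original = /\|[^\|]*africa[^\|]*\|/gi;

// With the g flag, match() returns every matched substring.
const matched = str.match(original);

console.log(matched.length);                   // 7
console.log(matched.includes("|Mafricano|"));  // true, which is the unwanted part
```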

I tried inserting ((?:\s)|(?!\w)), which gives /\|[^\|]*((?:\s)|(?!\w))africa[^\|]*\|/gi. It does exclude |Mafricano| and |Go Mafricano Go|, but it also excludes every other entry except |West Africa| and |Go Africa Go|. That is a step in the right direction, but the pattern also needs to match Africa on its own and its derived forms.
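One way to read this result: the (?!\w) branch sits directly before africa, so the character that follows is always a word character and that branch can never lead to a match. Only the \s branch is effective, which requires whitespace before africa. The sketch below compares the attempt with a whitespace-only variant; whitespaceOnly is a name introduced here for illustration.

```javascript
const str = "|Africa||Africans||African Society||Go Africa Go||Mafricano||Go Mafricano Go||West Africa|";

// The attempted pattern from the question.
const attempt = /\|[^\|]*((?:\s)|(?!\w))africa[^\|]*\|/gi;

// Hypothetical variant that keeps only the \s branch.
const whitespaceOnly = /\|[^\|]*\safrica[^\|]*\|/gi;

const attemptMatches = str.match(attempt);
const whitespaceMatches = str.match(whitespaceOnly);

console.log(attemptMatches);    // [ '|Go Africa Go|', '|West Africa|' ]
console.log(whitespaceMatches); // same two terms
```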

Can anybody help me?

Upvotes: 2

Views: 72

Answers (2)

Amit Joki

Reputation: 59232

You can use this regex:

[^|]*\bAfrica[a-z]*\b[^|]*


var str = "|Africa||Africans||African Society||Go Africa Go||Mafricano||Go Mafricano Go||West Africa|";
var arr = str.match(/[^|]*\bAfrica[a-z]*\b[^|]*/g);
console.log(arr); // ["Africa", "Africans", "African Society", "Go Africa Go", "West Africa"] 
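The pattern above is case-sensitive, while the question used the i flag. A minimal sketch, assuming case-insensitive matching is wanted: adding i does not bring Mafricano back, because \b finds no word boundary between "M" and "a". The name matches is illustrative.

```javascript
const str = "|Africa||Africans||African Society||Go Africa Go||Mafricano||Go Mafricano Go||West Africa|";

// Same pattern with the i flag added; \b still rejects "Mafricano".
const matches = str.match(/[^|]*\bafrica[a-z]*\b[^|]*/gi);

console.log(matches);
// ["Africa", "Africans", "African Society", "Go Africa Go", "West Africa"]
```

Note that the matched strings do not include the surrounding | characters, since the pattern never consumes them.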

Upvotes: 4

Avinash Raj

Reputation: 174696

I think you want something like this:

\|(?:(?!Mafrica|\|).)*?africa(?:(?!Mafrica|\|).)*?\|


> "|Africa||Africans||African Society||Go Africa Go||Mafricano||Go Mafricano Go||West Africa|".match(/\|(?:(?!Mafrica|\|).)*?africa(?:(?!Mafrica|\|).)*?\|/gi);
[ '|Africa|',
  '|Africans|',
  '|African Society|',
  '|Go Africa Go|',
  '|West Africa|' ]

Don't forget to turn on the i modifier to get a case-insensitive match.

Explanation:

\|                       '|'
(?:                      group, but do not capture (0 or more
                         times, as few as possible):
  (?!                      look ahead to see if there is not:
    Mafrica                  'Mafrica'
   |                        OR
    \|                       '|'
  )                        end of look-ahead
  .                        any character except \n
)*?                      end of grouping
africa                   'africa'
(?:                      group, but do not capture (0 or more
                         times, as few as possible):
  (?!                      look ahead to see if there is not:
    Mafrica                  'Mafrica'
   |                        OR
    \|                       '|'
  )                        end of look-ahead
  .                        any character except \n
)*?                      end of grouping
\|                       '|'
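One design point worth checking: the lookahead names the literal prefix Mafrica, so it rejects only that prefix. The sketch below probes this with "Xafrica", a made-up term that is not in the question, and compares the result with a word-boundary variant of the asker's pattern. That variant is an assumption added here for comparison and is not part of this answer.

```javascript
// Pattern from the answer above, with the g and i flags.
const tempered = /\|(?:(?!Mafrica|\|).)*?africa(?:(?!Mafrica|\|).)*?\|/gi;

// Assumed alternative: the asker's pattern with \b placed before "africa".
const boundary = /\|[^|]*\bafrica[^|]*\|/gi;

// "Xafrica" is a hypothetical term used only to probe both patterns.
const sample = "|Africa||Mafricano||Xafrica|";

const temperedMatches = sample.match(tempered);
const boundaryMatches = sample.match(boundary);

console.log(temperedMatches); // [ '|Africa|', '|Xafrica|' ]
console.log(boundaryMatches); // [ '|Africa|' ]
```

If the only unwanted terms are the Mafrica ones, the hard-coded lookahead is enough. If any letter directly before africa should disqualify a term, the word-boundary form may be the safer choice.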

Upvotes: 1
