Sri

Reputation: 371

How to grep a string ignoring punctuation?

I have a text file (>100 MB) of company names and their codes, like this:

...  
...  
ABC Sys, INC.:0001111111:  
ABC Systems INC:0001111112:  
...  
...

My input is the company name abc sys inc (without the comma after Sys and the period after Inc).

I want to grep the line with the company name ABC Sys, INC. using the search string abc sys inc.

grep -i "abc sys inc" my_list_file.txt

returns no lines, whereas I want it to return the first line.

grep -i "abc sys" my_list_file.txt

returns both lines.

Upvotes: 1

Views: 847

Answers (1)

Wiktor Stribiżew

Reputation: 627199

You can replace each space in the search string with a pattern that matches one or more punctuation or whitespace characters: either [[:space:][:punct:]]\+ / [[:space:][:punct:]]\{1,\} or [^[:alnum:]]\+ / [^[:alnum:]]\{1,\}. In a POSIX BRE, \{1,\} is the standard form and \+ is a GNU extension. In a POSIX ERE (grep -E), both can be written as a plain +. In Bash, the ${search// /replacement} substitution builds the pattern:

search="abc sys inc"
# Bash-only: ${search// /X} replaces every space in $search with X
grep -E -i "${search// /[^[:alnum:]]+}" file > outfile
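
If the script has to run in a shell without Bash's ${var// /...} substitution, the same pattern can be built with sed. This is only a sketch: it uses the \{1,\} BRE form, so grep runs without -E, and it also collapses runs of several spaces in the search string. The sample lines are the ones from the question.

```shell
search="abc sys inc"

# Turn each run of spaces into a BRE that matches one or more
# non-alphanumeric characters. sed does the substitution, so no
# Bash-specific parameter expansion is needed.
pattern=$(printf '%s\n' "$search" | sed 's/ \{1,\}/[^[:alnum:]]\\{1,\\}/g')

# No -E here: \{1,\} is the BRE spelling of "one or more".
printf '%s\n' 'ABC Sys, INC.:0001111111:' 'ABC Systems INC:0001111112:' |
  grep -i -- "$pattern"
```

With the two sample lines, only the ABC Sys, INC. line is printed.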

A quick demo:

s='...  
ABC Sys, INC.:0001111111:  
ABC Systems INC:0001111112:  
...  '

search="abc sys inc";
grep -E -i "${search// /[^[:alnum:]]+}" <<< "$s"  

Output:

ABC Sys, INC.:0001111111:  
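
One caveat: the pattern is unanchored, so it would also match a longer name that merely contains the search words. Assuming every line has the name:code: layout shown in the question, a possible refinement is to anchor the pattern at the start of the line and require the : delimiter right after the name. The third sample line is hypothetical, added only to show the difference, and the pattern is built with sed here, though the Bash substitution would work just as well.

```shell
search="abc sys inc"

# Replace each run of spaces with "one or more non-alphanumerics" (ERE).
core=$(printf '%s\n' "$search" | sed 's/ \{1,\}/[^[:alnum:]]+/g')

# Anchor at line start; allow trailing punctuation (e.g. the "." in
# "INC.") and then require the ":" that separates name from code.
pattern="^${core}[^[:alnum:]]*:"

printf '%s\n' \
  'ABC Sys, INC.:0001111111:' \
  'ABC Systems INC:0001111112:' \
  'ABC Sys Inc Holdings:0001111113:' |
  grep -E -i -- "$pattern"
```

Only the ABC Sys, INC. line is printed. Without the anchors, the hypothetical "ABC Sys Inc Holdings" line would match as well.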

Upvotes: 1
