Linux Ubuntu AWK: find the sentences containing two identical adjacent words

Question

I want to print all the sentences that contain the same word twice in a row. A sentence ends with ., ? or !.

For the input:

word ja ba word. Na Na word wdd? Nothing kkk
ok ok! word no this no word. ok ok. notok!

the output should be:

Na Na word wdd?

Nothing kkk
ok ok!

ok ok.

This is my code so far:

#!/bin/bash
if [ $# -eq 0 ]
then
    echo "No arguments"
fi

if [[ -f $1 ]]  # if it is a file
then
    cat "$1" | awk '{
        for (i = 1; i <= NF; i++)
        {

        }
    }'
fi

I don't know how to separate full sentences with AWK. I can't use multiple file separators (this is important!). Once the sentences are separated, how do I check every word inside each one? I need to use AWK.

This is my newest idea:

cat $1 | awk  '{
 for (i=1;i<=NF;i++)         
  {
   a=0;
    if ($i ~ "\?$" || $i ~ "\!$" || $i ~ "\.$")          
    {

  #print $i;
      k='';

    for(j=$i; j!=$a; j--);
    {
      if( $j == $k)
        #print whole sentence

       $k=$j;

    }

    }
}}'

I find the words ending with ?, . or !, then check all the words before them, back to the end of the previous sentence.

Aaron · Accepted Answer

grep is enough for this:

grep -Pzo "[^.?!]*\b(\w+) \1[^.?!]*"

Test:

$ echo '''word ja ba word. Na Na word wdd? Nothing kkk
ok ok! word no this no word. ok ok. notok!''' | grep -Pzo "[^.?!]*\b(\w+) \1[^.?!]*"
Na Na word wdd
Nothing kkk
ok ok
ok ok

Explanation:

the -o flag makes grep print only the matched text, rather than the whole line it appears in
the -P flag makes grep use PCRE (Perl-compatible) regular expressions
the -z flag makes grep treat the NUL character, rather than newline, as the line terminator. Since the input contains no NUL, grep sees it as one big line, so a match can span several lines.
[^.?!]* matches the start of the sentence: it matches as many characters as it can, but no sentence terminators (.?!)
\b(\w+) matches a run of word characters and captures it as the first group of the regular expression. The word boundary makes sure we do not match only the end of a word (thanks 123!).
\1 references this first group, so we must have two identical words separated by a space. Nothing anchors the end of \1, so a word that merely starts with the previous one (ok okay) also matches; adding \b after \1 prevents that.
[^.?!]* matches the rest of the sentence, up to but not including its terminator, which is why the terminators are missing from the output above
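
Since the question asks for AWK, here is a minimal sketch of the same idea without grep. It is not part of the grep solution above, and the function name repeated_word_sentences is made up for illustration. It assumes the whole input fits in memory, that words are separated by whitespace, and that folding each newline into a space is acceptable. It leaves FS and RS untouched: the text is cut at each ., ? or ! with match(), and each sentence is then split into words and scanned for two equal neighbours.

```shell
# Hypothetical helper: print every sentence that contains the same word
# twice in a row. Reads the files given as arguments, or stdin if none.
repeated_word_sentences() {
  awk '
    # Join all input lines into one string; each newline becomes a space.
    { text = text $0 " " }

    END {
      # Cut off one sentence at a time, terminator included.
      while (match(text, /[.?!]/)) {
        sentence = substr(text, 1, RSTART)
        text = substr(text, RSTART + 1)
        check(sentence)
      }
      # Anything left over has no terminator and is ignored.
    }

    # Print the sentence if two neighbouring words are identical.
    function check(sentence,    body, n, words, i) {
      body = substr(sentence, 1, length(sentence) - 1)  # drop the terminator
      n = split(body, words, " ")                       # split on whitespace
      for (i = 2; i <= n; i++)
        if (words[i] == words[i - 1]) {
          sub(/^ +/, "", sentence)                      # trim leading spaces
          print sentence
          return
        }
    }
  ' "$@"
}
```

With the input from the question saved in a file, repeated_word_sentences file prints:

Na Na word wdd?
Nothing kkk ok ok!
ok ok.

Unlike the grep version, the terminators are kept and the sentence that spans two lines is printed on one line. Words are compared exactly, so punctuation attached to a word (a comma, for instance) counts as part of it.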
