Reputation: 616
I want to print all the sentences that contain the same word twice in a row. A sentence ends with ., ? or !.
For the input:
word ja ba word. Na Na word wdd? Nothing kkk
ok ok! word no this no word. ok ok. notok!
the output should be:
Na Na word wdd?
Nothing kkk
ok ok!
ok ok.
This is my code so far:
#!/bin/bash
if [ $# -eq 0 ]
then
    echo "No arguments"
    exit 1
fi
if [[ -f $1 ]] # if it is a file
then
    cat "$1" | awk '{
        for (i=1;i<=NF;i++)
        {
        }}'
fi
I don't know how to separate full sentences with AWK. I can't use multiple field separators (this is important!). If I separate them, how do I check every word inside? I need to use AWK.
This is my newest idea:
cat $1 | awk '{
    for (i=1;i<=NF;i++)
    {
        a=0;
        if ($i ~ "\?$" || $i ~ "\!$" || $i ~ "\.$")
        {
            #print $i;
            k='';
            for(j=$i; j!=$a; j--);
            {
                if( $j == $k)
                    #print whole sentence
                    $k=$j;
            }
        }
    }}'
I find the words ending with ?, . or !, and then want to check all the words before them, back to the end of the previous sentence.
Upvotes: 0
Views: 391
Reputation: 24812
grep is enough to do so:
grep -Pzo "[^.?!]*\b(\w+) \1[^.?!]*"
Test:
$ echo '''word ja ba word. Na Na word wdd? Nothing kkk
ok ok! word no this no word. ok ok. notok!''' | grep -Pzo "[^.?!]*\b(\w+) \1[^.?!]*"
Na Na word wdd
Nothing kkk
ok ok
ok ok
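The matches printed above drop the sentence terminators, which the question wanted to keep. As a rough sketch only, and assuming GNU grep built with PCRE support, the pattern could be extended with a trailing [.?!] to capture the terminator, plus a second \b so that something like "no note" is not treated as a repeat. The function name keep_terminators is a hypothetical helper, not part of the original answer.

```shell
# keep_terminators: hypothetical helper name, not from the original answer.
# Assumption: GNU grep with PCRE support (-P) is available.
keep_terminators() {
    # Same pattern as above, plus \b after \1 and a final [.?!]
    # so each match ends with its sentence terminator.
    grep -Pzo '[^.?!]*\b(\w+) \1\b[^.?!]*[.?!]' "$@" |
        tr '\0' '\n' |   # with -z, grep ends each match with NUL, not newline
        sed 's/^ *//'    # drop the space left over after the previous sentence
}
```

It reads standard input or the files given as arguments, for example keep_terminators file.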
Explanation:
-o makes grep return only the matched text, rather than the whole line it appears in.
-P makes grep use PCRE regular expressions.
-z makes grep treat the NUL character, rather than newline, as the line terminator. Newlines become ordinary characters, so grep sees the input as one big line and a match can span several lines.
[^.?!]* matches the start of the sentence: it matches as many characters as it can, but no sentence terminators (.?!).
\b(\w+) matches word characters and captures them in the first group of the regular expression. The word boundary makes sure we do not match only the end of a word (thanks 123!).
\1 references this first group, so we must have two identical words separated by a space.
[^.?!]* matches the rest of the sentence.
Upvotes: 4
Reputation: 67507
With gawk:
$ awk -v RS='[!?.] +' '{for(i=1;i<NF;i++) if($i==$(i+1)) print $0 RT "\n"}' file
Na Na word wdd?
Nothing kkk
ok ok!
ok ok.
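This sets the record separator to a sentence terminator ([!?.]) followed by one or more spaces, so each record is one sentence. The loop iterates over the words in the sentence looking for adjacent repeats, and prints the sentence with its matched record terminator (RT, a gawk extension) and a newline for spacing between sentences.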
Here is the same script reading its input from a here-document:
$ awk -v RS='[!?.] +' '{for(i=1;i<NF;i++) if($i==$(i+1)) print $0 RT "\n"}' << EOF
> word ja ba word. Na Na word wdd? Nothing kkk
> ok ok! word no this no word. ok ok. notok!
> EOF
This should give you:
Na Na word wdd?
Nothing kkk
ok ok!
ok ok.
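The script above relies on RT, which is specific to gawk. For other awk implementations, one possible sketch is to read the whole input into a string and peel off one sentence at a time with match(). The function name print_repeats is a hypothetical helper introduced here for illustration, and the sketch assumes that any text after the last terminator is not a sentence and can be ignored.

```shell
# print_repeats: hypothetical helper name, not from the original answer.
# Uses only POSIX awk features (no RT, no regex RS).
print_repeats() {
    awk '
    { text = text $0 "\n" }    # slurp the input, keeping the newlines
    END {
        # Each pass takes everything up to the next terminator as one sentence.
        # Assumption: text after the last terminator is not a sentence.
        while (match(text, /[.?!]/)) {
            body = substr(text, 1, RSTART - 1)   # the words of the sentence
            term = substr(text, RSTART, 1)       # its . ? or !
            text = substr(text, RSTART + 1)      # the rest of the input
            sub(/^[ \t\n]+/, "", body)           # trim leading whitespace
            n = split(body, w, /[ \t\n]+/)
            for (i = 1; i < n; i++) {
                if (w[i] == w[i + 1]) {
                    print body term
                    break                        # print each sentence once
                }
            }
        }
    }' "$@"
}
```

It can be called as print_repeats file, or with the input piped in. Unlike the gawk version, it prints a sentence only once even if it contains several repeated pairs.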
Upvotes: 2