Reputation: 2182
For:
echo "the quick brown fox" | grep -Po '[a-z]+ [a-z]+'
I get:
the quick
brown fox
but I wanted:
the quick
quick brown
brown fox
How can I get the overlapping matches as well?
Upvotes: 4
Views: 853
Reputation: 11
Simply reuse the original command, once as-is and once on the input with its first word removed, to get the Markov-chain pairs:
echo "the quick brown fox" | grep -Po '[a-z]+ [a-z]+'
echo "the quick brown fox" | sed 's/^[a-z]* //' | grep -Po '[a-z]+ [a-z]+'
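The sed in the second line removes the first word of the input, so the rest of that command generates the missing pairs. Note that the output is grouped by pass (the quick, brown fox, then quick brown) rather than in input order.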
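As a rough sketch, the two passes can be wrapped in one shell function so that a single call yields all pairs. The name two_pass_pairs is made up for illustration, and grep -Eo is used here on the assumption that this pattern needs no Perl-specific features.

```shell
# Hypothetical helper: print word pairs from one input line using two grep passes.
two_pass_pairs() {
  line=$1
  # Pass 1: non-overlapping pairs starting at the first word.
  printf '%s\n' "$line" | grep -Eo '[a-z]+ [a-z]+'
  # Pass 2: drop the first word, then collect the pairs that pass 1 skipped.
  printf '%s\n' "$line" | sed 's/^[a-z]* //' | grep -Eo '[a-z]+ [a-z]+'
}

two_pass_pairs "the quick brown fox"
```

Two passes are enough for pairs, because every pair starts at either an odd or an even word position. Pipe the result through sort -u if order and duplicates matter.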
The same approach could also be generalized using sed's ability to run loops:
echo pattern1pattern2 | sed ':start;s/\(pattern1\)\(pattern2\)/<\1|\2>\2/;t start' | grep -o '<[^>]*>' | tr -d '<>|'
This solution works with partially overlapping patterns, where pattern2 can be overlapped by the next match. It assumes that <, > and | are reserved auxiliary characters that do not occur in the input. Furthermore, it assumes that the pattern1pattern2 regex cannot match any string that is matched by pattern2 alone.
The sed substitutes pattern1pattern2 with <pattern1|pattern2>pattern2 and repeats this substitution as long as any match is found (the branching t command allows matching previously substituted strings, unlike the g option). That is, every iteration leaves behind one <pattern1|pattern2> group marking a match, while the trailing instance of pattern2 can still be matched as part of the next match. Finally, we pick out the groups with grep -o, as in the original approach, and strip the auxiliary marks with tr.
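For the word-pair question, one possible instantiation is pattern1 = a word plus a space and pattern2 = a word. This is only a sketch: the function name overlapping_pairs is hypothetical, the patterns are written in basic regex syntax, and the input is assumed to contain no <, > or | characters.

```shell
# Hypothetical helper: print every overlapping pair of lowercase words read from stdin.
# Assumes the characters <, > and | never occur in the input.
# The sed loop marks each pair as <word |word> and keeps the second word for the next match;
# grep -o then extracts the marked groups and tr strips the markers.
overlapping_pairs() {
  sed -e ':start' \
      -e 's/\([a-z][a-z]* \)\([a-z][a-z]*\)/<\1|\2>\2/' \
      -e 't start' |
    grep -o '<[^>]*>' |
    tr -d '<>|'
}

printf '%s\n' "the quick brown fox" | overlapping_pairs
```

The | inserted between the two halves is what keeps an already marked group from matching again, since the space is then followed by | rather than a letter. Passing the label and the branch as separate -e expressions is a portability choice. The one-line form with semicolons relies on GNU sed.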
Upvotes: 1
Reputation: 4924
With awk:
awk '{for(i=1;i<NF;i++) print $i,$(i+1)}' <<<"the quick brown fox"
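Update: with Python:
#!/usr/bin/python3.5
import re
s = "the quick brown fox"
# The capturing group sits inside a lookahead, so each match consumes no text and matches may overlap.
matches = re.finditer(r'(?=(\b[a-z]+\b \b[a-z]+\b))', s)
ans = [m.group(1) for m in matches]
print(ans)  # optional: show all pairs as a list
for pair in ans:
    print(pair)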
output:
['the quick', 'quick brown', 'brown fox']
the quick
quick brown
brown fox
Upvotes: 2
Reputation: 1517
Another awk, which is hard-coded for exactly four words:
awk '{print $1,$2 RS $2,$3 RS $3,$4}' <<<"the quick brown fox"
the quick
quick brown
brown fox
Upvotes: 0