Reputation: 543
I am having a hard time combining two conditions within grep.
My first condition is that 'GT' is in the middle of the string.
The strings are 12 characters long, so GT follows the first 5 characters (zero-based positions 5 to 6).
My second condition is that no 'C' occurs before the middle-positioned 'GT'.
So far, I have
grep -E '^.{5}GT' *.txt | grep -E '^[^C]*GT'
but this would output invalid strings such as
GTCTGGTGAGTT
I believe the second grep is being satisfied by the first occurrence of GT (at the start of the string) rather than by the middle one, so the string is allowed through.
How can I make improvements?
Upvotes: 1
Views: 489
Reputation: 163207
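A negated character class such as [^C]* matches any character except C, so it would also accept, for example, 5 whitespace characters.
Because * allows zero repetitions, ^[^C]*GT is also satisfied by a GT at the very start of the string.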
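As a quick check of that behaviour, the sketch below runs the second pattern from the question against the invalid example string and prints only the matched part. It assumes a grep that supports -o (GNU and BSD grep do; it is not part of POSIX), and the variable names are just for illustration.

```shell
# The invalid example string from the question.
s='GTCTGGTGAGTT'

# -o prints only the part of the line that matched.
# [^C]* is allowed to match zero characters, so the leading GT
# is enough to satisfy the pattern; the middle GT is never needed.
matched=$(printf '%s\n' "$s" | grep -oE '^[^C]*GT')
printf '%s\n' "$matched"
```

The match is just the first two characters, which is why the string gets past the second grep even though a C appears before the middle GT.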
If the only possible characters are G, T, A and C, you could match any of G, T or A 5 times, then match GT, followed by any of G, T, C or A 5 times until the end of the string:
^[GTA]{5}GT[GTCA]{5}$
for example:
grep -E '^[GTA]{5}GT[GTCA]{5}$' *.txt
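To try this without any files, the pattern can be fed a few lines on standard input. The sample strings below are made up for illustration (only GTCTGGTGAGTT comes from the question), so treat this as a sketch rather than a test against real data.

```shell
# Hypothetical sample input, one string per line.
sample='AAAAAGTAAAAA
GTCTGGTGAGTT
AAAAAGTCCCCC
CAAAAGTAAAAA
AAAAAGTAAAA'

# Keep only 12-character lines with no C in the first 5 positions
# and GT directly after them.
result=$(printf '%s\n' "$sample" | grep -E '^[GTA]{5}GT[GTCA]{5}$')
printf '%s\n' "$result"
```

Only AAAAAGTAAAAA and AAAAAGTCCCCC are printed: the second and fourth lines have a C within the first 5 characters, and the last line is only 11 characters long, so the $ anchor rejects it.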
Upvotes: 2
Reputation: 181715
So you want:
5 characters that are not C: [^C]{5}
then the literal GT
then any 5 characters: .{5}
Putting it together (anchored between ^ and $):
grep -E '^[^C]{5}GT.{5}$' *.txt
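For comparison, here is a small sketch that runs both the original two-step pipeline and the single anchored pattern over the same input. Apart from GTCTGGTGAGTT, the sample strings are invented for illustration.

```shell
# Hypothetical sample input; the first line is the invalid string from the question.
sample='GTCTGGTGAGTT
AAGTAGTACGTA
TTTTTGTCCCCC
NNNNNGTNNNNN'

# Original pipeline: the second grep is satisfied by any GT that has
# no C before it, including one at the very start of the line.
piped=$(printf '%s\n' "$sample" | grep -E '^.{5}GT' | grep -E '^[^C]*GT')

# Single pattern: exactly 5 non-C characters, then GT, then 5 more characters.
single=$(printf '%s\n' "$sample" | grep -E '^[^C]{5}GT.{5}$')

printf 'pipeline:\n%s\n' "$piped"
printf 'single pattern:\n%s\n' "$single"
```

The pipeline lets all four lines through, while the single pattern drops GTCTGGTGAGTT. Note that [^C] and . accept any character, so a line like NNNNNGTNNNNN also matches; if the input should contain only G, T, A and C, a stricter character class may be preferable.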
Upvotes: 2