V Anon

Reputation: 543

regex - combining multiple conditions within grep

I am having a hard time combining two conditions within grep.

My first condition is that 'GT' is in the middle of the string.

The strings are 12 characters long, so GT spans positions 6 to 7 (counting from 1), with 5 characters on each side.

My second condition is that no 'C' occurs before the middle 'GT'.

So far, I have

grep -E '^.{5}GT' *.txt | grep -E '^[^C]*GT'

but this would output invalid strings such as

GTCTGGTGAGTT

I believe the second grep is matching the first occurrence of GT, at the start of the string, rather than the one in the middle, so the string is still output.

How can I make improvements?

Upvotes: 1

Views: 489

Answers (2)

The fourth bird

Reputation: 163207

The negated character class [^C]* matches any character except C, so it would also match, for example, 5 whitespace characters. Because * allows zero repetitions, ^[^C]*GT also matches a GT at the very start of the string.
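To see why the pipeline from the question lets the invalid string through, here is a minimal sketch that runs both filters on GTCTGGTGAGTT. The variable names are only for illustration.

```shell
# The invalid string quoted in the question.
s='GTCTGGTGAGTT'

# Filter 1: GT after exactly 5 characters (GTCTG + GT) -> passes.
# Filter 2: ^[^C]*GT is satisfied by the leading GT, because [^C]*
#           may match zero characters, so the C at position 3 is never checked.
passed=$(printf '%s\n' "$s" | grep -E '^.{5}GT' | grep -E '^[^C]*GT')
printf 'passed both filters: %s\n' "$passed"
```

The two conditions are checked independently, so nothing ties the "no C" check to the GT in the middle; a single pattern avoids that.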

If the only possible characters are G, T, A and C, you could match G, T or A 5 times, then GT, then any of G, T, C or A 5 times up to the end of the string:

^[GTA]{5}GT[GTCA]{5}$

for example:

grep -E '^[GTA]{5}GT[GTCA]{5}$' *.txt
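As a quick check, the pattern can be run against a few sample strings. This is a sketch on made-up data: only the second line comes from the question, the others are hypothetical.

```shell
# Sample strings (hypothetical, except the second one from the question):
#   AGTTGGTCACGT  no C before the middle GT          -> expected to match
#   GTCTGGTGAGTT  C at position 3                    -> rejected
#   AGTTGACCACGT  positions 6-7 are AC, not GT       -> rejected
#   AGTTGGTCANGT  contains N, outside the G/T/C/A set -> rejected
samples='AGTTGGTCACGT
GTCTGGTGAGTT
AGTTGACCACGT
AGTTGGTCANGT'

strict=$(printf '%s\n' "$samples" | grep -E '^[GTA]{5}GT[GTCA]{5}$')
printf '%s\n' "$strict"
```

Only AGTTGGTCACGT is printed. Note that restricting the alphabet also rejects strings containing anything other than G, T, C and A, which may or may not be what you want.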

Upvotes: 2

Thomas

Reputation: 181715

So you want:

  • exactly 5 characters none of which are C: [^C]{5}
  • GT
  • any 5 characters: .{5}

Putting it together (anchored between ^...$):

grep -E '^[^C]{5}GT.{5}$' *.txt
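A small sketch of this pattern on a throwaway file; the file name and its contents are assumptions for illustration, apart from the second line, which is the string from the question.

```shell
# Hypothetical sample file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/reads.txt" <<'EOF'
AGTTGGTCACGT
GTCTGGTGAGTT
AGTTGGTCANGT
AGTTGGTCACGTA
EOF

# Line 2 has a C among the first 5 characters; line 4 is 13 characters
# long, so the trailing .{5}$ cannot be satisfied. Lines 1 and 3 match.
loose=$(grep -E '^[^C]{5}GT.{5}$' "$tmpdir/reads.txt")
printf '%s\n' "$loose"

rm -r "$tmpdir"
```

Because .{5} accepts any character, the line containing N is kept here. With a single file argument grep prints no file name prefix; with several files (as with *.txt) it prefixes each match with the file name unless -h is given.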

Upvotes: 2
