regexnewbie

Reputation: 11

How to use regex to get only a single tab, instead of 2 and more? (grep repetition)

I want to write a bash script on Linux, using terminal commands.

I have the following text file:

[tabkey]text1
text2
[tabkey][tabkey]text3
[end of file]

Each of the above is on its own line, so there are 3 lines in total. The first line starts with 1 tab, the third starts with 2 tabs.

If I use

grep $'\t'

I get all lines with tabs, but not highlighted ofc. So I ended up using

grep $'\t'".*"

to get text1 and text3. However, how can I match lines with only 1 \t?

I want to get exclusively text1, or exclusively text3. Not both. I ask this because I can't wrap my head around repetition: {N} to repeat the previous item doesn't seem to work even for letters, yet I need it for the tab character.

Upvotes: 1

Views: 355

Answers (2)

regexnewbie

Reputation: 11

The other answer should work, but it seems kind of bloated.

Except the perl engine part, that is good. I ended up using extended regex with some help, because it seems basic/default regex isn't meant for this (hence it looks bloated).

grep -E $'^\t{N}[^\t].*'

N=1 for text1, N=2 for text3. ^\t means the line starts with a tab, and {N} repeats the preceding item N times. This works by itself, but {N} alone does not stop more tabs from following (e.g. N=2 would also match a line with 3 tabs), so [^\t] requires the next character to be something other than a tab.
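
For anyone who wants to try this, here is a minimal sketch of the pattern above, assuming a grep that supports -E and the sample file from the question. The helper name lines_with_n_tabs and the temp file are made up for illustration, and the tab is stored in a variable instead of repeating $'\t', which is only a readability choice.

```shell
# Literal tab character; equivalent to $'\t' in bash.
tab=$(printf '\t')

# Recreate the sample file from the question (temp path is illustrative).
sample=$(mktemp)
printf '\ttext1\ntext2\n\t\ttext3\n' > "$sample"

# Hypothetical helper: print lines that start with exactly N tabs.
lines_with_n_tabs() {
    # $1 = N, $2 = file. The trailing [^tab] rejects an (N+1)-th tab.
    grep -E "^${tab}{$1}[^${tab}]" "$2"
}

one_tab=$(lines_with_n_tabs 1 "$sample")
two_tabs=$(lines_with_n_tabs 2 "$sample")
printf '%s\n' "$one_tab" "$two_tabs"
rm -f "$sample"
```

Note that because of the trailing [^\t], a line consisting of tabs and nothing else is not matched.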

Edit: Would flag this as an answer, but this site says I have to wait 23 hours lol

Upvotes: 0

Kind Stranger

Reputation: 1761

. matches any character except newline \n, so it also matches tab \t. Assuming your text1, text2, text3, etc. consist only of characters in the ranges a-z, A-Z, 0-9 and _ (underscore), you can use \w to match only these.

grep $'\t''\w\+'

+ matches one or more of the preceding item, which you may find preferable to * in your pattern, as it won't match blank lines that start with a tab.

If you want to match more than just this, look at using something like this pattern, which will match a-z, 0-9 and - (minus sign):

grep $'\t''[a-z0-9-]\+' test.txt
        -text1-
                text3

You will also need to anchor your pattern to the start of the line using ^; otherwise, your grep match can start anywhere (for example at the second tab):

grep '^'$'\t''\w\+'
        -text1-
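
To see the effect of the anchor, here is a small sketch that assumes GNU grep (for \w and \+ in basic regex) and uses a throwaway copy of the question's file. The variable names are only illustrative.

```shell
tab=$(printf '\t')   # literal tab, same as $'\t' in bash
sample=$(mktemp)
printf '\ttext1\ntext2\n\t\ttext3\n' > "$sample"

# Unanchored: on the third line the match can begin at the second tab,
# so both tabbed lines are counted.
unanchored=$(grep -c "${tab}\w\+" "$sample")

# Anchored: the first character must be a tab, directly followed by a word character.
anchored=$(grep "^${tab}\w\+" "$sample")

printf 'unanchored matching lines: %s\n' "$unanchored"
printf 'anchored match: %s\n' "$anchored"
rm -f "$sample"
```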

Then, matching exactly 2 tab characters can be done like this:

grep '^'$'\t\{2\}''[a-z0-9-]\+'
                text3
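
The escaped braces are likely why {N} seemed not to work in the question, even for letters: in basic regex (plain grep) an unescaped { is a literal character, while grep -E treats it as a repetition operator. A quick sketch with letters, using made-up input:

```shell
# Two input lines: "aa" and the literal text "a{2}".
# Basic regex, unescaped braces: matches the literal text a{2}.
bre_literal=$(printf 'aa\na{2}\n' | grep 'a{2}')

# Basic regex, escaped braces: "a" repeated exactly twice.
bre_interval=$(printf 'aa\na{2}\n' | grep '^a\{2\}$')

# Extended regex: unescaped braces already mean repetition.
ere_interval=$(printf 'aa\na{2}\n' | grep -E '^a{2}$')

printf '%s\n' "$bre_literal" "$bre_interval" "$ere_interval"
```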

With the Perl-compatible engine (-P with GNU grep), the escaping is a little clearer:

grep -P '^'$'\t{2}''[a-z0-9-]+'

Upvotes: 2
