Sami Dalouche

Reputation: 644

Using \t in a regex does not seem to work with all tabs

Some lines of a file do not seem to match \t in a regex. Does anyone know why?

Let's take the example file that you can download from http://download.geonames.org/export/dump/countryInfo.txt.

$ wget http://download.geonames.org/export/dump/countryInfo.txt
--2011-02-03 16:24:08--  http://download.geonames.org/export/dump/countryInfo.txt
Resolving download.geonames.org... 178.63.52.141
Connecting to download.geonames.org|178.63.52.141|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31204 (30K) [text/plain]
Saving to: `countryInfo.txt'

100%[===================================================================================================================================================================================================>] 31,204      75.0K/s   in 0.4s    

2011-02-03 16:24:10 (75.0 KB/s) - `countryInfo.txt' saved [31204/31204]

$ cat countryInfo.txt | grep -E 'AD.AND'
AD  AND 200 AN  Andorra Andorra la Vella    468 84000   EU  .ad EUR Euro    376 AD###   ^(?:AD)*(\d{3})$    ca  3041565 ES,FR   
$ cat countryInfo.txt | grep -E 'AD\tAND'
(no result)

Output of vi with :set list (each tab is displayed as ^I):
AD^IAND^I200^IAN^IAndorra^IAndorra la Vella^I468^I84000^IEU^I.ad^IEUR^IEuro^I376^IAD###^I^(?:AD)*(\d{3})$^Ica^I3041565^IES,FR^I$

Upvotes: 7

Views: 5710

Answers (4)

Pithikos

Reputation: 20310

You could just use a literal tab. In the terminal, press CTRL+V and then the TAB key. This inserts a real tab character at the cursor, which you can use directly in your regular expression.

ls | grep -E "[0-9]<CTRL+V><TAB>"

This matches any digit from 0 to 9 that is immediately followed by a tab character.
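If typing CTRL+V is awkward (in a script, for example), bash's ANSI-C quoting can produce the same literal tab. This is only a minimal sketch that assumes bash; the sample input and the variable name digit_tab are made up for illustration, and $'...' is not guaranteed to work in every plain sh.

```shell
#!/usr/bin/env bash
# Assumes bash: inside $'...' the sequence \t becomes a real tab character
# before grep ever sees the pattern.
# Illustrative input: the first line has a tab after the digit, the second a space.
digit_tab=$(printf '7\tx\n7 x\n' | grep -cE $'[0-9]\t')

# Print how many lines contained a digit followed by a tab.
echo "$digit_tab"
```

Only the first sample line should be counted, since the second one separates the fields with a space.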

Upvotes: 0

user123444555621

Reputation: 153184

The \t escape is not part of POSIX regular expressions (the standard that grep follows), so grep -E does not treat it as a tab. But you can produce a literal tab character like this:

echo -ne "\\t"

So, grepping for a tab works like this:

grep "AD$(echo -ne "\\t")AND" countryInfo.txt

or

t=$(echo -ne "\\t")
grep "AD${t}AND" countryInfo.txt
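Since echo -ne behaves differently from shell to shell, printf may be a more portable way to get the tab. The following is a rough sketch of that variant, not a tested recipe for every platform; the inline sample data stands in for countryInfo.txt, and the names sample, tab and matches are hypothetical.

```shell
# Illustrative stand-in for countryInfo.txt: one tab-separated line,
# one space-separated line.
sample=$(printf 'AD\tAND\t200\nAD AND 200\n')

# printf expands \t itself; command substitution only strips trailing
# newlines, so the tab survives in the variable.
tab=$(printf '\t')

# Count the lines where AD and AND are separated by a real tab.
matches=$(printf '%s\n' "$sample" | grep -c "AD${tab}AND")
echo "$matches"
```

With the sample above, only the tab-separated line should match.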

Upvotes: 0

Bryan Oakley

Reputation: 386255

When I read the documentation for grep, I see no mention of \t representing a tab. Remember, not all regular expression engines are the same.

Upvotes: 0

Andrew Clark

Reputation: 208565

Try using the -P option instead of -E:

cat countryInfo.txt | grep -P 'AD\tAND'

This uses Perl-compatible regular expressions (a GNU grep feature), which interpret \t as a tab.

$ echo -e '-\t-' | grep -E '\t'
(no result)
$ echo -e '-\t-' | grep -P '\t'
-   -
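Where grep -P is not available, awk might be an alternative for tab-separated files like this one, since awk interprets \t in its field separator. This is only a sketch under assumptions: the two sample rows are invented to mimic the Andorra line, and the names rows and name are illustrative.

```shell
# Illustrative rows: the first is tab-separated, the second uses spaces.
rows=$(printf 'AD\tAND\t200\tAN\tAndorra\nAD AND 200 AN Andorra\n')

# -F '\t' tells awk to split fields on tabs. Print the 5th field of rows
# whose first two fields are exactly AD and AND.
name=$(printf '%s\n' "$rows" | awk -F '\t' '$1 == "AD" && $2 == "AND" { print $5 }')
echo "$name"
```

The space-separated row is treated as a single field, so it does not match the AD/AND test.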

Upvotes: 10
