Hichigaya Hachiman

Reputation: 307

Finding number range with grep

I have a database in this format:

username:something:UID:something:name:home_folder

Now I want to see which users have a UID in the range 1000-5000. This is what I tried:

ypcat passwd | grep '^.*:.*:[1-5][0-9]\{2\}:'

My thinking is this: I go to the third column and find numbers that start with a number from 1-5, the next number can be any number - range [0-9] and that range repeats itself 2 more times making it a 4 digit number. In other words it would be something like [1-5][0-9][0-9][0-9].

My output, however, lists even UIDs that are greater than 5000. What am I doing wrong?

Also, I realize the code I wrote could potentially list numbers up to 5999. How can I restrict the match to 1000-5000?

EDIT: I'm intentionally not using awk since I want to understand what I'm doing wrong with grep.

Upvotes: 1

Views: 4197

Answers (2)

Gordon Davisson

Reputation: 125728

There are several problems with your regex:

  • As Sundeep pointed out in a comment, ^.*:.*: will match two or more columns, because the .* parts can match field delimiters (":") as well as field contents. To fix this, use ^[^:]*:[^:]*: (or, equivalently, ^\([^:]*:\)\{2\}; see the notes on bracket expressions and basic vs extended RE syntax below)
  • [0-9]\{2\} will match exactly two digits, not three
  • As you realized, it matches numbers starting with "5" followed by digits other than "0"

As a result of these problems, the pattern ^.*:.*:[1-5][0-9]\{2\}: will match any record with a UID or GID in the range 100-599.
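A quick way to see this is to run the question's pattern against a few records. The sketch below uses made-up sample data in place of ypcat passwd (the names and numbers are assumptions chosen to hit each case), so treat it as an illustration only:

```shell
# Made-up records in the question's format:
# username:something:UID:something:name:home_folder
sample='alice:x:150:20:Alice:/home/alice
bob:x:2500:20:Bob:/home/bob
carol:x:6000:100:Carol:/home/carol'

# The pattern from the question, unchanged.
matched=$(printf '%s\n' "$sample" | grep '^.*:.*:[1-5][0-9]\{2\}:' | cut -d: -f1)

# alice matches through her 3-digit UID (150), carol through her GID (100),
# and bob, whose UID really is in 1000-5000, is not matched at all.
printf '%s\n' "$matched"
```

With this sample the command prints alice and carol, which is the "UID or GID in 100-599" behavior described above.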

To do it correctly with grep, use grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):' (again, see Sundeep's comments).
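To check the boundaries of that pattern, here is a small sketch that feeds it made-up records instead of ypcat passwd output (the sample users are assumptions, picked to sit just inside and just outside the range):

```shell
# Made-up records; the third field is the UID.
sample='alice:x:999:20:Alice:/home/alice
bob:x:1000:20:Bob:/home/bob
carol:x:4999:100:Carol:/home/carol
dave:x:5000:20:Dave:/home/dave
erin:x:5001:2500:Erin:/home/erin
frank:x:12345:20:Frank:/home/frank'

# ([^:]*:){2} skips exactly two fields, so only the third field is tested.
in_range=$(printf '%s\n' "$sample" |
  grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):' |
  cut -d: -f1)

printf '%s\n' "$in_range"
```

On this sample it prints bob, carol and dave. erin is skipped even though her fourth field (2500) is in range, because the anchored prefix can no longer slide past the UID column.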

[Added in edit:] Concerning bracket expressions and what ^ means in them, here's the relevant section of the re_format man page:

A bracket expression is a list of characters enclosed in '[]'. It normally matches any single character from the list (but see below). If the list begins with '^', it matches any single character (but see below) not from the rest of the list. If two characters in the list are separated by '-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. '[0-9]' in ASCII matches any decimal digit.

(bracket expressions can also contain other things, like character classes and equivalence classes, and there are all sorts of special rules about things like how to include characters like "^", "-", "[", or "]" as part of a character list, rather than negating, indicating a range, class, or end of the expression, etc. It's all rather messy, actually.)

Concerning basic vs. extended RE syntax: grep -E uses the "extended" syntax, which is just different enough to mess you up. The relevant differences here are that in a basic RE, the characters "(){}" are treated as literal characters unless escaped (if escaped, they're treated as RE syntax indicating grouping and repetition); in an extended RE, this is reversed: they're treated as RE syntax unless escaped (if escaped, they're treated as literal characters).

That's why I suggest ^\([^:]*:\)\{2\} in the first bullet point, but then actually use ^([^:]*:){2} in the proposed solution -- the first is basic syntax, the second is extended.
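The difference can be checked on a single line. This is a sketch with an invented record; the patterns carry the * after the bracket expression so that each skipped field may be any length, and "1234:" is appended only to pin the match to the third field:

```shell
# One invented record to test against.
line='user:x:1234:20:Name:/home/user'

# Basic RE (plain grep): grouping and repetition need backslashes.
bre=$(printf '%s\n' "$line" | grep -c '^\([^:]*:\)\{2\}1234:')

# Extended RE (grep -E): the same operators are written bare.
ere=$(printf '%s\n' "$line" | grep -cE '^([^:]*:){2}1234:')

# Basic-style backslashes under -E mean literal "(){}" characters, so the
# line does not match. grep -c prints 0 but exits non-zero, hence "|| true".
mixed=$(printf '%s\n' "$line" | grep -cE '^\([^:]*:\)\{2\}1234:' || true)

echo "basic=$bre extended=$ere mixed=$mixed"
```

This prints basic=1 extended=1 mixed=0: the two spellings are equivalent only when each is used with its own syntax.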

The other relevant difference -- and the reason I switched to extended for the actual solution -- is that only extended RE allows | to indicate alternatives, as in this|that|theother (which matches "this" or "that" or "theother"). I need this capability to match a 4-digit number starting with 1-4 or the specific number 5000 ([1-4][0-9]{3}|5000). POSIX basic RE has no alternation operator (GNU grep accepts \| as an extension, but that isn't portable), so grep -E and the extended syntax are the straightforward way to do this in a single pattern.
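As a side note, the limitation applies to a single basic RE. grep itself accepts several patterns through repeated -e options and prints lines matching any of them, so the same selection can be sketched without alternation. The records below are invented, and this is offered as an alternative spelling rather than the recommended form:

```shell
# Invented records around the upper boundary.
sample='a:x:4999:1:A:/h
b:x:5000:1:B:/h
c:x:5001:1:C:/h'

# Extended RE with alternation, as in the proposed solution.
with_alt=$(printf '%s\n' "$sample" |
  grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):')

# Two basic REs, one per alternative, passed with separate -e options.
with_two=$(printf '%s\n' "$sample" |
  grep -e '^\([^:]*:\)\{2\}[1-4][0-9]\{3\}:' -e '^\([^:]*:\)\{2\}5000:')

printf '%s\n' "$with_two"
```

Both commands select the records for a and b here. The -E version stays shorter because the shared prefix is written only once.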

(There are also many other RE variants, such as Perl-compatible RE (PCRE). When using regular expressions, always be sure to know which variant your regex tool uses, so you don't use syntax it doesn't understand.)

Upvotes: 4

P....

Reputation: 18351

ypcat passwd | awk -F: '$3 >= 1000 && $3 <= 5000 {print $1}'

awk can do this task simply. Here ":" is set as the delimiter between the fields, and the condition requires the third field to be at least 1000 and at most 5000, so that both ends of the 1000-5000 range are included. If the condition is met, the first field is printed.
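For a range that is meant to include 1000 and 5000 themselves, the comparisons have to be >= and <=. A minimal check on made-up records (the sample users are assumptions standing in for ypcat passwd output):

```shell
# Made-up records; the third field is the UID.
sample='alice:x:999:20:Alice:/home/alice
bob:x:1000:20:Bob:/home/bob
carol:x:3000:20:Carol:/home/carol
dave:x:5000:20:Dave:/home/dave
erin:x:5001:20:Erin:/home/erin'

# Numeric comparison on field 3, with both endpoints included.
users=$(printf '%s\n' "$sample" |
  awk -F: '$3 >= 1000 && $3 <= 5000 { print $1 }')

printf '%s\n' "$users"
```

This prints bob, carol and dave. With strict > and < the two boundary users, bob and dave, would be dropped.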

Upvotes: 2
