Auxive

Reputation: 63

Using sed to match text in the 5th field

So, I am trying to look for certain words in the 5th field of /etc/passwd. For example:

jonesc:x:1053:1001:Cathy Jones:/export/home/jonesc:/bin/ksh
smiths:x:1049:1000:Sue Williams:/export/home/smiths:/bin/csh
smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash

Let's say I am looking for the word 'Smith'. How would I look for it ONLY in the 5th field, which contains the names, as opposed to searching the entire line?

I can easily do this with awk, but I am asked to do this with sed instead.

What I'm asked to do is to output matches from /etc/passwd that contain Smith or Jones in the 5th field to a file called smith_jones.txt.

I have no problem writing the output to a file with sed; I am just stuck on how to search only the 5th field. Awk would use $5, but I cannot find anything similar in sed.

I'm not looking for a complete answer to be handed to me, just a push in the right direction.

Upvotes: 1

Views: 2360

Answers (3)

Jay jargot

Reputation: 2868

Give this a try:

sed -n ":1
/^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*Smith[^:]*:.*$/ {p
n
b1}
/^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*Jones[^:]*:.*$/{p}"

-n instructs sed not to print each line automatically; only lines printed explicitly with p appear in the output.

:1 defines a label

/^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*Smith[^:]*:.*$/ regex matches any string that contains Smith in the 5th field, where fields are separated with :.

p is a command that prints the current line.

n is a command that replaces the buffer with the next input line (without -n, it would print the current line first).

b1 branches (jumps) to label 1.

sed reads the file one line at a time and stores the current line in the buffer (the pattern space). If Smith is found in the 5th field, the line is printed, the next line is loaded into the buffer, and execution jumps back to label 1 to test that new line. Otherwise, if Jones is found in the 5th field, the line in the buffer is printed. Because of the jump, a line whose 5th field contains both names is printed only once.

The test:

$ sed -n ":1
/^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*Smith[^:]*:.*$/ {p
n
b1}
/^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*Jones[^:]*:.*$/{p}" /etc/passwd > smith_jones.txt

$ cat smith_jones.txt
jonesc:x:1053:1001:Cathy Jones:/export/home/jonesc:/bin/ksh
smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash
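A label-free variant of the same idea, sketched as an alternative (it is not the script above): in the first expression, d ends the cycle right after p, so the Jones test never sees a line that was already printed, and a line whose 5th field contains both names still comes out once. The records are the question's sample lines, fed in instead of /etc/passwd; the \{4\} interval is plain POSIX BRE.

```shell
#!/bin/sh
# Sample records in /etc/passwd format, copied from the question.
sample='jonesc:x:1053:1001:Cathy Jones:/export/home/jonesc:/bin/ksh
smiths:x:1049:1000:Sue Williams:/export/home/smiths:/bin/csh
smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash'

# ^\([^:]*:\)\{4\} skips fields 1-4; [^:]* keeps the name inside field 5.
# {p;d;} prints a Smith line and starts the next cycle immediately,
# so the Jones expression is never applied to a line already printed.
result=$(printf '%s\n' "$sample" | sed -n \
  -e '/^\([^:]*:\)\{4\}[^:]*Smith/{p;d;}' \
  -e '/^\([^:]*:\)\{4\}[^:]*Jones/p')

printf '%s\n' "$result"
```

Redirecting the sed output with > smith_jones.txt produces the file the question asks for.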

Upvotes: 0

Andreas Louv

Reputation: 47119

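Awk would be the right tool for the job (-F: makes : the field separator; by default awk splits on whitespace, so $5 would not be the name field):

awk -F: '$5 ~ /Smith|Jones/{print}' /etc/passwd > output.txt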

But since you are asking for a sed solution, you can use something like this:

sed -n '/^[^:]*:[^:]*:[^:]*:[^:]*:\(Smith\|Jones\)/p' /etc/passwd

Each [^:]* matches zero or more characters other than :, so each [^:]*: consumes exactly one field, and the leading ^ makes sure the count starts at the first field.
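To see why anchoring with ^ matters, here is a small check on a made-up record (hypothetical data, not from the question) whose 5th field is e and whose 7th field is Smith. Without the anchor, sed may start the four [^:]*: groups at a later field, so a name in field 7 is reported as a 5th-field match.

```shell
#!/bin/sh
# Hypothetical record: field 5 is "e", "Smith" appears only in field 7.
line='a:b:c:d:e:f:Smith'

# Unanchored: the match may begin at any field, here at "c:d:e:f:Smith".
unanchored=$(printf '%s\n' "$line" | sed -n '/[^:]*:[^:]*:[^:]*:[^:]*:Smith/p')

# Anchored: the four groups must cover fields 1-4, so only field 5 is tested.
anchored=$(printf '%s\n' "$line" | sed -n '/^[^:]*:[^:]*:[^:]*:[^:]*:Smith/p')

printf 'unanchored: [%s]\nanchored:   [%s]\n' "$unanchored" "$anchored"
```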

You can also repeat a previous pattern with the interval expression \{x,y\} (\{n\} means exactly n times):

sed -n '/^\([^:]*:\)\{4\}\(Smith\|Jones\)/p' /etc/passwd

As you can see, this simplifies the regex even more.
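Since the interval count is just the field number minus one, the same trick generalizes to any colon-separated field. The helper below, match_field, is a hypothetical name used only for illustration; it reads standard input, uses POSIX BRE, and assumes the pattern contains no / (the sed delimiter).

```shell
#!/bin/sh
# Hypothetical helper: print input lines whose field N (fields separated
# by ":") contains the BRE given as $2. Reads standard input.
match_field() {
  skip=$(($1 - 1))   # number of leading fields to skip
  sed -n "/^\([^:]*:\)\{$skip\}[^:]*$2/p"
}

sample='jonesc:x:1053:1001:Cathy Jones:/export/home/jonesc:/bin/ksh
smiths:x:1049:1000:Sue Williams:/export/home/smiths:/bin/csh
smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash'

# Field 5 (the full name): only Amy Smith's line.
by_name=$(printf '%s\n' "$sample" | match_field 5 Smith)
# Field 7 (the login shell): only the ksh user.
by_shell=$(printf '%s\n' "$sample" | match_field 7 ksh)

printf '%s\n%s\n' "$by_name" "$by_shell"
```

Inside the double quotes, \( \) \{ \} are passed to sed unchanged, while $skip and $2 are expanded by the shell.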

-n disables the default printing of every line, and /pattern/p prints only the lines that match pattern.
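A quick way to see what -n changes, using one record from the question: without -n, sed prints every input line automatically, so the explicit p makes a matching line appear twice.

```shell
#!/bin/sh
line='smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash'
pattern='/^\([^:]*:\)\{4\}[^:]*Smith/p'

# Default behaviour: the explicit p plus the automatic print, so the line is doubled.
without_n=$(printf '%s\n' "$line" | sed "$pattern")

# With -n only the explicit p prints, so the line appears once.
with_n=$(printf '%s\n' "$line" | sed -n "$pattern")

printf 'without -n:\n%s\nwith -n:\n%s\n' "$without_n" "$with_n"
```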

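You might want to add another [^:]* before \(Smith\|Jones\) if the name can appear anywhere in the 5th field rather than only at its start (which is the case here, since the surname follows the first name), e.g.:

sed -n '/^\([^:]*:\)\{4\}[^:]*\(Smith\|Jones\)/p' /etc/passwd

This will match both Cathy Jones and Amy Smith.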
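The difference the extra [^:]* makes can be checked on the question's own data. Anchored right after the fourth colon, the pattern only accepts a 5th field that begins with the name, which neither sample record does. The \| alternation in this sketch assumes GNU sed.

```shell
#!/bin/sh
sample='jonesc:x:1053:1001:Cathy Jones:/export/home/jonesc:/bin/ksh
smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash'

# Name must start the 5th field: "Cathy Jones" and "Amy Smith" do not qualify.
at_start=$(printf '%s\n' "$sample" | sed -n '/^\([^:]*:\)\{4\}\(Smith\|Jones\)/p')

# [^:]* lets the name sit anywhere inside the 5th field.
anywhere=$(printf '%s\n' "$sample" | sed -n '/^\([^:]*:\)\{4\}[^:]*\(Smith\|Jones\)/p')

printf 'at start: [%s]\nanywhere:\n%s\n' "$at_start" "$anywhere"
```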

As pointed out in the comments, you can also use Extended Regular Expressions (ERE) to avoid all those backslashes:

sed -E -n '/^([^:]*:){4}[^:]*(Smith|Jones)/p' /etc/passwd

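Traditionally, GNU sed used -r to enable ERE, while BSD sed uses -E. GNU sed also accepts -E; older releases left it undocumented, but current ones document it. Note also that \| in a basic regular expression is a GNU extension, whereas | in ERE is standard.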
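Putting it together for the task in the question, here is a sketch using the ERE form, which GNU and BSD sed both accept via -E. It reads the question's sample records from a here-document instead of /etc/passwd (an assumption made so the result is reproducible) and writes smith_jones.txt in the current directory.

```shell
#!/bin/sh
# Sample input instead of /etc/passwd, so the result is reproducible.
sed -E -n '/^([^:]*:){4}[^:]*(Smith|Jones)/p' > smith_jones.txt <<'EOF'
jonesc:x:1053:1001:Cathy Jones:/export/home/jonesc:/bin/ksh
smiths:x:1049:1000:Sue Williams:/export/home/smiths:/bin/csh
smitha:x:1050:1001:Amy Smith:/export/home/smitha:/bin/bash
EOF

cat smith_jones.txt
```

Using > rather than >> overwrites the file, so rerunning the command does not duplicate lines.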

Upvotes: 5

Jahid

Reputation: 22438

This should work:

sed -n '/^\([^:]*:\)\{4\}[^:]*\(Jones\|Smith\)/p' /etc/passwd

^\([^:]*:\)\{4\} matches the first four :-delimited fields, and because [^:]* cannot cross a :, the names (Jones and Smith) can only match inside the fifth field.
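A small check of that claim, using hypothetical records made up for this sketch: Smith or Jones appears outside the 5th field in the first two, and inside it only in the third. The regex is the one from this answer, so the \| alternation assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical records: names in fields 1 and 6 should be ignored.
records='Smith:x:2001:2001:Bob Brown:/home/Smith:/bin/sh
bbrown:x:2002:2001:Bob Brown:/home/Jones:/bin/sh
kjones:x:2003:2001:Kim Jones:/home/kjones:/bin/sh'

# Same regex as the answer; only the third record has a name in field 5.
matches=$(printf '%s\n' "$records" | sed -n '/^\([^:]*:\)\{4\}[^:]*\(Jones\|Smith\)/p')

printf '%s\n' "$matches"
```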

Upvotes: 0
