Reputation: 249

Regular expressions with grep

So I have a bunch of data that all looks like this:

janitor#1/2 of dorm#1/1
president#4/1 of class#2/2
hunting#1/1 hat#1/2
side#1/2 of hotel#1/1
side#1/2 of hotel#1/1
king#1/2 of hotel#1/1
address#2/2 of girl#1/1
one#2/1 in family#2/2
dance#3/1 floor#1/2
movie#1/2 stars#5/1
movie#1/2 stars#5/1
insurance#1/1 office#1/2
side#1/1 of floor#1/2
middle#4/1 of December#1/2
movie#1/2 stars#5/1
one#2/1 of tables#2/2
people#1/2 at table#2/1

Some lines have prepositions and others don't, so I thought I could use regular expressions to clean it up. What I need is each noun, the # sign, and the number that follows it, each on its own line. For example, the first lines of the final output file should look like this:

janitor#1
dorm#1
president#4
etc...

The list is stored in a file called NPs. My code to do this is:

cat NPs | grep -E '\b(\w*[#][1-9]).' >> test

When I open test, however, it's the exact same as the input file. Any input as to what I'm missing? It doesn't seem like it should be a hard operation, so maybe I'm missing something about syntax? I'm using this command from a shell script that is called in bash.

Thanks in advance!

Upvotes: 0

Answers (6)

Jotne

Reputation: 41446

An awk version:

awk '/#/ {print $NF}' RS="/" NPs
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2

Upvotes: 0

hwnd

Reputation: 70722

This should do what you need.

The -o option will show only the part of a matching line that matches the PATTERN.

grep -Eo '[A-Za-z#]+[1-9]' NPs > test

Or use the -P option, which interprets the PATTERN as a Perl regular expression:

grep -Po '[\w#]*(?=/)' NPs > test

Upvotes: 1

devnull

Reputation: 123448

Using grep:

$ grep -o "\w*[#]\w*" inputfile
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2

Upvotes: 0

beroe

Reputation: 12316

Grep by default just searches for the text, so in your case it is printing the lines that match. I think you want to investigate sed instead to perform the replacement. (And you don't need to cat the file, just grep PATTERN filename)

To get your output on separate lines, this worked for me:

sed 's|/.||g' NPs | sed 's/ .. / /' | tr " " "\n"

This uses two seds in a row to do different substitutions, and tr to insert line feeds.

The -o option in grep, which causes it to print out only the matching text, as described in another answer, is probably even simpler!
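To make the difference concrete, here is a minimal sketch that runs grep with and without -o. The two sample lines are copied from the question; the variable names and the pattern `[[:alnum:]]+#[0-9]+` are illustrative choices of mine, not taken from any answer. Note that -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
# Two sample lines copied from the question's data.
sample='janitor#1/2 of dorm#1/1
hunting#1/1 hat#1/2'

# Without -o, grep prints every matching line in full.
whole_lines=$(printf '%s\n' "$sample" | grep -E '[[:alnum:]]+#[0-9]+')

# With -o, grep prints only the matched text, one match per line.
only_matches=$(printf '%s\n' "$sample" | grep -Eo '[[:alnum:]]+#[0-9]+')

printf 'without -o:\n%s\n' "$whole_lines"
printf 'with -o:\n%s\n' "$only_matches"
```

Without -o the output is identical to the input, which is the behaviour described in the question; with -o each noun#number pair lands on its own line.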

Upvotes: 0

Mars

Reputation: 8854

You need sed, not grep. (Or awk, or perl.) It looks like this would do what you want:

cat NPs | sed 's?/.*??'

or simply

sed 's?/.*??' NPs

s means "substitute". The next character is the delimiter that separates the pattern from the replacement. Usually it's "/", but since you need to search for "/", I used "?" instead. "." refers to any character, and "*" says "zero or more of what preceded me". Whatever is between the last two delimiters is the replacement string. In this case it's empty, so you're replacing "/" followed by zero or more of any character with the empty string.
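As a small illustration of the delimiter choice, the sketch below applies the same substitution to one line from the question, once with "?" as the delimiter and once with the usual "/" (which then has to be escaped inside the pattern). The variable names are just for this example.

```shell
# One sample line copied from the question's data.
line='janitor#1/2 of dorm#1/1'

# "?" as the delimiter: delete from the first "/" to the end of the line.
with_question_mark=$(printf '%s\n' "$line" | sed 's?/.*??')

# Same substitution with "/" as the delimiter; the "/" in the pattern is escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/\/.*//')

printf '%s\n' "$with_question_mark"
printf '%s\n' "$with_slash"
```

Both commands print janitor#1. Because ".*" runs to the end of the line, only the first item on each line survives.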

EDIT: Oh, I see now that you wanted to extract the last item on the line, too. Well, I'm sure that others' suggested regexps would work. If it were my problem, I'd probably filter the file in two steps, perhaps piping the results from one step to the next, or using multiple substitutions with sed: First delete the "of"s and middle spaces, and add newlines, and then run sed as above. It's not as cool as doing it all in one regexp, but each step is easier to understand. For even more simplicity and uncoolness, use three steps, replacing " of " with space in the first step. Since others have provided complete solutions, I won't work out the details.
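For what it's worth, one possible way to spell out those steps is sketched below. It is only a sketch under assumptions: the preposition list (of, in, at) is simply what appears in the question's sample data, the variable names are made up for the example, and `sed -E` is assumed to be available (it is in GNU and BSD sed).

```shell
# Three sample lines copied from the question's data.
sample='janitor#1/2 of dorm#1/1
hunting#1/1 hat#1/2
people#1/2 at table#2/1'

# Step 1: replace a preposition between the two items with a single space
#         (the list of|in|at is an assumption based on the sample data).
# Step 2: turn each remaining space into a newline, one item per line.
# Step 3: delete everything from the "/" onward, as in the sed command above.
result=$(printf '%s\n' "$sample" \
  | sed -E 's/ (of|in|at) / /' \
  | tr ' ' '\n' \
  | sed 's?/.*??')

printf '%s\n' "$result"
```

On real data the preposition list would need extending to whatever actually occurs in the file.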

Upvotes: 0

keltar

Reputation: 18389

grep and its variants extract entire lines from the text if they match the pattern. If you need to modify lines, you should use sed, like this:

cat NPs | sed 's/^\(\b\w*[#][1-9]\).*$/\1/g'

Upvotes: 0
