Reputation: 249
So I have a bunch of data that all looks like this:
janitor#1/2 of dorm#1/1
president#4/1 of class#2/2
hunting#1/1 hat#1/2
side#1/2 of hotel#1/1
side#1/2 of hotel#1/1
king#1/2 of hotel#1/1
address#2/2 of girl#1/1
one#2/1 in family#2/2
dance#3/1 floor#1/2
movie#1/2 stars#5/1
movie#1/2 stars#5/1
insurance#1/1 office#1/2
side#1/1 of floor#1/2
middle#4/1 of December#1/2
movie#1/2 stars#5/1
one#2/1 of tables#2/2
people#1/2 at table#2/1
Some lines have prepositions, others don't, so I thought I could use regular expressions to clean it up. What I need is each noun, the # sign, and the following number on its own line. So, for example, the first lines of output should look like this in the final file:
janitor#1
dorm#1
president#4
etc...
The list is stored in a file called NPs. My code to do this is:
cat NPs | grep -E '\b(\w*[#][1-9]).' >> test
When I open test, however, it's the exact same as the input file. Any input as to what I'm missing? It doesn't seem like it should be a hard operation, so maybe I'm missing something about syntax? I'm using this command from a shell script that is called in bash.
Thanks in advance!
Upvotes: 0
Views: 289
Reputation: 41446
An awk version:
awk '/#/ {print $NF}' RS="/" NPs
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2
Upvotes: 0
Reputation: 70722
This should do what you need.
The -o option will show only the part of a matching line that matches the PATTERN.
grep -Eo '[A-Za-z]+#[1-9]' NPs > test
or even the -P option, which interprets the PATTERN as a Perl regular expression:
grep -Po '[\w#]*(?=/)' NPs > test
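To see what -o changes, here is a minimal sketch that runs one pattern with and without -o on three sample lines from the question. The helper name extract_tokens and the pattern [[:alnum:]_]+#[0-9]+ are my own choices for illustration, not part of the answer above, and -o is a GNU/BSD grep extension rather than a POSIX option.

```shell
#!/bin/sh
# Three sample lines copied from the question's data.
sample='janitor#1/2 of dorm#1/1
hunting#1/1 hat#1/2
middle#4/1 of December#1/2'

# Hypothetical helper: print every word#number token on its own line.
# -E enables extended regex syntax; -o prints only the matched text.
extract_tokens() {
    grep -Eo '[[:alnum:]_]+#[0-9]+'
}

# Without -o, grep prints each matching line in full,
# which is why the question's output equalled its input.
whole_lines=$(printf '%s\n' "$sample" | grep -E '[[:alnum:]_]+#[0-9]+')

# With -o, each match lands on its own line.
tokens=$(printf '%s\n' "$sample" | extract_tokens)

printf '%s\n' "$tokens"
```

The character class includes uppercase letters, so a capitalized noun such as December is kept whole.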
Upvotes: 1
Reputation: 123448
Using grep:
$ grep -o "\w*[#]\w*" inputfile
janitor#1
dorm#1
president#4
class#2
hunting#1
hat#1
side#1
hotel#1
side#1
hotel#1
king#1
hotel#1
address#2
girl#1
one#2
family#2
dance#3
floor#1
movie#1
stars#5
movie#1
stars#5
insurance#1
office#1
side#1
floor#1
middle#4
December#1
movie#1
stars#5
one#2
tables#2
people#1
table#2
Upvotes: 0
Reputation: 12316
Grep by default just searches for the text, so in your case it is printing the lines that match. I think you want to investigate sed instead to perform the replacement. (And you don't need to cat the file, just grep PATTERN filename.)
To get your output on separate lines, this worked for me:
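sed 's|/.||g' NPs | sed 's/ .. / /' | tr " " "\n"
This uses two seds in a row to do different substitutions: the first strips each "/" and the digit after it, and the second replaces a two-letter preposition (with its surrounding spaces) by a single space. tr then turns the remaining spaces into line feeds, so lines without a preposition are split as well.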
The -o option in grep, which causes it to print out only the matching text, as described in another answer, is probably even simpler!
Upvotes: 0
Reputation: 8854
You need sed, not grep. (Or awk, or perl.) It looks like this would do what you want:
cat NPs | sed 's?/.*??'
or simply
sed 's?/.*??' NPs
s means "substitute". The next character is the delimiter that separates the pattern from the replacement. Usually it's "/", but since you need to search for "/", I used "?" instead. "." matches any character, and "*" says "zero or more of what preceded me". Whatever is between the last two delimiters is the replacement string. In this case it's empty, so you're replacing "/" followed by zero or more of any character with the empty string.
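As a quick check of that explanation, this sketch applies the same substitution to one sample line from the question. The variable names are only for illustration.

```shell
#!/bin/sh
# One sample line from the question.
line='janitor#1/2 of dorm#1/1'

# s?PATTERN?REPLACEMENT? with "?" as the delimiter:
# delete the first "/" and everything after it on the line.
result=$(printf '%s\n' "$line" | sed 's?/.*??')

printf '%s\n' "$result"   # prints: janitor#1
```

Because ".*" runs to the end of the line, everything after the first slash is removed, including the second noun.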
EDIT: Oh, I see now that you wanted to extract the last item on the line, too. Well, I'm sure that others' suggested regexps would work. If it were my problem, I'd probably filter the file in two steps, perhaps piping the results from one step to the next, or using multiple substitutions with sed: first delete the "of"s and middle spaces, and add newlines, and then run sed as above. It's not as cool as doing it all in one regexp, but each step is easier to understand. For even more simplicity and uncoolness, use three steps, replacing " of " with a space in the first step. Since others have provided complete solutions, I won't work out the details.
Upvotes: 0
Reputation: 18389
grep and its variants extract entire lines from the text if they match the pattern. If you need to modify lines, you should use sed, like
cat NPs | sed 's/^\(\b\w*[#][1-9]\).*$/\1/g'
Upvotes: 0