Reputation: 1997
I have this command, which uses grep:
cat nastava.html | grep '<td>[A-Z a-z]*</td><td>[0-9/]*</td>' | sed 's/[ \t]*<td>\([A-Z a-z]*\)<\/td><td>\([0-9]\{1,3\}\)\/[0-9]\{2\}\([0-9]\{2\}\)<\/td>.*/\1 mi\3\2 /' |
sort | grep -n ".*" | sed -r 's/(.*):(.*)/\1. \2/' >studenti.txt
I don't understand the second line. The sort is fine, and grep -n numbers the sorted list, but why do we use ".*" here? It won't work without it, and I don't understand why.
Upvotes: 0
Views: 155
Reputation: 189830
The grep is used purely for the side effect of the line numbering with the -n option here, so the main thing is really to use a regular expression which matches all the input lines. As such, .* is not very elegant -- ^ would work without scanning every line, and $ trivially matches every line as well. Since you know the input lines are not empty, and thus contain at least one character, the simple regular expression . would work perfectly, too.
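As a minimal sketch of that equivalence, the snippet below runs grep -n with each of the three patterns on a made-up list (the names and indexes are hypothetical, not taken from nastava.html) and keeps the results in variables so they can be compared.

```shell
# Hypothetical sample standing in for the sorted student list.
sample=$(printf '%s\n' 'Ana Anic mi11123' 'Marko Maric mi1045' 'Petar Peric mi127')

# Each pattern matches every non-empty line, so -n numbers all of them.
with_dotstar=$(printf '%s\n' "$sample" | grep -n '.*')
with_caret=$(printf '%s\n' "$sample" | grep -n '^')
with_dot=$(printf '%s\n' "$sample" | grep -n '.')

# All three variables hold the same text; print one of them.
printf '%s\n' "$with_caret"
```

The only case where the patterns would differ is an empty input line: ^ and .* still match it, while . does not.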
However, as the end goal is to perform line numbering, a better solution is to use a dedicated tool for this purpose.
... | sort | nl -ba -s '. '
The -ba option specifies to number all lines (the default is to only add a line number to non-empty lines; we know there are no empty lines, so it's not strictly necessary here, but it's good to know) and the -s option specifies the separator string to put after the number.
A possible minor complication is that the line number format is whitespace-padded, so in the end, this solution may not work for you if you specifically want unpadded numbers. (But a sed postprocessor to fix that up is a lot simpler than the postprocessor for grep you have now -- just sed 's/^ *//' will remove leading whitespace.)
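A short sketch of the padding issue, using a hypothetical two-line sample in place of the real list. It shows the sed cleanup described above and, as an alternative not mentioned in the answer, the -w option of nl, which sets the number width and so avoids the padding in the first place.

```shell
# Hypothetical sample standing in for the sorted student list.
sample=$(printf '%s\n' 'Ana Anic mi11123' 'Marko Maric mi1045')

# Default nl output: the number is right-aligned in a 6-column field.
padded=$(printf '%s\n' "$sample" | nl -ba -s '. ')

# Strip the leading padding afterwards with sed ...
stripped=$(printf '%s\n' "$padded" | sed 's/^ *//')

# ... or ask nl for a width of 1 so that no padding is produced.
narrow=$(printf '%s\n' "$sample" | nl -ba -w1 -s '. ')

printf '%s\n' "$stripped"
```

Both variants should give lines of the form "1. Ana Anic mi11123", which is the format the original grep -n plus sed -r step was producing.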
... As an aside, the ugly cat | grep | sed pipeline can be abbreviated to just
sed -n 's%[ \t]*<td>\([A-Z a-z]*\)</td><td>\([0-9]\{1,3\}\)/[0-9]\{2\}\([0-9]\{2\}\)</td>.*%\1 mi\3\2 %p' nastava.html
The cat was never necessary in the first place, and the sed script can easily be refactored to only print when a substitution was performed (your grep regular expression was not exactly equivalent to the one you have in the sed script but I assume that was the intent). Also, using a different separator avoids having to backslash the slashes.
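To see the effect of -n together with the p flag, here is a small sketch that feeds the same sed script a few made-up lines instead of nastava.html. The rows are hypothetical; only the first and third have the name/index shape the script expects, so only those two should be printed.

```shell
# Hypothetical input; the real nastava.html is not shown in the question.
html=$(printf '%s\n' \
  '  <td>Ana Anic</td><td>123/2011</td><td>9</td>' \
  '<tr>' \
  '  <td>Marko Maric</td><td>45/2010</td><td>7</td>')

# -n suppresses default output; the trailing p prints only lines where
# the substitution succeeded, which replaces the separate grep filter.
result=$(printf '%s\n' "$html" |
  sed -n 's%[ \t]*<td>\([A-Z a-z]*\)</td><td>\([0-9]\{1,3\}\)/[0-9]\{2\}\([0-9]\{2\}\)</td>.*%\1 mi\3\2 %p')

printf '%s\n' "$result"
```

Each printed line keeps the trailing space from the replacement text, exactly as in the original command.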
... And of course, if nastava.html is your own web page, the whole process is umop apisdn. You should have the students' results in a machine-readable form, and generate a web page from that, rather than the other way around.
Upvotes: 4
Reputation: 242218
grep needs a regular expression to match. You can't run grep with no expression at all. If you want to number all the lines, just specify an expression that matches anything. I'd probably use ^ instead of .*.
Upvotes: 3