Reputation: 331
How would you delete all comments using sed from a file(defined with #) with respect to '#' being in a string?
This helped out a lot except for the string portion.
Upvotes: 13
Views: 23562
Reputation: 536
As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $#
and ${#param}
.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan@carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
Upvotes: 0
Reputation: 29597
To remove comment lines (lines whose first non-whitespace character is #
) but not shebang lines (lines whose first characters are #!
):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed
is a string containing a sed program consisting of two delete-line commands of the form /
regex/d
. Commands are separated by ;
. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed
is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf "\nBEFORE removal:\n\n${S0}\n\n"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf "\nAFTER removal:\n\n${S1}\n\n"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment
Upvotes: 3
Reputation: 1190
sed 's:^#\(.*\)$:\1:g' filename
Supposing the lines starts with single # comment, Above command removes all comments from file.
Upvotes: -1
Reputation: 189317
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#]
for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\.
-- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*"
or similarly for single-quoted strings '\(\\.\|[^\']\)*'
.
Piecing all of this together, your sed
script could look something like this:
s/^\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because it needs to be quoted, and both single and double quotes are included in the string, we need one more additional complication. Recall that the shell allows you to glue together strings like "foo"'bar'
gets replaced with foobar
-- foo
in double quotes, and bar
in single quotes. Thus you can include single quotes by putting them in double quotes adjacent to your single-quoted string -- '"foo"'"'"
is "foo"
in single quotes next to '
in double quotes, thus "foo"'
; and "'
can be expressed as '"'
adjacent to "'"
. And so a single-quoted string containing both double quotes foo"'bar
can be quoted with 'foo"'
adjacent to "'bar"
or, perhaps more realistically for this case 'foo"'
adjacent to "'"
adjacent to another single-quoted string 'bar'
, yielding 'foo'"'"'bar'
.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/p' file
This was tested on Linux; on other platforms, the sed
dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alteration operators.
Alas, if you may have multi-line quoted strings, this will not work; sed
, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
Upvotes: 1
Reputation: 58371
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b
if the line does not contain a #
bail outs/^/\n/
insert a unique marker (\n
)ta;:a
jump to a loop label (resets the substitute true/false flag)s/\n$//;t
if marker at the end of the line, remove and bail outs/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta
if the string following the marker is a quoted one, bump the marker forward of it and loop.s/\n\([^#]\)/\1\n/;ta
if the character following the marker is not a #
, bump the marker forward of it and loop.s/\n.*//
the remainder of the line is comment, remove the marker and the rest of line.Upvotes: 7
Reputation: 20450
Since there is no sample input provided by asker, I will assume a couple of cases and Bash is the input file because bash is used as the tag of the question.
Case 1: entire line is the comment
The following should be sufficient enough in most case:
sed '/^\s*#/d' file
It matches any line has which has none or at least one leading white-space characters (space, tab, or a few others, see man isspace
), followed by a #
, then delete the line by d
command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#*[^\"']*$//" file
[^\"']
is used to prevent quoted string confusion, however, it also means that comments with quotations '
or "
will not to be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
Upvotes: 5
Reputation: 20205
If #
always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i
switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any #
to the end of the line, ignoring any context. If you use #
anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of #
in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
Upvotes: 15