Can't Tell

Reputation: 13426

What should I know about Python to identify comments in different source files?

I need to identify comments in different kinds of source files in a given directory (for example Java, XML, JavaScript, bash). I have decided to do this using Python (as an attempt to learn Python). My questions are:

1) What should I know about Python to get this done? (I have an idea that regular expressions will be useful, but are there alternatives, other modules, or libraries that I can use to get this done?)

2) Is Python a good choice for such a task? Will some other language make this easier to accomplish?

Upvotes: 1

Views: 117

Answers (3)

alexis

Reputation: 50190

The pyparsing module directly supports several styles of comments. E.g.,

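from pyparsing import javaStyleComment

# text holds the source code to scan; scanString yields (tokens, start, end)
for tokens, start, end in javaStyleComment.scanString(text):
    print(start, end, tokens[0])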

So if your goal is just getting the job done, look into this since the comment parsers are likely to be more robust than anything you'd throw together. If you're more interested in learning to do it yourself, this might be too much processed food for your taste.

Upvotes: 2

marue

Reputation: 5726

1) What you need to know about is parsing, not regex. Additionally, you will need the os module and some knowledge of Python's file handling. Dive Into Python (http://www.diveintopython.net/) is a good start here. I'd recommend chapter 6. (And maybe 1-5 as well :) )

2) Python is a good start. Another language would not make it easier, just different. Python is already pretty simple to start with.

I would recommend not using regex for your task, since it is as simple as searching for comment markers and linefeeds.
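To illustrate the approach described above, here is a minimal sketch that uses only the os module and plain file handling. The extension-to-marker table and the function names are made up for this example, and it only covers single-line comments: it does not handle block comments, XML comments, or markers that appear inside string literals (such as "http://").

```python
import os

# Hypothetical mapping from file extension to its line-comment marker.
LINE_COMMENT_MARKERS = {".java": "//", ".js": "//", ".sh": "#"}


def find_line_comments(path, marker):
    """Yield (line_number, comment_text) for every line containing marker."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            index = line.find(marker)
            if index != -1:
                # Everything from the marker to the linefeed is the comment.
                yield number, line[index:].rstrip("\n")


def scan_directory(root):
    """Map each recognised source file under root to its line comments."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1]
            marker = LINE_COMMENT_MARKERS.get(extension)
            if marker is None:
                continue  # unknown file type, skip it
            path = os.path.join(dirpath, name)
            found = list(find_line_comments(path, marker))
            if found:
                results[path] = found
    return results
```

Calling scan_directory(".") would return a dictionary keyed by file path. Treat it as a starting point for learning rather than a robust solution.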

Upvotes: 2

C2H5OH

Reputation: 5602

Your problem seems to be more related to programming language parsing. I believe that with regular expressions you will be able to find comments in most languages. The good thing is that you have regular expressions almost everywhere: Perl, Python, Ruby, AWK, sed, etc.

But, as the other answer said, you'd be better off using some parsing machinery: if not a full-blown parser, then at least a lexer. For Python, check out the Pygments library, which has lexers for many languages already implemented.
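As a rough sketch of the regular expression route, the pattern below matches C-style comments (the kind used by Java and JavaScript) with Python's re module. The pattern and function name are only illustrative, and the approach has a known weakness: it cannot tell a real comment from comment-like text inside a string literal.

```python
import re

# "// to end of line" or "/* ... */"; DOTALL lets ".*?" span several lines.
C_STYLE_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def find_c_style_comments(source):
    """Return every C-style comment found in source, in order."""
    return [match.group(0) for match in C_STYLE_COMMENT.finditer(source)]
```

Other comment styles (for example # in bash or <!-- --> in XML) would each need their own pattern.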
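A possible way to use Pygments for this, assuming it is installed (pip install Pygments): ask for a lexer by language name, then keep only the tokens whose type falls under Comment. The function name find_comments is made up for this sketch.

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Comment


def find_comments(source, language):
    """Return the text of every comment token the lexer reports."""
    lexer = get_lexer_by_name(language)
    comments = []
    for token_type, value in lexer.get_tokens(source):
        # "in" also matches subtypes such as Comment.Single.
        if token_type in Comment and value.strip():
            comments.append(value.strip())
    return comments
```

Because the lexer tracks string literals, text such as "// not a comment" inside a Java string should not be reported, which is the main advantage over a bare regular expression.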

Upvotes: 5
