Irawan Soetomo

Reputation: 1325

Searching a list of keywords from text files in folders

I have compiled a list of db object names, one name per line, in a text file. For each name, I want to know where it is used. The search target is a group of folders containing sub-folders of source code.

Before I give up looking for a tool to do this and start writing my own, perhaps you can point me to an existing one.

Ideally, it should be a Windows desktop application. I have not used grep before.

Upvotes: 1

Views: 1747

Answers (3)

Irawan Soetomo

Reputation: 1325

I created an SSIS package that loads my 500+ source code files, spread across nested folders belonging to several projects, into a table with one row per source line (10K+ lines in total).

I then ran a SELECT statement against it, cross-applying the table that holds the list of 5K+ db object names and matching with the CLR RegEx functions for SQL Server described at http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/. The query took almost 1.5 hours to complete.

I know it's long-winded, but it is exactly what I need. Thank you for your efforts in guiding me. I would be happy to explain the details further, should anyone be interested in using my method.


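insert
    dbo.DbObjectUsage
select
    do.Id as DbObjectId,
    fl.Id as FileLineId
from
    dbo.FileLine as fl -- 10K+ rows, one per source-code line
cross apply
    dbo.DbObject as do -- 5K+ rows, one per db object name
where
    -- whole-word match of the object name in the line, via the CLR regex function
    dbo.RegExIsMatch('\b' + do.name + '\b', fl.Line, 0) != 0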
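For comparison, here is a minimal Python sketch of the same whole-word, name-by-line match, run directly over the folders instead of through SSIS and SQL Server. It assumes a UTF-8 names file with one name per line; the script name find_usage.py and all function names are hypothetical. Instead of evaluating 5K+ separate regexes against every line, it compiles all names into one alternation so each line is scanned once, which should typically be much faster than testing each name separately.

```python
import os
import re
import sys


def load_names(path):
    """Read one db object name per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def build_pattern(names):
    """Compile all names into a single whole-word regex alternation.

    Like the word-boundary query, this assumes names begin and end with word
    characters. Longer names are tried first, and re.escape protects names
    containing regex metacharacters, such as the '.' in 'dbo.Orders'.
    Matching is case-sensitive, like the options value 0 in the query.
    """
    if not names:
        raise ValueError("no names to search for")
    ordered = sorted(set(names), key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


def search_lines(pattern, lines):
    """Yield (line_number, name) once per distinct name found on each line."""
    for lineno, line in enumerate(lines, start=1):
        for name in sorted({m.group(0) for m in pattern.finditer(line)}):
            yield lineno, name


def search_tree(root, pattern):
    """Walk root and its sub-folders, yielding (file_path, line_number, name)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # errors="ignore" skips undecodable bytes (assumes mostly text files)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, name in search_lines(pattern, f):
                    yield path, lineno, name


def main(argv):
    if len(argv) != 3:
        print("usage: python find_usage.py NAMES_FILE ROOT_FOLDER")
        return
    pattern = build_pattern(load_names(argv[1]))
    for path, lineno, name in search_tree(argv[2], pattern):
        print(f"{path}:{lineno}: {name}")


if __name__ == "__main__":
    main(sys.argv)
```

One difference from the per-name query: matches cannot overlap, so where a listed name occurs inside a longer listed name (for example Orders inside dbo.Orders, if both are in the list), only the longer name is reported for that occurrence.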

Upvotes: 0

Ira Baxter

Reputation: 95392

See our Source Code Search Engine. It indexes a large code base according to the atoms (tokens) of the language(s) of interest, and then uses that index to quickly execute structured queries stated in terms of language elements. It is a kind of super-grep, but it isn't fooled by comments or string literals, and it automatically ignores whitespace. This means you get far fewer false positives than you get with grep.
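To illustrate why token-level matching produces fewer false positives than grep, here is a minimal Python sketch for C-like source. It is not how the Source Code Search Engine works: it uses a deliberately simplified regex lexer (an assumption: no preprocessor handling, raw strings or other language-specific literals) that consumes comments and string literals as whole tokens, so an identifier is reported only where it appears as actual code. The function name identifier_hits is hypothetical.

```python
import re

# Simplified lexer for C-like source (assumption: no preprocessor handling,
# raw strings or other language-specific literals). Comments and string/char
# literals are matched as whole tokens, so identifiers inside them are
# consumed and never reported.
TOKEN_RE = re.compile(
    r"""
      (?P<comment>//[^\n]*|/\*.*?\*/)                   # line and block comments
    | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')   # string and char literals
    | (?P<number>\d\w*)                                 # numbers such as 0x1F
    | (?P<ident>[A-Za-z_]\w*)                           # identifiers and keywords
    """,
    re.VERBOSE | re.DOTALL,
)


def identifier_hits(source, name):
    """Return the 1-based line numbers where name occurs as an identifier token."""
    hits = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup == "ident" and m.group() == name:
            hits.append(source.count("\n", 0, m.start()) + 1)
    return hits


SAMPLE = (
    "int foo = 1;\n"
    "// foo in a comment\n"
    'printf("foo");\n'
    "/* multi-line\n"
    "   foo */ bar(foo);\n"
    "food = 2;\n"
)

if __name__ == "__main__":
    print(identifier_hits(SAMPLE, "foo"))  # prints [1, 5]
```

On this sample, a plain whole-word grep for foo would also report lines 2 and 3, where foo appears only inside a comment and a string literal.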

If you had an identifier "foo", the following query would find all mentions:

 I=foo

For C and Java, you can also restrict matches to a particular kind of identifier access: Use, Read, Write or Defines.

  D=bar*

would find only declarations of identifiers whose names start with "bar".

You can write more complex queries using sequences of language tokens:

'int' I=*baz* '['

For C, this would find declarations of any variable whose name contains "baz" and that apparently declare an array.
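A rough Python sketch of how a query like the one above can be evaluated as a sequence of token tests: a quoted item must equal a token exactly, and I=pattern must be an identifier matching a shell-style wildcard (checked with fnmatch). The query syntax here is only modelled on the examples in this answer and is not the Search Engine's actual grammar; the sketch has no notion of declarations (the D= constraint), its regex lexer is simplified (no preprocessor, no multi-character operators), and the function names are hypothetical.

```python
import fnmatch
import re

# Simplified lexer for C-like code (assumption: no preprocessor, raw strings
# or multi-character operators). Whitespace and comments are skipped.
TOKEN_RE = re.compile(
    r"""
      (?P<skip>\s+|//[^\n]*|/\*.*?\*/)                  # whitespace and comments
    | (?P<literal>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')  # string and char literals
    | (?P<number>\d\w*)                                 # numeric literals
    | (?P<ident>[A-Za-z_]\w*)                           # identifiers and keywords
    | (?P<punct>[^\s\w])                                # single-character punctuation
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return (kind, text, line) tuples, dropping whitespace and comments."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "skip":
            line = source.count("\n", 0, m.start()) + 1
            tokens.append((m.lastgroup, m.group(), line))
    return tokens


def parse_query(query):
    """Turn a query such as "'int' I=*baz* '['" into a list of token tests."""
    tests = []
    for item in query.split():
        if item.startswith("I="):
            # identifier whose text matches the wildcard pattern
            tests.append(lambda kind, text, p=item[2:]:
                         kind == "ident" and fnmatch.fnmatchcase(text, p))
        elif len(item) >= 3 and item[0] == item[-1] == "'":
            # any token whose text equals the quoted literal
            tests.append(lambda kind, text, lit=item[1:-1]: text == lit)
        else:
            raise ValueError(f"unsupported query item: {item!r}")
    if not tests:
        raise ValueError("empty query")
    return tests


def find_sequences(source, query):
    """Return (line, matched_text) wherever the token sequence occurs."""
    tokens = tokenize(source)
    tests = parse_query(query)
    hits = []
    for i in range(len(tokens) - len(tests) + 1):
        window = tokens[i:i + len(tests)]
        if all(test(kind, text) for test, (kind, text, _) in zip(tests, window)):
            hits.append((window[0][2], " ".join(text for _, text, _ in window)))
    return hits


SAMPLE = (
    "int bazCount = 0;\n"
    "int my_baz_table[10];\n"
    "/* int old_baz[4]; */\n"
    'char *bazName[] = {"int baz["};\n'
)

if __name__ == "__main__":
    print(find_sequences(SAMPLE, "'int' I=*baz* '['"))  # prints [(2, 'int my_baz_table [')]
```

Because the commented-out declaration and the string literal never become identifier or punctuation tokens, only line 2 matches, whereas a text search for int followed by a name containing baz and a bracket would also hit lines 3 and 4.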

You can see the hits in a GUI and navigate with one click to the source code of any hit.

It is a Windows application. It handles a wide variety of languages: C#, C++, Java, ... and many more.

Upvotes: 0

Adrien Plisson

Reputation: 23303

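Use grep (there are many ports of this command to Windows; search the web). For example, grep -rnwF -f names.txt C:\src prints every line under C:\src that contains one of the names in names.txt as a whole word, prefixed with the file name and line number (names.txt and C:\src are placeholders).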

Alternatively, use Agent Ransack.

Upvotes: 1
