Misiu

Reputation: 4919

Parse text file with regex

I'm trying to parse some JS files (ExtJS) and find all dependencies used by the class defined in each file.

A sample JS file looks like this:

Ext.define('Pandora.controller.Station', {
    extend: 'Ext.app.Controller',

    refs: [{
        ref: 'stationsList',
        selector: 'stationslist'
    }],

    stores: ['Stations', 'RecentSongs'],
    ...

What I want to get is Ext.app.Controller.

With my code I'm able to get all lines that contain extend:

public void ReadAndFilter(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Contains("extend"))
            {
                listBox2.Items.Add(line);
            }
        }
    }
}

But this also returns comments and other unnecessary things. My idea was to use a regex to find just the class-name strings.

My problem is that the line sometimes has spaces before and after extend.
Here are some samples that can be found in the JS files:

extend          : 'Ext.AbstractPlugin',
extend: 'Ext.util.Observable',
@extends Sch.feature.AbstractTimeSpan
extend      : "Sch.feature.AbstractTimeSpan",
extend              : "Sch.plugin.Lines",
extend : "Sch.util.DragTracker",

Running the regex on these should return:

Ext.AbstractPlugin
Ext.util.Observable
Sch.feature.AbstractTimeSpan
Sch.plugin.Lines
Sch.util.DragTracker

Here is my attempt: extend[ ]*:[ ]*['"][a-zA-Z.]*['"]. I've tested it, but I want to get only the part between the single or double quotes. (Can the quotes also be validated, so that we can exclude lines that open with a single quote and close with a double quote?)

Regexes may not be the fastest option, but I have no idea how else I could do this.
Any advice is welcome.

Upvotes: 1

Views: 1967

Answers (3)

Jerry

Reputation: 71598

You can simply use a capture group; you wrap the required part between parentheses:

extend[ ]*:[ ]*['"]([a-zA-Z.]*)['"]

You then access the captured text through .Groups[1].Value
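As a rough sketch of how the capture group could be wired into the line-by-line loop from the question: the class name ExtendFinder and the method FindParents are hypothetical names chosen for illustration, and the pattern is the one from this answer written as a C# verbatim string (where a double quote is escaped as "").

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExtendFinder
{
    // Pattern from the answer; group 1 holds the text between the quotes.
    private static readonly Regex ExtendPattern =
        new Regex(@"extend[ ]*:[ ]*['""]([a-zA-Z.]*)['""]");

    // Returns the captured class name for every line that matches.
    public static List<string> FindParents(IEnumerable<string> lines)
    {
        var parents = new List<string>();
        foreach (string line in lines)
        {
            Match m = ExtendPattern.Match(line);
            if (m.Success)
            {
                parents.Add(m.Groups[1].Value);
            }
        }
        return parents;
    }
}
```

It could be called as ExtendFinder.FindParents(File.ReadLines(path)). A comment line such as "@extends Sch.feature.AbstractTimeSpan" is skipped by this pattern, because it has no colon and no quotes.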


EDIT: As per request:

extend *: *('|")(?<inside>[a-zA-Z.]*)\1

With this one, you can access the captured group with .Groups["inside"].Value
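A minimal sketch of the named-group version in C#, assuming a hypothetical helper class QuoteCheck with a method Extract. In .NET, unnamed groups are numbered before named ones, so \1 here refers to the opening quote captured by ('|"), and the closing quote has to be the same character.

```csharp
using System.Text.RegularExpressions;

public static class QuoteCheck
{
    // Same pattern as above, written as a verbatim string ("" is one double quote).
    private static readonly Regex Pattern =
        new Regex(@"extend *: *('|"")(?<inside>[a-zA-Z.]*)\1");

    // Returns the class name, or null when the line does not match
    // (for example when the opening and closing quotes differ).
    public static string Extract(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups["inside"].Value : null;
    }
}
```

With this sketch, a line such as extend: 'Ext.util.Observable', yields Ext.util.Observable, while a line that opens with a single quote and closes with a double quote yields null.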

Upvotes: 4

Arman H

Reputation: 5628

extend\s*:\s*("|')(.*?)\1

\1 is a reference to whatever is captured by the parentheses in ("|'), so it will force the quotes to match up correctly.

In this case, the matched part (that you want) winds up in Groups[2].Value

Also, simply a stylistic suggestion: don't use [ ]* for matching spaces; a character class holding a single space looks empty and is confusing to read. A simple \s* is easier to read and clearer, and it also matches tabs and other whitespace.
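To illustrate the difference between the two forms, here is a small sketch; the class and method names are made up for the example. [ ]* accepts only literal space characters, while \s* accepts any whitespace, so a tab between extend and the colon is matched only by the second pattern.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Matches only when the gap before the colon consists of space characters.
    public static bool MatchesSpacesOnly(string line)
    {
        return Regex.IsMatch(line, @"extend[ ]*:");
    }

    // Matches when the gap consists of any whitespace (spaces, tabs, ...).
    public static bool MatchesAnyWhitespace(string line)
    {
        return Regex.IsMatch(line, @"extend\s*:");
    }
}
```

For the input "extend\t: 'Ext.AbstractPlugin'," (with a real tab character), MatchesSpacesOnly returns false and MatchesAnyWhitespace returns true; for a line that uses only spaces, both return true.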

Upvotes: 4

crthompson

Reputation: 15875

You are only missing a capture group. Note the parens around [a-zA-Z.]*

extend([ ]*):[ ]*['"]([a-zA-Z.]*)['"]

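To implement this, try the following. The class name ends up in the second group, because ([ ]*) is the first:

// Requires: using System.Linq; using System.Text.RegularExpressions;
// Verbatim string: a double quote inside the pattern is written as "".
var result = from Match match in Regex.Matches(line, @"extend([ ]*):[ ]*['""]([a-zA-Z.]*)['""]")
             select match.Groups[2].Value;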

Upvotes: 2
