user32117


Porting from Python to C#

I'm trying to learn C#, coming from a Python/PHP background, and to get started I'm porting a script from Python.

The script reads a text file line by line (about 150K lines), applies a list of regexes until one matches, gets the named-group results, and adds the values as properties of a class.

Here's what the data looks like (each line starting with 'No.' is the beginning of a new record):

No.813177294  09/01/1987  150
Tit.INCAL INDÚSTRIA DE CALÇADOS LTDA (BR/PE)
*PARÁGRAFO ÚNICO DO ART. 162 DA LPI.
Procurador: ROBERTO C. FREIRE

No.901699870  02/06/2009  LD6
*Exigência Formal não respondida, Pedido de Registro de Marca considerado inexistente, de acordo com o Art. 157 da LPI

No.830009817  12/12/2008  003
Tit.BIOLAB SANUS FARMACÊUTICA LTDA. (BR/SP)
C.N.P.J./C.I.C./NºINPI : 49475833000106
Apres.: Nominativa ; Nat.: De Produto
Marca: ENXUG
NCL(9) 05 medicamentos para uso humano; preparações farmacêuticas; diuréticos, analgésicos;
anestésicos; anti-helmínticos; antibióticos; hormônios para uso medicinal.
Procurador: CRUZEIRO/NEWMARC PATENTES E MARCAS LTDA

And here's what the regexes look like:

regexp = {
    # No.123456789  13/12/2008  560
    # No.123456789   13/12/2008  560
    # No.123456789 13/12/2008 560
    # No.123456789  560
    'number': re.compile(r'No.(?P<Number>[\d]{9}) +((?P<Date>[\d]{2}/[\d]{2}/[\d]{4}) +)?(?P<Code>.*)'),

    # NCL(7) 25 no no no no no ; no no no no no no; *nonono no non o nono
    # NCL(9) 25 no no no no no ; no no no no no no; *nonono no non o nono
    'ncl': re.compile(r'NCL\([\d]{1}\) (?P<Ncl>[\d]{2})( (?P<Especification>.*))?'),

    'doc': re.compile(r'C.N.P.J./C.I.C./NºINPI : (?P<Document>.*)'),
    'description': re.compile(r'\*(?P<Description>.*)'),

    ...
}

Now my questions:

1) Can I use the same concept, applying each regex in a Dictionary<string, Regex> to each line until one matches?

2) If I do, is there a way to get a Dictionary<string, string> of the named-group results? (At this stage I can treat everything as a string.)

3) Suppose I have a class like this...

class Record
{
    public string Number { get; set; }
    public string Date { get; set; }
    public string Code { get; set; }
    public string Ncl { get; set; }
    public string Especification { get; set; }
    public string Document { get; set; }
    public string Description { get; set; }
}

...is there a way to set the properties with the values of the named groups?

4) Am I totally missing the point here, trying to code in a statically typed language while still thinking in a dynamically typed one? If that's the case, what can I do?

Sorry for this somewhat lengthy question. I really tried to condense it :-)

Thanks in advance.

Upvotes: 1

Views: 1470

Answers (6)

Gishu

Reputation: 136603

  1. Yes you can.
  2. .NET supports named groups. For a pattern such as (?<first>group)(?'second'group), the returned Match object supports retrieval by name, as shown here. You can build yourself a dictionary from this object or pass the Match object along directly.
    var match = Regex.Match("subject", "regex");
    var matchedText = match.Groups["first"].Value;

    See Named Groups in .Net and Regex support in .Net
  3. I think writing a static method such as Record Record.Parse(namedValueCollection) would be one way to do it.
  4. You write code, you learn. I find the reverse direction a bit disorienting. Moving from dynamic to static should be relatively easier; you may just have to write more code for some routine tasks such as iteration, map, or select.
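
Points 1 and 2 can be sketched together roughly as follows. This is only a minimal sketch under stated assumptions: LineMatcher, Patterns and MatchLine are hypothetical names, the patterns are the question's with Python's (?P<name>...) rewritten as .NET's (?<name>...), and the leading ^ is an assumption that the Python script anchored each regex at the start of the line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; LineMatcher, Patterns and MatchLine are illustrative names.
public static class LineMatcher
{
    // ^ is an assumption: it anchors each pattern at the start of the line.
    public static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "number", new Regex(@"^No.(?<Number>[\d]{9}) +((?<Date>[\d]{2}/[\d]{2}/[\d]{4}) +)?(?<Code>.*)") },
        { "ncl", new Regex(@"^NCL\([\d]{1}\) (?<Ncl>[\d]{2})( (?<Especification>.*))?") },
        { "doc", new Regex(@"^C.N.P.J./C.I.C./NºINPI : (?<Document>.*)") },
        { "description", new Regex(@"^\*(?<Description>.*)") },
    };

    // Returns the named groups of the first pattern that matches,
    // or null when no pattern matches the line.
    public static Dictionary<string, string> MatchLine(string line)
    {
        foreach (Regex pattern in Patterns.Values)
        {
            Match match = pattern.Match(line);
            if (!match.Success) continue;

            var result = new Dictionary<string, string>();
            foreach (string name in pattern.GetGroupNames())
            {
                // GetGroupNames also returns numbered groups ("0", "1", ...); skip them.
                int ignored;
                if (int.TryParse(name, out ignored)) continue;

                // Optional groups that did not take part in the match are left out.
                Group group = match.Groups[name];
                if (group.Success) result[name] = group.Value;
            }
            return result;
        }
        return null;
    }
}
```

Calling LineMatcher.MatchLine on a 'No.' line from the question should yield the Number, Date and Code entries. Dictionary<TKey, TValue> does not promise an enumeration order, so if the patterns were not mutually exclusive, a List<Regex> would be the safer container.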

Upvotes: 2

Crash893

Reputation: 11702

Dictionary<string, string> dic_test = new Dictionary<string, string>();

dic_test.Add(key, value);

Upvotes: 0

Joel Coehoorn

Reputation: 415600

What you're looking for sounds do-able. Of course you'll want to look at System.Text.RegularExpressions, specifically the Regex type there.

Additionally, I'm really fond of the iterator pattern for reading lines from a file:

public static IEnumerable<string> ReadLines(string path)
{
    using(var sr = new StreamReader(path))
    {
       string line;
       while ( (line = sr.ReadLine()) != null)
       {
           yield return line;
       }
    }
}

You start with that base code (which you can re-use almost everywhere) and call it in this method:

public static IEnumerable<Record> ReadRecords(string path)
{
    // .NET writes named groups as (?<name>...), not Python's (?P<name>...).
    IEnumerable<Regex> expressions = new List<Regex>
    {
        new Regex( @"No.(?<Number>[\d]{9}) +((?<Date>[\d]{2}/[\d]{2}/[\d]{4}) +)?(?<Code>.*)" ),
        new Regex( @"NCL\([\d]{1}\) (?<Ncl>[\d]{2})( (?<Especification>.*))?" ),
        new Regex( @"C.N.P.J./C.I.C./NºINPI : (?<Document>.*)" )
    };

    // FirstOrDefault returns null when no expression matches a line;
    // First would throw instead, so unmatched lines are filtered out below.
    foreach ( MatchCollection matches
        in ReadLines(path)
          .Select(s => expressions.FirstOrDefault(e => e.IsMatch(s))?.Matches(s))
          .Where(m => m != null && m.Count > 0)
    )
    {
        yield return Record.FromExpressionMatches(matches);
    }
}

Finish it up by adding a static factory method to your Record class that accepts a MatchCollection parameter. The one thing this code doesn't handle is that a single record spans several lines, so you expect several of the expressions to match before a record is complete. That will work a little differently. But hopefully this gives you enough to get you really going.
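
One way the multi-line case might look is sketched below. It is a rough sketch under assumptions, not the answer's own method: RecordReader, Apply and this ReadRecords overload are hypothetical names, a line starting with "No." is taken to open a new record (as the question states), and reflection is used to copy each named group onto the Record property of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.RegularExpressions;

public class Record
{
    public string Number { get; set; }
    public string Date { get; set; }
    public string Code { get; set; }
    public string Ncl { get; set; }
    public string Especification { get; set; }
    public string Document { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper; RecordReader and its members are illustrative names.
public static class RecordReader
{
    static readonly Regex[] Expressions =
    {
        new Regex(@"^No.(?<Number>[\d]{9}) +((?<Date>[\d]{2}/[\d]{2}/[\d]{4}) +)?(?<Code>.*)"),
        new Regex(@"^NCL\([\d]{1}\) (?<Ncl>[\d]{2})( (?<Especification>.*))?"),
        new Regex(@"^C.N.P.J./C.I.C./NºINPI : (?<Document>.*)"),
        new Regex(@"^\*(?<Description>.*)"),
    };

    // Copies each successful named group onto the Record property with the same name.
    // Numbered groups ("0", "1", ...) have no matching property and are skipped.
    static void Apply(Record record, Regex regex, Match match)
    {
        foreach (string name in regex.GetGroupNames())
        {
            PropertyInfo property = typeof(Record).GetProperty(name);
            Group group = match.Groups[name];
            if (property != null && group.Success)
                property.SetValue(record, group.Value, null);
        }
    }

    public static IEnumerable<Record> ReadRecords(IEnumerable<string> lines)
    {
        Record current = null;
        foreach (string line in lines)
        {
            // Assumption from the question: "No." marks the beginning of a new record.
            if (line.StartsWith("No.", StringComparison.Ordinal))
            {
                if (current != null) yield return current;
                current = new Record();
            }
            if (current == null) continue; // ignore anything before the first record

            foreach (Regex regex in Expressions)
            {
                Match match = regex.Match(line);
                if (!match.Success) continue;
                Apply(current, regex, match);
                break; // the first matching expression wins
            }
        }
        if (current != null) yield return current;
    }
}
```

This could be fed by the iterator above, for example RecordReader.ReadRecords(ReadLines(path)). Lines that match none of the expressions, such as the wrapped second line of an NCL entry, are simply ignored in this sketch.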

Upvotes: 1

Eran Betzalel

Reputation: 4193

If you really want to learn C#, you should ask only for references rather than full answers, like this one (RegEx class), but I'm sure you can find much more information with a quick Google search too.

Upvotes: 1

Alex Martelli

Reputation: 881487

1., sure

2., see e.g. here

3., yep, same basic concept as 2

4., nah, C# is flexible enough to allow you to port your architecture over

Also consider studying this book as the best intro to .NET for Python programmers AND vice versa (I'm biased, having been a tech editor and being a friend of the author, but I think this is objectively defensible;-).

Upvotes: 3

Preet Sangha

Reputation: 65466

Sorry this is not a specific answer, but could you use IronPython to convert your scripts to run under the CLR and then step to C#?

Upvotes: 1
