KenLit

Reputation: 33

Setting a variable to a matched regex in Python

I have 80,000 lines of features describing the behavior of English prepositions, and I'm trying to characterize, for example, the parts of speech associated with the preposition 'across'.

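    import re
    sep = '\x18'  # separator between features
    samp = "across.p.cpa.312(2)c:l:whichc:pos:wdtc:ri:rulefired"  # sample of the feature format
    # line is one full feature line (15942 characters)
    print(re.search(sep + 'hr:pos:([a-z]+)' + sep, line))
    # <re.Match object; span=(6840, 6852), match='\x18hr:pos:nns\x18'>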

Note that '\x18' is the separator between features in the line. There are 1333 such features in a line of length 15942. But how do I get the match into a variable so that I can do further analysis on it? This is easy to do in Perl, but Python seems to make it very difficult.
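For reference, the line format described above can be illustrated with a small made-up line (not real data from the file). Because features are joined by '\x18', splitting on the separator gives one string per feature, which is a regex-free alternative for pulling out the `hr:pos:` value. The variable names and the sample line are hypothetical.

```python
sep = '\x18'  # feature separator, as described in the question

# Hypothetical short line standing in for one 15942-character line
line = sep.join(['across.p.cpa.312(2)', 'c:l:which', 'c:pos:wdt',
                 'hr:pos:nns', 'c:ri:rulefired'])

features = line.split(sep)  # one string per feature
# Keep the text after 'hr:pos:' for every feature with that prefix
pos_values = [f[len('hr:pos:'):] for f in features if f.startswith('hr:pos:')]
print(len(features), pos_values)
```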

Upvotes: 0

Views: 54

Answers (2)

RootTwo

Reputation: 4418

re.search() returns a match object, or None if there is no match. Use its group() method to get the portion of the string that matched: group(0) returns the entire match, and group(1) returns the first parenthesized group in the regex. In Python 3.6 and later you can also use indexing.

    m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)

These return the whole match:

    m.group(0)
    m[0]

These return the first group in the match ('nns' in the example):

    m.group(1)
    m[1]
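A self-contained sketch of the above, using a short made-up line in place of the real 15942-character one. It also checks for None before calling group(), since re.search() returns None when the pattern is not found. Indexing a match (m[0], m[1]) needs Python 3.6 or later.

```python
import re

sep = '\x18'
# Hypothetical stand-in for one feature line
line = sep.join(['c:l:which', 'hr:pos:nns', 'c:ri:rulefired'])

m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)
if m:                      # re.search returns None when nothing matches
    whole = m.group(0)     # '\x18hr:pos:nns\x18', same as m[0]
    pos = m.group(1)       # 'nns', same as m[1]
    print(repr(whole), pos)
else:
    pos = None
```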

Upvotes: 1

KenLit

Reputation: 33

Okay, I started again. Set m as below, then set pos to the first group.

  m = re.search(sep + 'hr:pos:([a-z]+)' + sep, line)
  pos = m.group(1)  # first group; m.group(0) is the whole match '\x18hr:pos:nns\x18'
  # pos == 'nns'

Boy, they don't make it easy to find out how to do this stuff.
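Since the question compares this to Perl, here is a sketch of the closest Python idiom, assuming Python 3.8+ for the `:=` assignment expression. It compiles the pattern once and tallies the captured part-of-speech value across many lines. The function name `count_hr_pos` and the file name in the usage comment are hypothetical.

```python
import re
from collections import Counter

SEP = '\x18'
# Compile once; the same pattern is reused for every line
HR_POS = re.compile(SEP + r'hr:pos:([a-z]+)' + SEP)

def count_hr_pos(lines):
    """Count the hr:pos value in each line; lines without one are skipped."""
    counts = Counter()
    for line in lines:
        # Roughly like Perl's: if ($line =~ /.../) { ... $1 ... }
        if (m := HR_POS.search(line)):
            counts[m.group(1)] += 1
    return counts

# Usage with a file (the name 'across.txt' is hypothetical):
# with open('across.txt', encoding='utf-8') as f:
#     print(count_hr_pos(f).most_common())
```

Iterating over the open file object reads one line at a time, so all 80,000 lines never need to be held in memory at once.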

Upvotes: 0
