babysnow
babysnow

Reputation: 407

python regular expression to match strings

I want to parse a string, such as:

package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'

I want to get:

string1: jp.tjkapp.droidllwp`

string2: 1.1

Because there are multiple uses-permission, I want to get permission as a list, contains: WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.

Could you help me write the python regular expression to get the strings I want? Thanks.

Upvotes: 1

Views: 567

Answers (3)

Rushik
Rushik

Reputation: 1131

Here is one example code

#!/usr/bin/env python
inputFile = open("test.txt", "r").readlines()
for line in inputFile:
    if line.startswith("package"):
        words = line.split()
        string1 = words[1].split("=")[1].replace("'","")
        string2 = words[3].split("=")[1].replace("'","")

test.txt file contains input data you mentioned earlier..

Upvotes: 0

jdotjdot
jdotjdot

Reputation: 17042

Assuming the code block you provided is one long string, here stored in a variable called input_string:

name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)

Explanation:

name

  • (?<=name\=\'): check ahead of the main string in order to return only strings that are preceded by name='. The \ in front of = and ' serve to escape them so that the regex knows we're talking about the = string and not a regex command. name=' is not also returned when we get the result, we just know that the results we get are all preceded by it.
  • [\w\.]+?: This is the main string we're searching for. \w means any alphanumeric character and underscore. \. is an escaped period, so the regex knows we mean . and not the regex command represented by an unescaped period. Putting these in [] means we're okay with anything we've stuck in brackets, so we're saying that we'll accept any alphanumeric character, _, or .. + afterwords means at least one of the previous thing, meaning at least one (but possibly more) of [\w\.]. Finally, the ? means don't be greedy--we're telling the regex to get the smallest possible group that meets these specifications, since + could go on for an unlimited number of repeats of anything matched by [\w\.].
  • (?=\'): check behind the main string in order to return only strings that are followed by '. The \ is also an escape, since otherwise regex or Python's string execution might misinterpret '. This final ' is not returned with our results, we just know that in the original string, it followed any result we do end up getting.

Upvotes: 1

RanRag
RanRag

Reputation: 49537

You can do this without regex by reading the file content line by line.

>>> def split_string(s):
...     if s.startswith('package'):
...             return [i.split('=')[1] for i in s.split() if "=" in i]
...     elif s.startswith('uses-permission'):
...             return s.split('.')[-1]
... 
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
["'jp.tjkapp.droid1lwp'", "'2'", "'1.1'"]
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
"WRITE_APN_SETTINGS'"
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
"RECEIVE_BOOT_COMPLETED'"
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
"ACCESS_NETWORK_STATE'"
>>> 

Upvotes: 0

Related Questions