Reputation: 407
I want to parse a string, such as:
package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'
I want to get:
string1: jp.tjkapp.droid1lwp
string2: 1.1
Because there are multiple uses-permission lines, I want to get the permissions as a list containing WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.
Could you help me write the Python regular expression to get the strings I want? Thanks.
Upvotes: 1
Views: 567
Reputation: 1131
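Here is an example:
#!/usr/bin/env python
# Read all lines of the input file and close it afterwards
with open("test.txt", "r") as f:
    inputFile = f.readlines()
for line in inputFile:
    if line.startswith("package"):
        # words: ["package:", "name='...'", "versionCode='...'", "versionName='...'"]
        words = line.split()
        string1 = words[1].split("=")[1].replace("'", "")
        string2 = words[3].split("=")[1].replace("'", "")
        print(string1, string2)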
The test.txt file contains the input data from the question.
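The loop above only extracts the package name and version name. A minimal sketch that extends the same split-based approach to collect the permissions too; it reads from an in-memory string through io.StringIO instead of test.txt so it runs on its own, and the function name parse_lines is hypothetical.

```python
import io

# Sample input from the question (stands in for the contents of test.txt)
SAMPLE = """package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'
"""


def parse_lines(lines):
    """Return (name, version_name, permissions) using only str methods."""
    name = version_name = None
    permissions = []
    for line in lines:
        line = line.strip()  # drop the trailing newline
        if line.startswith("package"):
            # words: ["package:", "name='...'", "versionCode='...'", "versionName='...'"]
            words = line.split()
            name = words[1].split("=")[1].strip("'")
            version_name = words[3].split("=")[1].strip("'")
        elif line.startswith("uses-permission"):
            # keep the text after the last dot, minus the closing quote
            permissions.append(line.split(".")[-1].strip("'"))
    return name, version_name, permissions


# io.StringIO can be iterated line by line, just like an open file
name, version_name, permissions = parse_lines(io.StringIO(SAMPLE))
print(name)          # jp.tjkapp.droid1lwp
print(version_name)  # 1.1
print(permissions)   # ['WRITE_APN_SETTINGS', 'RECEIVE_BOOT_COMPLETED', 'ACCESS_NETWORK_STATE']
```

With a real file, passing the object returned by open("test.txt") to parse_lines should work the same way, since both are iterables of lines.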
Upvotes: 0
Reputation: 17042
Assuming the text you provided is one long string, stored here in a variable called input_string:
import re

name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)
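A self-contained run of the same three patterns, with input_string filled in from the sample in the question, so the results can be checked directly.

```python
import re

# The sample text from the question, held in one string
input_string = """package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'"""

# re.search returns the first match; .group(0) is the whole matched text
name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
# re.findall with no capture groups returns every full match as a list
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)

print(name)         # jp.tjkapp.droid1lwp
print(versionName)  # 1.1
print(permissions)  # ['WRITE_APN_SETTINGS', 'RECEIVE_BOOT_COMPLETED', 'ACCESS_NETWORK_STATE']
```

Note that name=' does not also match inside versionName=' because the match is case-sensitive (Name vs name). The versionName pattern as written only accepts two-part versions such as 1.1; for something like 1.1.2, re.search returns None and .group(0) raises AttributeError, so a looser class such as [^']+ may be worth considering.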
Explanation:
(?<=name\=\'): a lookbehind assertion, so only strings preceded by name=' are returned. name=' itself is not included in the result; we just know that every result we get was preceded by it. The backslashes in front of = and ' escape those characters. Neither has a special meaning at this point in the pattern, so the escapes are harmless but optional.
[\w\.]+?: the main string we're searching for. \w matches any word character (a letter, digit or underscore), and \. is an escaped period, which matches a literal . (inside [] a bare . is already literal, so the escape there is only for readability). Putting these in [] creates a character class, so we accept any letter, digit, _ or . at each position. The + afterwards means one or more of the previous item, i.e. at least one (but possibly more) characters from [\w\.]. Finally, the ? makes the + non-greedy: the regex takes the shortest run of characters that still lets the rest of the pattern match, instead of the longest. Here greedy and non-greedy give the same result, because [\w\.] can never consume the closing '.
(?=\'): a lookahead assertion, so only strings followed by ' are returned. The backslash is again an escape; inside the double-quoted raw string above it is optional, but in a single-quoted string such as the r'...' used for permissions it stops the ' from ending the Python string. This final ' is not returned with the result either; we just know that in the original string it follows every result we get.
Upvotes: 1
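The points in the regex explanation above can be checked directly. A minimal sketch, assuming a single package line as input: it shows that greedy and non-greedy give the same result for this pattern, that the optional backslashes can be dropped, and, as an alternative not used in the answer, that a capture group can replace the lookarounds.

```python
import re

input_string = "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'"

# Lazy (+?) and greedy (+) agree here, because the character class
# [\w.] can never consume the closing quote. Inside [], . is literal.
lazy = re.search(r"(?<=name=')[\w.]+?(?=')", input_string).group(0)
greedy = re.search(r"(?<=name=')[\w.]+(?=')", input_string).group(0)

# Alternative without lookarounds: match the delimiters too,
# but capture only the part between them with a group.
captured = re.search(r"name='([\w.]+)'", input_string).group(1)

print(lazy)      # jp.tjkapp.droid1lwp
print(greedy)    # jp.tjkapp.droid1lwp
print(captured)  # jp.tjkapp.droid1lwp
```

The capture-group form consumes name=' and the closing quote as part of the match, so it uses .group(1) instead of .group(0); with re.findall, a pattern containing one group returns the group contents directly.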
Reputation: 49537
You can do this without regex by reading the file content line by line.
>>> def split_string(s):
...     if s.startswith('package'):
...         return [i.split('=')[1].strip("'") for i in s.split() if "=" in i]
...     elif s.startswith('uses-permission'):
...         return s.split('.')[-1].strip("'")
...
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
['jp.tjkapp.droid1lwp', '2', '1.1']
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
'WRITE_APN_SETTINGS'
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
'RECEIVE_BOOT_COMPLETED'
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
'ACCESS_NETWORK_STATE'
>>>
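split_string returns a list for the package line and a single string for each uses-permission line, so the caller still has to sort the results. A minimal sketch of such a loop, using a copy of split_string with .strip("'") added so no quote characters are left over; it assumes the package line always carries exactly the three fields name, versionCode and versionName.

```python
def split_string(s):
    """split_string from the answer, with surrounding quotes stripped."""
    if s.startswith('package'):
        return [i.split('=')[1].strip("'") for i in s.split() if "=" in i]
    elif s.startswith('uses-permission'):
        return s.split('.')[-1].strip("'")


lines = [
    "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'",
    "uses-permission:'android.permission.WRITE_APN_SETTINGS'",
    "uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'",
    "uses-permission:'android.permission.ACCESS_NETWORK_STATE'",
]

name = version_name = None
permissions = []
for line in lines:
    # strip() first: a trailing newline from a file would block strip("'")
    result = split_string(line.strip())
    if isinstance(result, list):
        # package line; assumes exactly [name, versionCode, versionName]
        name, _version_code, version_name = result
    elif result is not None:
        # uses-permission line: one permission name
        permissions.append(result)

print(name)          # jp.tjkapp.droid1lwp
print(version_name)  # 1.1
print(permissions)   # ['WRITE_APN_SETTINGS', 'RECEIVE_BOOT_COMPLETED', 'ACCESS_NETWORK_STATE']
```

Lines that start with neither prefix make split_string return None and are skipped by the loop.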
Upvotes: 0