Liqun

Reputation: 4171

python regular expression substitution with matched group

I'm trying to substitute the channel name in AndroidManifest.xml so that I can batch-generate a group of channel APK packages for release.

The element I need to change in the XML file is:

<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>

The channel configs are saved in a config file, something like:

channel_name    output_postfix  valid 
"androidmarket" "androidmarket" true

Here is what I tried:

import re

manifest_original_xml_fh = open("../AndroidManifest_original.xml", "r")
manifest_xml_fh = open("../AndroidManifest.xml", "w")
# channel_name holds a value read from the config file, e.g. "androidmarket"
pattern = re.compile(r'<meta-data\sandroid:value="(.*)"\sandroid:name="UMENG_CHANNEL".*')
for each_config_line in manifest_original_xml_fh:
    each_config_line = re.sub(pattern, channel_name, each_config_line)
    print(each_config_line)

It replaces the whole <meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/> element with androidmarket, which is obviously not what I need. I then figured out that pattern.match(each_config_line) returns a match result, and one of its groups is "CHANNEL_NAME_TO_BE_DETERMINED". I've also tried passing a replacement function, but that failed too.

So, since I've successfully found the pattern, how can I replace only the matched group correctly?

Upvotes: 0

Views: 819

Answers (3)

stema

Reputation: 92976

I think your misunderstanding is this: everything the pattern matches gets replaced. If you want to keep part of the matched text, you have to capture it and reinsert it in the replacement string.

Alternatively, match only what you want to replace by using lookaround assertions.

Try this:

pattern = re.compile(r'(?<=<meta-data\sandroid:value=")[^"]+')
for each_config_line in manifest_original_xml_fh:
    each_config_line = re.sub(pattern, channel_name, each_config_line)

(?<=<meta-data\sandroid:value=\") is a positive lookbehind assertion: it ensures that this text comes immediately before the match but is not part of it (so it will not be replaced).

[^"]+ then matches one or more characters that are not a ".
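As a quick check of the lookbehind approach, here is a minimal sketch run against the single line from the question. The `line` and `channel_name` values are assumptions for illustration; in the real script they would come from the manifest and the config file.

```python
import re

# Python's re module requires a fixed-width lookbehind; this one qualifies
# because \s matches exactly one character.
pattern = re.compile(r'(?<=<meta-data\sandroid:value=")[^"]+')

# Sample inputs (assumed for illustration).
line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
channel_name = "androidmarket"

# Only the text after the lookbehind is matched, so only the value is replaced.
new_line = pattern.sub(channel_name, line)
print(new_line)
```

Note that this pattern does not look at android:name, so it would also rewrite the value of any other meta-data element whose first attribute is android:value.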

See it here on Regexr

Upvotes: 0

Don

Reputation: 17606

I suggest a different approach: save your xml as a template, with placeholders to be replaced with standard Python string operations.

E.g.

AndroidManifest_template.xml:

<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>

python:

manifest_original_xml_fh = open("../AndroidManifest_template.xml", "r")
manifest_xml_fh = open("../AndroidManifest.xml", "w")
for each_config_line in manifest_original_xml_fh:
    each_config_line = each_config_line % {'channel_name': channel_name}
    print each_config_line
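A minimal in-memory sketch of the same template idea, so it can be tried without the files. The `render_manifest` helper and the sample template string are hypothetical names introduced here, not part of the original script.

```python
# Template text as it might appear in AndroidManifest_template.xml (assumed sample).
template_text = '<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>\n'


def render_manifest(template, channel_name):
    """Fill the %(channel_name)s placeholder using old-style string formatting."""
    return template % {"channel_name": channel_name}


rendered = render_manifest(template_text, "androidmarket")
print(rendered)
```

One caveat with % formatting: any literal % elsewhere in the template has to be written as %%, otherwise the formatting step raises an error or misreads it as a placeholder.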

Upvotes: 1

Joanna Derks

Reputation: 4063

To capture just the value of the meta-data tag you need to change the regex:

<meta-data\sandroid:value=\"([^"]*)\"\sandroid:name=\"UMENG_CHANNEL\".*

Specifically I changed this part:

\"(.*)\" - this is a greedy match, so it matches as many characters as possible while still letting the rest of the expression match

to

\"([^"]*)\" - which matches any run of characters other than a double quote. The matching result will still be in the first capturing group.

If you want to do the replacement, a better idea might be to capture what you want to keep. I'm not a Python expert, but something like this would probably work:
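To make the greedy behaviour visible, the sketch below uses deliberately shortened patterns (an assumption for illustration, not the full pattern above). With the full pattern, the trailing android:name part forces both forms to capture the same text on this particular line; without that anchor the difference shows up.

```python
import re

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'

# Greedy: .* runs to the last double quote on the line.
greedy = re.search(r'android:value="(.*)"', line).group(1)

# Negated class: [^"]* stops at the first closing double quote.
negated = re.search(r'android:value="([^"]*)"', line).group(1)

print(greedy)
print(negated)
```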

re.sub(r'(<meta-data\sandroid:value=\")[^"]*(\"\sandroid:name=\"UMENG_CHANNEL\".*)',
       r'\1YourNewValue\2', s)

\1 is backreference 1, i.e. it inserts what the first capturing group matched, and \2 does the same for the second group.
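One thing to watch with \1 in a replacement string: if the new value starts with a digit, the digits run into the group number. A sketch of two ways around that, using a hypothetical channel name "360market" and a hypothetical helper `set_channel`; the trailing .* is dropped from the second group here, which leaves the rest of the line untouched anyway.

```python
import re

pattern = re.compile(r'(<meta-data\sandroid:value=")[^"]*("\sandroid:name="UMENG_CHANNEL")')

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
channel_name = "360market"  # hypothetical value that starts with a digit


def set_channel(text, new_value):
    """Rebuild the match from its two kept groups around the new value.

    A function replacement inserts new_value literally, so digits and
    backslashes in it are never interpreted as group references or escapes.
    """
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(2), text)


via_function = set_channel(line, channel_name)

# The \g<1> form marks where the group number ends, so "360..." stays plain text.
via_template = pattern.sub(r'\g<1>' + channel_name + r'\g<2>', line)

print(via_function)
print(via_template)
```

The function form is the safer choice when the new value comes from a config file, since its content is not under the script's control.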

Upvotes: 0
