Nakkhatra
Nakkhatra

Reputation: 65

replace string in XML file python

I want to replace all the strings (Except the image filename, only change those in the name tags) 'bicycle' in the xml file with 'bike'. I wanted to do with re.sub by using .readlines(), but that's not working. Can anyone advise how can I do that in the most efficient way (A good explanation will be of much help)?

<annotation>
    <folder>images</folder>
    <filename>bicycle (10).jpg</filename>
    <path>C:\Users\Merida\Desktop\Bicycle\images\bicycle (10).jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>960</width>
        <height>636</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>bicycle</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>68</xmin>
            <ymin>24</ymin>
            <xmax>755</xmax>
            <ymax>632</ymax>
        </bndbox>
    </object>
    <object>
        <name>bicycle</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>1</xmin>
            <ymin>28</ymin>
            <xmax>189</xmax>
            <ymax>435</ymax>
        </bndbox>
    </object>
</annotation>

Upvotes: 2

Views: 2507

Answers (3)

Yitzhak Khabinsky
Yitzhak Khabinsky

Reputation: 22177

Please try the following XSLT based solution.

The XSLT is following a so called Identity Transform pattern.

It will modify <name> element values from 'bicycle' to 'bike', leaving everything else intact.

Input XML

<?xml version="1.0"?>
<annotation>
    <folder>images</folder>
    <filename>bicycle (10).jpg</filename>
    <path>C:\Users\Merida\Desktop\Bicycle\images\bicycle (10).jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>960</width>
        <height>636</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>bicycle</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>68</xmin>
            <ymin>24</ymin>
            <xmax>755</xmax>
            <ymax>632</ymax>
        </bndbox>
    </object>
    <object>
        <name>bicycle</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>1</xmin>
            <ymin>28</ymin>
            <xmax>189</xmax>
            <ymax>435</ymax>
        </bndbox>
    </object>
</annotation>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="name[.='bicycle']">
        <xsl:copy>bike</xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output XML

<annotation>
  <folder>images</folder>
  <filename>bicycle (10).jpg</filename>
  <path>C:\Users\Merida\Desktop\Bicycle\images\bicycle (10).jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>960</width>
    <height>636</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>bike</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>68</xmin>
      <ymin>24</ymin>
      <xmax>755</xmax>
      <ymax>632</ymax>
    </bndbox>
  </object>
  <object>
    <name>bike</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>1</xmin>
      <ymin>28</ymin>
      <xmax>189</xmax>
      <ymax>435</ymax>
    </bndbox>
  </object>
</annotation>

Upvotes: 1

Tranbi
Tranbi

Reputation: 12701

If you want to replace ALL occurrences of "bicycle" it can be easily done with 'replace':

input_file = "example.xml"
output_file = "output.xml"
with open(input_file) as f:
    xml_content = f.readlines()
    
with open(output_file, 'w+') as f:
    for line in xml_content:
        f.write(line.replace('bicycle', 'bike'))

However, if you want to keep the structure of your xml intact (in case an element or attribute name would be bicycle) you might wanna take a look at elementTree or lxml.

Edit: after the edit of your question here a cleaner solution with elementTree:

import xml.etree.ElementTree as ET
input_file = "example.xml"
output_file = "output.xml"

tree = ET.parse(input_file)
root = tree.getroot()
name_elts = root.findall(".//name")    # we find all 'name' elements

for elt in name_elts:
    elt.text = elt.text.replace("bicycle", "bike")

tree.write(output_file)

Upvotes: 2

Luke Darcy
Luke Darcy

Reputation: 43

This was my approach, this will replace all instances of "bicycle" with "bike". This will also change "bicycle" in the path that you specified, which I think is what you were looking for. Also "text.xml" would need to be replaced with the name of the file you used

# Open file containing xml text and copy contents to string
f = open("test.xml", "r+")
xmlText = f.read()

# Bring pointer back to start of file and delete all contents
f.seek(0)
f.truncate()

# Replace all instances of bicycle with bike
newText = xmlText.replace("bicycle", "bike")

# Write this new text with replaced words to the file and close
f.write(newText)
f.close()

Upvotes: 2

Related Questions