Robert

Reputation: 135

Python parsing complex text

I'm struggling to develop an algorithm that can edit the snippet of an XML file shown below. Can anyone help with ideas? The requirements are to parse the file as input, remove the cipher that uses "RC4" from the "ciphers" list, and output a new XML file with only the RC4 cipher removed.

The problem is that there are multiple "Connector" sections within the XML file. I need to read all of them, but only edit the one that uses port 443 and a specific IP address. So the script needs to parse each Connector section one at a time and skip the ones that don't have the correct IP address and port.

What I have tried:

1. Using the ElementTree XML parser. The problem is that it doesn't write the new XML file well: the output is a mess. I need it prettified, using Python 2.6.

<Connector
protocol="org.apache.coyote.http11.Http11NioProtocol"
port="443"
redirectPort="443"
executor="tomcatThreadPool"
disableUploadTimeout="true"
SSLEnabled="true"
scheme="https"
secure="true"
clientAuth="false"
sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"
keystoreType="JKS"
keystoreFile="tomcat.keystore"
keystorePass="XXXXX"
server="XXXX"
ciphers="TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
         TLS_DH_RSA_WITH_AES_128_CBC_SHA,
         TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
         TLS_DH_DSS_WITH_AES_128_CBC_SHA,
         TLS_RSA_WITH_AES_128_CBC_SHA,
         TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
         TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA,
         TLS_RSA_WITH_3DES_EDE_CBC_SHA,
         TLS_RSA_WITH_RC4_128_SHA"
address="192.168.10.6">

Here was my code:

from xml.etree import ElementTree

print "[+] Checking for removal of RC4 ciphers"
filename = "template.xml"

# The with block closes the file on exit, so no explicit close() is needed
with open(filename, 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.getiterator('Connector'):
    # Default to '' so a Connector without these attributes is simply skipped
    address = node.attrib.get('address', '')
    port = node.attrib.get('port')
    if "EMSNodeMgmtIp" in address and port == "443":
        ciphers = node.attrib.get('ciphers', '')
        if "RC4" in ciphers:
            # If true, RC4 is enabled somewhere in the cipher suite
            print "[+] Found RC4 enabled ciphers"

            # Split on commas, strip whitespace, and drop every RC4 entry
            print "[+] Removing RC4 cipher"
            kept = [c.strip() for c in ciphers.split(',')
                    if c.strip() and "RC4" not in c]

            # Only set the attribute when something was actually removed
            node.set('ciphers', ', '.join(kept))
tree.write('new.xml')

Upvotes: 0

Views: 150

Answers (3)

Robert

Reputation: 135

Answer below. I had to read each of the Connector sections (there were 4), one at a time, into a temporary list and check whether the port and address are correct. If they are, the RC4 cipher string is removed from the ciphers, but only if an RC4 cipher is actually enabled.

f = open('template.xml', 'r')
lines = f.readlines()
f.close()

new_file = open('new.xml', 'w')

tmp_list = []
connector = False
for line in lines:
    if '<Connector' in line:
        connector = True
        new_file.write(line)
    elif '</Connector>' in line:
        connector = False
        port = False
        address = False
        for a in tmp_list:
            if 'port="443"' in a:
                port = True
            elif 'address="%(EMSNodeMgmtIp)s"' in a:
                address = True
        if port and address:
            new_list = []
            count = 0
            for b in tmp_list:
                if "RC4" in b:
                    print "[+] Found RC4 cipher suite string at line index %d:  %s" % (count,b) 
                    print "[+] Removing RC4 cipher string from available cipher suites"
                    # If the RC4 line closes the attribute (ends with "),
                    # move the closing quote up to the previous line,
                    # replacing that line's trailing comma
                    if b.rstrip().endswith('"'):
                        tmp_str = tmp_list[count-1]
                        new_list[count-1] = tmp_str.rstrip().rstrip(',') + '"\n'
                    # Append an empty placeholder so new_list stays
                    # index-aligned with tmp_list
                    new_list.append("")
                else:
                    new_list.append(b)
                count+=1
            for c in new_list:
                new_file.write(c) 
            new_file.write('    </Connector>\n')
        else:
            # Not port and address
            for d in tmp_list:
                new_file.write(d)
            new_file.write('    </Connector>\n')
        tmp_list = []
    elif connector:
        tmp_list.append(line)
    else:
        new_file.write(line)
new_file.close()
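
For comparison, here is a rough sketch of the same connector-by-connector check done with regular expressions over the whole file text rather than a line loop. It is an alternative technique, not the code above. The names remove_rc4, has_attr, CONNECTOR_TAG and CIPHERS_ATTR are made up for illustration, and it assumes double-quoted attributes, no ">" inside attribute values, and the nine-space continuation indent used in the question's snippet.

```python
import re

# Assumption: a Connector start tag never contains '>' inside an attribute value
CONNECTOR_TAG = re.compile(r'<Connector\b[^>]*>')
CIPHERS_ATTR = re.compile(r'(\bciphers=")([^"]*)(")')


def has_attr(tag, name, value):
    # Require whitespace before the name so "redirectPort" cannot match "port"
    return re.search(r'\s%s="%s"' % (name, re.escape(value)), tag) is not None


def remove_rc4(xml_text, port, address):
    """Drop RC4 entries from the ciphers attribute of matching Connector tags only."""
    def fix_ciphers(m):
        names = [c.strip() for c in m.group(2).split(',') if c.strip()]
        kept = [c for c in names if 'RC4' not in c]
        # Rejoin one cipher per line, indented to line up under the first one
        return m.group(1) + ',\n         '.join(kept) + m.group(3)

    def fix_connector(m):
        tag = m.group(0)
        if not (has_attr(tag, 'port', port) and has_attr(tag, 'address', address)):
            return tag  # wrong port or address: leave this connector untouched
        return CIPHERS_ATTR.sub(fix_ciphers, tag)

    return CONNECTOR_TAG.sub(fix_connector, xml_text)
```

Something like remove_rc4(open('template.xml').read(), '443', '%(EMSNodeMgmtIp)s') would return the edited text, which can then be written to new.xml. Everything outside the matching Connector tag is passed through character for character, so the original formatting survives.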

Upvotes: 0

Cody Bouche

Reputation: 955

I included comments to explain what is going on. inb4downvote

from lxml import etree
import re

xml = '''<?xml version="1.0"?>
<data>
<Connector
protocol="org.apache.coyote.http11.Http11NioProtocol"
port="443"
redirectPort="443"
executor="tomcatThreadPool"
disableUploadTimeout="true"
SSLEnabled="true"
scheme="https"
secure="true"
clientAuth="false"
sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"
keystoreType="JKS"
keystoreFile="tomcat.keystore"
keystorePass="XXXXX"
server="XXXX"
ciphers="TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
         TLS_DH_RSA_WITH_AES_128_CBC_SHA,
         TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
         TLS_DH_DSS_WITH_AES_128_CBC_SHA,
         TLS_RSA_WITH_AES_128_CBC_SHA,
         TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
         TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA,
         TLS_RSA_WITH_3DES_EDE_CBC_SHA,
         TLS_RSA_WITH_RC4_128_SHA"
address="192.168.10.6"></Connector></data>'''

root = etree.fromstring(xml)
# iter() finds Connector elements at any depth, not only direct children
for connector in root.iter('Connector'):
    port = connector.get('port')
    ip = connector.get('address')
    # change this to the port/ip of the connector you want to edit
    if port != '443' or ip != '192.168.10.6':
        # leave every other connector in the document untouched
        continue
    # here we use a list comprehension to drop any cipher containing "RC4"
    ciphers = ','.join([x for x in re.split(r',\s*', connector.get('ciphers', '')) if 'RC4' not in x])
    # set the modified cipher list back
    connector.set('ciphers', ciphers)
print etree.tostring(root, pretty_print=True)

Upvotes: 1

Prune

Reputation: 77870

If the XML tools don't preserve the original structure and formatting, dump them. This is a straightforward text-processing problem, and you can write a Python program to handle it.

Spin through the lines of the file; simply echo to the output anything that is not part of a "ciphers" attribute. When you hit one of those:

  1. Stuff the string into a variable.
  2. Split the string into a list.
  3. Drop any list element containing "RC4".
  4. Print the resulting "ciphers" attribute in your desired format.
  5. Return to normal "read-and-echo" processing.
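
The steps above could be sketched roughly as follows. This is only an illustration: the function name strip_rc4 is made up, it assumes the attribute is written exactly as ciphers="..." with double quotes, and it rewrites every ciphers attribute it meets, so the port and address check from the question would still have to be added on top.

```python
def strip_rc4(lines):
    """Echo lines unchanged, except drop RC4 entries from ciphers="..." values."""
    marker = 'ciphers="'
    out = []
    buf = None  # lines of the ciphers attribute currently being collected
    for line in lines:
        if buf is None and marker not in line:
            out.append(line)  # normal read-and-echo
            continue
        if buf is None:
            buf = []
        buf.append(line)
        text = ''.join(buf)  # step 1: stuff the string into a variable
        start = text.index(marker) + len(marker)
        end = text.find('"', start)
        if end == -1:
            continue  # closing quote not seen yet, keep collecting lines
        prefix, value, suffix = text[:start], text[start:end], text[end:]
        names = [c.strip() for c in value.split(',') if c.strip()]  # step 2
        kept = [c for c in names if 'RC4' not in c]  # step 3
        indent = ' ' * len(prefix)  # align continuation lines under the first cipher
        out.append(prefix + (',\n' + indent).join(kept) + suffix)  # step 4
        buf = None  # step 5: back to read-and-echo
    if buf is not None:
        out.extend(buf)  # unterminated attribute: pass it through untouched
    return ''.join(out)
```

Because a file object iterates line by line, the result of strip_rc4(open('template.xml')) can be written straight to the new file, and every line outside a ciphers attribute comes out exactly as it went in.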

Does this algorithm get you going?

Upvotes: 0
