Reputation: 25
I generated xml data in php from mysql, like this:
<?xml version="1.0"?>
<data>
<abc><yearperiod_2024_12>14.522</yearperiod_2024_12></abc>
<abc><yearperiod_2023_04>322</yearperiod_2023_04><yearperiod_2023_11>4.667</yearperiod_2023_11></abc>
<abc><yearperiod_2022_01>7.887</yearperiod_2022_01></abc>
</data>
When import this to excel, the order of years is like in xml, but I would like to see it sequence by years. Is there any possible way to create an xsd (scheme) file that solves my problem?
Upvotes: 0
Views: 39
Reputation: 3581
You can solve this with python:
import pandas as pd
from lxml import etree
xml_string = '''<?xml version="1.0"?>
<data>
<abc><yearperiod_2024_12>14.522</yearperiod_2024_12></abc>
<abc><yearperiod_2023_11>4.667</yearperiod_2023_11><yearperiod_2023_04>322</yearperiod_2023_04></abc>
<abc><yearperiod_2022_01>7.887</yearperiod_2022_01></abc>
</data>'''
# XSLT Transformation (Sorting ABCs and Year Periods)
xslt_string = '''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/data">
<data>
<xsl:for-each select="abc">
<xsl:sort select="substring(name(*[1]), 11, 4)" data-type="number" order="ascending"/>
<abc>
<xsl:for-each select="*">
<xsl:sort select="substring(name(), 11, 4)" data-type="number" order="ascending"/>
<xsl:sort select="substring(name(), 16, 2)" data-type="number" order="ascending"/>
<xsl:copy-of select="."/>
</xsl:for-each>
</abc>
</xsl:for-each>
</data>
</xsl:template>
</xsl:stylesheet>'''
# Parse XML and XSLT for Transformation
xml_root = etree.fromstring(xml_string)
xslt_root = etree.fromstring(xslt_string.encode("utf-8"))
xslt_transformer = etree.XSLT(xslt_root)
transformed_xml = xslt_transformer(xml_root)
transformed_xml_string = etree.tostring(transformed_xml, pretty_print=True, encoding="utf-8").decode("utf-8")
print("Transformed XML:\n", transformed_xml_string)
# Parse transformed XML
data = []
for abc in transformed_xml.findall("abc"):
for element in abc:
date_str = element.tag.replace("yearperiod_", "").replace("_", "-")
value = float(element.text)
year, month = map(int, date_str.split("-"))
data.append((year, month, date_str, value))
# Sort Data by Year then Month
data.sort(key=lambda x: (x[0], x[1]))
df = pd.DataFrame([(d[2], d[3]) for d in data], columns=["Date", "Value"])
df.to_excel("output.xlsx", index=False, engine="openpyxl")
print("\nFinal DataFrame:")
print(df)
Output Excel file:
Upvotes: 0