Reputation: 15
I want to parse a large XML file (33,000 lines). The structure of my XML file is as follows:
<?xml version="1.0" encoding="UTF-8"?><Root_2010 xmlns:noNamespaceSchemaLocation="textpool_1.2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" lang="de-DE">
<Textpool Version="V20.12.08">
<TextpoolList FontFamily="Standard" FontSize="16" FontStyle="normal" FontWeight="bold" SID="S1" TextCharacterLength="0" TextLength="135">
<SID_Name>
<Text>GlobalCommonTextBook</Text>
</SID_Name>
<TextpoolBlock>
<TextpoolRecord CharacterLengthCheck="Ok" Status="Released" StdTextCharacterLength="4" StdTextLength="???" TID="Txt0_0" TermCheck="NotChecked" TermCheckDescription="NotChecked" TextLengthCheck="Ok" fixed="true">
<IEC translate="no">
<Text/>
</IEC>
<ExplanationText/>
<Text>nein</Text>
</ShortText>
</Description>
<Creator>z0046abb</Creator>
</TextpoolRecord>
</TextpoolBlock>
</TextpoolList>
</Textpool>
</Root_2010>
The element TextpoolList stores two parts. Its name is stored in the first Text element. TextpoolBlock holds several entries, and the element of interest there is again Text.
I need to parse this file, extract all Text elements from a specific TextpoolList, and export them to another file. Later on I would like to take advantage of the attributes of TextpoolList and scan the entries added to ShortText. That's why I want to use an XML parser.
I decided to give RapidXML a chance. Since the file is quite large, I need to move some data from the stack to the heap. I don't really know how to do that, so I am asking you for help. I tried something similar to https://linuxhint.com/parse_xml_in_c__/.
rapidxml::xml_document<> doc;
rapidxml::xml_node<>* root_node = NULL;
rapidxml::xml_node<>* block_node = NULL;
rapidxml::xml_node<>* record_node = NULL;
rapidxml::xml_node<>* text_node = NULL;
std::ifstream infile(file);
std::string line;
std::string tp_data;
while (std::getline(infile, line))
tp_data += line;
std::vector<char> tp_data_copy(tp_data.begin(), tp_data.end());
tp_data_copy.push_back('\0');
doc.parse<0>(&tp_data_copy[0]);
root_node = doc.first_node("TextpoolList");
for (rapidxml::xml_node<>* textpool_node = root_node->first_node("Textpool"); textpool_node; textpool_node = textpool_node->next_sibling())
{
for (rapidxml::xml_node<>* list_node = textpool_node->first_node("TextpoolList"); list_node; list_node = list_node->next_sibling())
{
for (rapidxml::xml_node<>* block_node = list_node->first_node("TextpoolBlock"); block_node; block_node = block_node->next_sibling())
{
for (rapidxml::xml_node<>* record_node = block_node->first_node("TextpoolRecord"); record_node; record_node = record_node->next_sibling())
{
for (rapidxml::xml_node<>* text_node = record_node->first_node("Text"); text_node; text_node = text_node->next_sibling())
{
std::cout << "record = " << text_node->value();
std::cout << std::endl;
}
std::cout << std::endl;
}
}
}
}
}
Edit: I changed my code so that, as far as I can tell, the data should land on the heap, but I still get the same error suggesting that I store data on the heap instead of the stack.
Thanks for all your ideas!
Upvotes: 0
Views: 561
Reputation: 15
Okay, things finally work. The root lookup now asks for Root_2010, the actual document element, instead of TextpoolList. This is my routine:
// The loops below declare their own node pointers, so only the document and the root are needed here.
rapidxml::xml_document<> doc;
rapidxml::xml_node<>* root_node = NULL;
std::ifstream infile(file);
std::string line;
std::string tp_data;
while (std::getline(infile, line))
tp_data += line;
std::vector<char> tp_data_copy(tp_data.begin(), tp_data.end());
tp_data_copy.push_back('\0');
doc.parse<0>(&tp_data_copy[0]);
root_node = doc.first_node("Root_2010");
// Walk Root_2010 > Textpool > TextpoolList > TextpoolBlock > TextpoolRecord > Text.
// Passing the element name to next_sibling() skips siblings that have a different name.
for (rapidxml::xml_node<>* textpool_node = root_node->first_node("Textpool"); textpool_node; textpool_node = textpool_node->next_sibling("Textpool"))
{
    for (rapidxml::xml_node<>* list_node = textpool_node->first_node("TextpoolList"); list_node; list_node = list_node->next_sibling("TextpoolList"))
    {
        for (rapidxml::xml_node<>* block_node = list_node->first_node("TextpoolBlock"); block_node; block_node = block_node->next_sibling("TextpoolBlock"))
        {
            for (rapidxml::xml_node<>* record_node = block_node->first_node("TextpoolRecord"); record_node; record_node = record_node->next_sibling("TextpoolRecord"))
            {
                for (rapidxml::xml_node<>* text_node = record_node->first_node("Text"); text_node; text_node = text_node->next_sibling("Text"))
                {
                    std::cout << "record = " << text_node->value() << std::endl;
                }
                std::cout << std::endl;
            }
        }
    }
}
If there is some time left, I will try to find a cleaner way to do that clumsy file reading.
Upvotes: 0