Reputation: 456
I have an outline-format text document that can have up to 6 levels.
.step1 -- step 1. text
..step2 -- step 1. A. text
..step2 -- step 1. B. text
..step2 -- step 1. C. text
..step2 -- step 1. D. text
..step2 -- step 1. E. text
.step1 -- step 2. text
..step2 -- step 2. A. text
...step3 -- step 2. A. (1) text
...step3 -- step 2. A. (2) text
.step1 -- step 3. text
I'm parsing the text document with regex and building an array that is structured like this:
$contentsArray structure
'level' INT - the step level
'type' STRING - the type of tag, note, warning, step1, step2
'line' INT - the line number in the file
'text' STRING - the text
------ SAMPLE ARRAY -------
[0] => Array
(
[level] => 1
[type] => step1
[line] => 8
[text] => Step 1. text
)
[1] => Array
(
[level] => 2
[type] => step2
[line] => 10
[text] => Step 1. A. text
)
[2] => Array
(
[level] => 2
[type] => step2
[line] => 12
[text] => Step 1. B. text
)
[3] => Array
(
[level] => 2
[type] => step2
[line] => 14
[text] => Step 1. C. text.
)
[4] => Array
(
[level] => 2
[type] => step2
[line] => 16
[text] => Step 1. D. text.
)
[5] => Array
(
[level] => 2
[type] => step2
[line] => 18
[text] => Step 1. E. text.
)
[6] => Array
(
[level] => 1
[type] => step1
[line] => 20
[text] => Step 2. text
)
[7] => Array
(
[level] => 2
[type] => step2
[line] => 22
[text] => Step 2. A. Text.
)
[8] => Array
(
[level] => 3
[type] => step3
[line] => 26
[text] => Step 2. A. (1) Text.
)
[9] => Array
(
[level] => 3
[type] => step3
[line] => 28
[text] => Step 2. A. (2) Text.
)
[10] => Array
(
[level] => 1
[type] => step1
[line] => 30
[text] => Step 3. Text
)
The end goal is to turn this into a nested XML document.
<step1>Step 1. text
<step2>Step 1. A. text</step2>
<step2>Step 1. B. text</step2>
<step2>Step 1. C. text</step2>
<step2>Step 1. D. text</step2>
<step2>Step 1. E. text</step2>
</step1>
<step1>Step 2. text
<step2>Step 2. A. text
<step3>Step 2. A. (1) text</step3>
<step3>Step 2. A. (2) text</step3>
</step2>
</step1>
<step1>Step 3. text
</step1>
I think that what I need is to build a nested array that I can then convert into the XML. I think the structure for this array would be something like
[0] => Array
(
[level] => 1
[type] => step1
[line] => 8
[text] => Step 1. text
, Array
(
[level] => 2
[type] => step2
[line] => 10
[text] => Step 1. A. text
), Array
(
[level] => 2
[type] => step2
[line] => 12
[text] => Step 1. B. text
), Array
(
[level] => 2
[type] => step2
[line] => 14
[text] => Step 1. C. text.
), Array
(
[level] => 2
[type] => step2
[line] => 16
[text] => Step 1. D. text.
), Array
(
[level] => 2
[type] => step2
[line] => 18
[text] => Step 1. E. text.
)
)
[1] => Array
(
[level] => 1
[type] => step1
[line] => 20
[text] => Step 2. text
, Array
(
[level] => 2
[type] => step2
[line] => 22
[text] => Step 2. A. Text.
, Array
(
[level] => 3
[type] => step3
[line] => 26
[text] => Step 2. A. (1) Text.
), Array
(
[level] => 3
[type] => step3
[line] => 28
[text] => Step 2. A. (2) Text.
)
)
)
[2] => Array
(
[level] => 1
[type] => step1
[line] => 30
[text] => Step 3. Text
)
What I need is help with a method to loop through the array I have built and use the level value to work out the nesting in the final array. My attempts so far have been pretty fruitless. I feel like there is a recursive or iterator-based way to do this, but those aren't my strong suit.
Thanks for the help and I hope this question is clear enough.
UPDATE: I see I did a pretty poor job of asking the question, so I have made some edits.
Upvotes: 0
Views: 55
Reputation: 351218
Here is a PHP implementation for when you have the input text in a variable $input:
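$stack = []; // closing tags of the elements that are currently open
$xml = [];   // output lines
// The extra "." line is a sentinel that closes every open tag at the end
foreach (explode("\n", $input . "\n.") as $line) {
    $line = trim($line);
    $type = ltrim($line, ".");
    $dots = strlen($line) - strlen($type); // number of leading dots = level
    if (!$dots || $dots > count($stack) + 1) throw new Exception("Bad input format");
    // Close all open elements at this level or deeper
    while ($dots <= count($stack))
        $xml[] = str_repeat(" ", count($stack)-1) . array_pop($stack);
    $xml[] = str_repeat(" ", count($stack)) . "<$type>";
    $stack[] = "</$type>";
}
// Drop the "<>" line produced by the sentinel
$xml = implode("\n", array_slice($xml, 0, -1));
echo $xml;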
See it run on eval.in.
Upvotes: 1
Reputation: 47020
Your example isn't consistent. Assuming the last ...step3 is a mistake, this will do what you want. It's in Perl, but should translate easily.
sub main {
    my @labels;
    while (<>) {
        if (/^(\.+)(.*)$/) {
            my $level = length($1);
            print "</" . pop(@labels) . ">\n" while $level <= scalar(@labels);
            die "Bad input" unless $level == 1 + scalar(@labels);
            print "<" . $2 . ">\n";
            push @labels, $2;
        }
    }
    print "</" . pop(@labels) . ">\n" while (@labels);
}
main;
It's using a simple stack and processing level changes with respect to current stack size.
I'll let you work out the indentation. You can do this based on stack size, too.
Addition
Well, you've changed the question quite a bit. But the basic approach of using a stack and processing level changes with respect to stack length will still work fine.
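As a rough sketch of that idea applied to the updated question, the PHP below walks the flat array directly and uses the stack size as the current depth. It is only an illustration: rowsToXml is a made-up name, the row keys ('level', 'type', 'text') are assumed to match the sample array in the question, and printing leaf elements on a single line is my own choice.

```php
<?php
// Sketch only: rowsToXml() is a hypothetical helper, not code from the question.
// Each row is assumed to carry 'level' (1-based), 'type' (the tag name) and 'text'.
function rowsToXml(array $rows): string
{
    $rows = array_values($rows); // make sure keys are 0..n-1 for the look-ahead
    $open = [];                  // stack of open tag names; its size is the current depth
    $out  = [];

    foreach ($rows as $i => $row) {
        $level = (int) $row['level'];

        // A row may go at most one level deeper than what is currently open.
        if ($level < 1 || $level > count($open) + 1) {
            throw new InvalidArgumentException("Bad level at row $i");
        }

        // Close every open element that sits at this level or deeper.
        while (count($open) >= $level) {
            $tag   = array_pop($open);
            $out[] = str_repeat('  ', count($open)) . "</$tag>";
        }

        $indent = str_repeat('  ', $level - 1);
        $text   = htmlspecialchars($row['text'], ENT_XML1);
        $next   = $rows[$i + 1]['level'] ?? 0; // 0 when this is the last row

        if ($next > $level) {
            // Children follow: leave the element open and remember its tag.
            $out[]  = "$indent<{$row['type']}>$text";
            $open[] = $row['type'];
        } else {
            // Leaf: open and close on one line.
            $out[] = "$indent<{$row['type']}>$text</{$row['type']}>";
        }
    }

    // Close whatever is still open at the end of the input.
    while ($open) {
        $tag   = array_pop($open);
        $out[] = str_repeat('  ', count($open)) . "</$tag>";
    }

    return implode("\n", $out);
}
```

Calling rowsToXml($contentsArray) should return the nested markup as one string. Like the example in the question, the result has no single root element, so you would need to wrap it in one before feeding it to an XML parser. Building an intermediate nested array is not required for this, although the same stack logic could produce one if you need it for something else.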
Upvotes: 1