I am attempting to reformat a hierarchical (xml) file to a "per line" file using vim.
Here is a simplified example. The real file is large (about 500k lines), and the number of groups and of entries per group is arbitrary.
input file:
<group key="abc">
<entry val="1"/>
<entry val="2"/>
<entry val="3"/>
</group>
<group key="xyz">
<entry val="1"/>
<entry val="2"/>
<entry val="3"/>
<entry val="4"/>
<entry val="5"/>
</group>
output result:
abc,1
abc,2
abc,3
xyz,1
xyz,2
xyz,3
xyz,4
xyz,5
Please note that I don't need a single magic expression that does all of this (although that'd be swell). The part I am struggling with is getting the key associated with each of the entries. I'm sure there is a good idiom for handling this. Thanks in advance.
One thing I tried, which may be useful to others, is the following:
:g/key="\(.*\)"/.;/<\/group/s/<entry /\1,<entry /g
This does not work because the \1 captured by the :g pattern is not available in the replacement of the nested :s command; there, \1 refers only to groups in the :s pattern itself. The expression is meant to look for pat1, build a range from that line to the next match of pat2, then substitute pat3 with pat4 on the lines within that range (inclusive):
:g/pat1/.;/pat2/s/pat3/pat4/g
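A minimal sketch of one way the range idea might be made to work: store the key in a variable on each group line and use it in an expression replacement. The variable name g:group_key is made up for illustration and the pattern assumes the key="..." layout of the sample.

```vim
" Sketch only: g:group_key is a hypothetical variable name.
" For each <group key="..."> line, remember the key, then prefix every
" <entry in the range from this line to the next </group> line with it.
" The trailing e flag suppresses the error for groups without entries.
:g/<group key="/let g:group_key = matchstr(getline('.'), 'key="\zs[^"]*') | .;/<\/group/s/<entry /\=g:group_key . ',<entry '/e
```

On the sample input this should turn each entry line into the form abc,<entry val="1"/>, leaving the group lines to be deleted afterwards.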
Solution
The best solution below solved it by looking for the entry and then backwards for the key, as opposed to what I was trying to do above by building a range and multiple substitutions. What finally worked required some minor modifications, so they are provided here for others. The commands that do the heavy lifting are:
:g/entry/?key?,\?t.
:g/entry/norm ddpkJ
:v/entry/d
Breakdown:
Search for all the entry lines:
:g/entry/
From there, search backwards for the line that has the key and copy it below the entry.
?key?,\?t.
Search for all entry lines again and run normal-mode commands on each one.
:g/entry/norm
Swap the two lines (delete the entry line and paste it below the copied key line). Move up to the key line and join the two lines.
ddpkJ
Once all keys are mapped, search for any lines that do NOT have an entry and delete them.
:v/entry/d
If you have multiple hierarchies as I do, you can run the first two lines multiple times. Once everything is on a single line, it is fairly straightforward to clean it up into whatever final format is needed. Another major benefit is that this solution can be put in a script easily and rerun with
vim -S script.vim data.file
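As a rough illustration, script.vim for the single-level sample might look like the sketch below. The last substitution is an assumed cleanup step, not one of the commands listed above, and it relies on the key="..." and val="..." attribute layout of the sample.

```vim
" script.vim: sketch for the single-level sample above.
" Copy the nearest preceding key line below every entry line.
g/entry/?key?,\?t.
" Swap each entry with its copied key line and join the pair.
g/entry/normal ddpkJ
" Drop everything that is not a joined key/entry line.
v/entry/d
" Assumed cleanup step: reduce each joined line to key,val.
%s/.*key="\([^"]*\)".*val="\([^"]*\)".*/\1,\2/
```

Leading colons are optional in a sourced script. Appending :w or :wq would let the run save the result without further interaction.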
Upvotes: 3
Views: 533
Reputation: 58431
The following would work:
:g/entry/?<group?,?<group?t.
:%norm J
:g/<\//d
:%norm df"f"df"i,<C-v><Esc>f"d$
Breakdown
For each line containing entry, search backwards for <group and copy that line to the line below the entry
:g/entry/?<group?,?<group?t.
<group key="abc">
<entry val="1"/>
<group key="abc">
<entry val="2"/>
<group key="abc">
<entry val="3"/>
<group key="abc">
</group>
<group key="xyz">
<entry val="1"/>
<group key="xyz">
<entry val="2"/>
<group key="xyz">
<entry val="3"/>
<group key="xyz">
<entry val="4"/>
<group key="xyz">
<entry val="5"/>
<group key="xyz">
</group>
Join each <group> line with the line that follows it
:%norm J
<group key="abc"> <entry val="1"/>
<group key="abc"> <entry val="2"/>
<group key="abc"> <entry val="3"/>
<group key="abc"> </group>
<group key="xyz"> <entry val="1"/>
<group key="xyz"> <entry val="2"/>
<group key="xyz"> <entry val="3"/>
<group key="xyz"> <entry val="4"/>
<group key="xyz"> <entry val="5"/>
<group key="xyz"> </group>
Remove the closing tags
:g/<\//d
<group key="abc"> <entry val="1"/>
<group key="abc"> <entry val="2"/>
<group key="abc"> <entry val="3"/>
<group key="xyz"> <entry val="1"/>
<group key="xyz"> <entry val="2"/>
<group key="xyz"> <entry val="3"/>
<group key="xyz"> <entry val="4"/>
<group key="xyz"> <entry val="5"/>
Fix up the remaining text by searching for the quotes and deleting up to and between them. Note that <C-v><Esc> is the key sequence for entering a literal Escape on the command line.
:%norm df"f"df"i,<C-v><Esc>f"d$
abc,1
abc,2
abc,3
xyz,1
xyz,2
xyz,3
xyz,4
xyz,5
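If the command has to live in a script file, where typing <C-v><Esc> is awkward, a common alternative is to build it with :execute. The line below is a sketch of that form using the same keystrokes; normal! is used instead of norm so that user mappings are ignored.

```vim
" Same keystrokes as the :%norm command above.
" Inside a double-quoted string, "\<Esc>" is the Escape key and \" is a literal quote.
:execute "%normal! df\"f\"df\"i,\<Esc>f\"d$"
```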
Upvotes: 1
Reputation: 5947
Well, this is not a magical one-liner, but it might work:
ggqq/group<cr>f"lyi"<c-v>n0I<c-r>"<esc>ddnddq
100@q
:%s/\s*<entry val="/,/g
:%s/"\/>//g
Step by step:
gg => Go to the top
qq => Start recording a macro into register q
/group<cr> => Search for "group"
f"l => Go to the key
yi" => Copy the key
<c-v> => Start blockwise visual mode
n0 => Jump to the next match (the closing </group> line), then move the cursor to the beginning of the line
I<c-r>"<esc> => Insert the copied key at the beginning of every line in the block
dd => Delete <group> line
ndd => Delete end </group> line
q => Stop macro
100@q => Play the macro 100 times; use whatever count covers your groups
Now you should have something like:
abc <entry val="1"/>
abc <entry val="2"/>
abc <entry val="3"/>
xyz <entry val="1"/>
xyz <entry val="2"/>
xyz <entry val="3"/>
xyz <entry val="4"/>
xyz <entry val="5"/>
Then just clean what you don't need:
:%s/\s*<entry val="/,/g
:%s/"\/>//g
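The two cleanup substitutions can probably be folded into one by capturing the value. This is a sketch that assumes every remaining line has the key followed by <entry val="..."/>, as in the intermediate output above.

```vim
" Replace the optional whitespace and the <entry val="..."/> tag with ,value
:%s/\s*<entry val="\([^"]*\)"\/>/,\1/
```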
Upvotes: 1