Reputation: 1
I'm looking for a sed command to clean up some KML files. Each file consists of a single line and looks like this:
<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark><Placemark><name> Hotel 03 </name></Placemark></Document></kml>
Ideally I want only the part from the first <Placemark> element (inclusive) to the last </Placemark> element (inclusive), with these sections from all the KML files written to a single output file.
I'd be happy with a command to either delete all text before the first <Placemark> and all text after the last </Placemark>, or a command to extract the content after the first <Placemark> and before the last </Placemark>.
A command that I've managed to botch together so far is:
find . -name 'kmlFiles00*' -exec sed -r 's/^.{879}/ /' {} \; | sed -e 's/<\/Document><\/kml>//g' > placemarks_`date +%d-%m-%Y`.list
which works by stripping the first 879 characters and then removing every instance of </Document></kml> before writing everything to the final file, but this is pretty messy, so I'm looking for a cleaner command. I have also tried:
sed -e 's/^.*<Placemark> //' -e 's/<\/Placemark>.*$//'
which I know is getting closer, but it still fails.
Upvotes: 0
Views: 115
Reputation: 58473
This might work for you (GNU sed):
sed -r 's/<Placemark>/\n&/;s/.*\n(.*<\/Placemark>).*/\1/' file
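To show how the two substitutions cooperate, and how the command could be applied to several files at once, here is a minimal sketch. It assumes GNU sed, as the answer does. The function name extract_placemarks, the temporary directory, and the two sample files are made up for illustration; only the sed expression and the kmlFiles00* naming come from this thread.

```shell
# Assumes GNU sed (this sketch relies on -r and on \n in the replacement).
extract_placemarks() {
    # 1st s///: put a newline before the first <Placemark> only (no g flag).
    # 2nd s///: drop everything up to that newline, keep through the last
    #           </Placemark> (greedy .*), and drop whatever follows it.
    sed -r 's/<Placemark>/\n&/;s/.*\n(.*<\/Placemark>).*/\1/' "$@"
}

# Throwaway sample files (hypothetical names and contents).
demo_dir=$(mktemp -d)
printf '%s\n' '<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark></Document></kml>' > "$demo_dir/kmlFiles001"
printf '%s\n' '<other text><kml><Document><Placemark><name> Hotel 03 </name></Placemark></Document></kml>' > "$demo_dir/kmlFiles002"

# Run the extraction on each file and collect everything in one output file.
for f in "$demo_dir"/kmlFiles00*; do
    extract_placemarks "$f"
done > "$demo_dir/placemarks.list"

result=$(cat "$demo_dir/placemarks.list")
printf '%s\n' "$result"
rm -rf "$demo_dir"
```

Each input file contributes one line to the output, holding only its <Placemark>...</Placemark> run. The loop keeps the files in glob order; the find ... -exec form from the question would work with the same sed expression too.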
Upvotes: 0
Reputation: 1
awk NF=NF FPAT='<Placemark>.*</Placemark>'
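A short sketch of how this one-liner can be read and tried out. FPAT is a GNU awk extension, so the example calls gawk explicitly and assumes it is installed under that name; the function name extract_placemarks_awk and the sample line are illustrative only.

```shell
# Shortened sample line modelled on the one in the question.
line='<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark></Document></kml>'

extract_placemarks_awk() {
    # FPAT describes what a field looks like; the greedy .* makes the single
    # field run from the first <Placemark> to the last </Placemark>.
    # NF=NF is the entire awk program: the assignment rebuilds $0 from the
    # fields, and its non-zero value acts as a true pattern, so the record
    # is printed. Lines without a match have NF equal to 0 and are skipped.
    gawk NF=NF FPAT='<Placemark>.*</Placemark>' "$@"
}

# Only run the demo when gawk is actually available.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' "$line" | extract_placemarks_awk
fi
```

Passing file names instead of piping, for example extract_placemarks_awk kmlFiles00* > placemarks.list, should collect the sections from all files into one output file.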
Upvotes: 2