Reputation: 1

Sed Command delete before first instance and after last

I'm looking for a sed command to clean up some KML files I have. Each file is on a single line and looks like this:

<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark><Placemark><name> Hotel 03 </name></Placemark></Document></kml>

Ideally I want only the part starting with (and including) the first <Placemark> element and ending with (and including) the last </Placemark> element, with these sections from all the KML files written to a single file.

I'd be happy with either a command that deletes all text before the first <Placemark> and all text after the last </Placemark>, or one that extracts the content after the first <Placemark> and before the last </Placemark>.

A command that I've managed to botch together so far is:

find . -name 'kmlFiles00*' -exec sed -r 's/^.{879}/ /' {} \; | sed -e 's/<\/Document><\/kml>//g' > placemarks_`date +%d-%m-%Y`.list

which works: it replaces the first 879 characters with a space and then deletes every instance of </Document></kml> before writing everything to the final file. But it is pretty messy, and it only works if every file has exactly 879 characters in front of the first <Placemark>, so I'm looking for a cleaner command. I have also tried

sed -e 's/^.*<Placemark> //' -e 's/<\/Placemark>.*$//'

which I know is getting closer, but it still fails.
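Running that attempt on the sample line shows why it fails. The first expression requires a space after <Placemark>, which the sample never contains, so nothing is removed from the front (and even without the space, the greedy ^.* would run to the *last* <Placemark> and delete the tag too). The second expression matches at the *first* </Placemark> and deletes from there to the end of the line. The sketch below is a minimal illustration assuming a POSIX sed and the sample line from the question; the variable names are only for the demo. It also shows one portable way to keep the same "delete before, delete after" idea: mark the first <Placemark> with a newline, cut up to the mark, then use a greedy group to keep everything up to the last </Placemark>.

```shell
#!/bin/sh
# Sample input: the single-line KML shown in the question.
sample='<some text><kml><Document><name> Name </name><Placemark><name> Hotel 01 </name></Placemark><Placemark><name> Hotel 02 </name></Placemark><Placemark><name> Hotel 03 </name></Placemark></Document></kml>'

# The original attempt. Expression 1 needs "<Placemark> " (with a space),
# which never occurs, so the front is left alone; expression 2 matches at
# the FIRST </Placemark> and deletes from there to the end of the line.
attempt=$(printf '%s\n' "$sample" |
    sed -e 's/^.*<Placemark> //' -e 's/<\/Placemark>.*$//')

# Portable fix keeping the two-step idea:
#   1. insert a newline before the first <Placemark> (no g flag = first only;
#      backslash + real newline in the replacement inserts a newline),
#   2. delete everything up to and including that newline,
#   3. keep the greedy match that ends at the LAST </Placemark>.
fixed=$(printf '%s\n' "$sample" | sed -e 's/<Placemark>/\
&/' -e 's/^.*\n//' -e 's/\(.*<\/Placemark>\).*/\1/')

printf 'attempt: %s\n' "$attempt"
printf 'fixed:   %s\n' "$fixed"
```

A line with no <Placemark> passes through all three steps unchanged, so add -n and a p flag on the last substitution if such lines should be dropped instead.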

Upvotes: 0

Answers (2)

potong

Reputation: 58473

This might work for you (GNU sed):

sed -r 's/<Placemark>/\n&/;s/.*\n(.*<\/Placemark>).*/\1/' file
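To illustrate how this works: the first substitution inserts a newline just before the first <Placemark> (without the g flag only the first match is replaced). In the second, .*\n consumes everything up to that newline, and the greedy group (.*<\/Placemark>) extends to the last </Placemark>, so the trailing </Document></kml> is discarded. The sketch below assumes GNU sed (-r and \n in the replacement are GNU extensions) and plugs the command into the find pipeline from the question so all files end up in one dated list. It runs in a scratch directory with two hypothetical sample files; swapping -exec ... \; for {} + is a choice made here so a single sed process handles every file.

```shell
#!/bin/sh
# Throwaway directory so the demo does not touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two hypothetical single-line KML files, named like those in the question.
printf '%s\n' '<a><kml><Document><name> A </name><Placemark><name> Hotel 01 </name></Placemark></Document></kml>' > kmlFiles001.kml
printf '%s\n' '<b><kml><Document><name> B </name><Placemark><name> Hotel 02 </name></Placemark><Placemark><name> Hotel 03 </name></Placemark></Document></kml>' > kmlFiles002.kml

out="placemarks_$(date +%d-%m-%Y).list"

# GNU sed:
#   1st s: insert a newline before the first <Placemark> only;
#   2nd s: .*\n eats everything up to that newline, and the greedy group
#          keeps everything up to the last </Placemark> on the line.
# "{} +" passes all matching files to one sed run; output line order follows
# find's traversal order, which is not guaranteed to be sorted.
find . -name 'kmlFiles00*' \
    -exec sed -r 's/<Placemark>/\n&/;s/.*\n(.*<\/Placemark>).*/\1/' {} + > "$out"

cat "$out"
```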

Upvotes: 0

Zombo

Reputation: 1

awk NF=NF FPAT='<Placemark>.*</Placemark>'

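FPAT defines a field as text matching <Placemark>.*</Placemark> (FPAT is a GNU awk extension)
NF=NF forces awk to rebuild the line from those fields and, since NF is non-zero when a match is found, prints the rebuilt line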

Upvotes: 2
