joe Lovick

Reputation: 59

Use sed to naively convert a text file to XML

I have a text file made up of records that look like this:

BOOK|100004
TRAN|A
ANAM|Alberta 
TNAM|The School Act; the School Assessment Act. The Tax Recovery Act. The School Grants         Act. The School Attendance Act and General Regulations of the Department of Education 
PBLS|King's Printer
SUB1|Alberta, Canada, Canadian Prairies, NOISBN

I need to create an XML file that has this format:

<BOOK>100004</BOOK>
<TRAN>A</TRAN>
<first 4 chars> text data </ first 4 chars again>

I think I am almost there with a sed command like this:

$sed 's#([:alpha:]\{4\})\|(*)#\<\1\>\2<\/\1\>#g' 

except I get this error: sed: -e expression #1, char 41: invalid reference \1 on `s' command's RHS

Any sed experts want to push me onto an enlightened path?

Upvotes: 1

Views: 530

Answers (1)

porges

Reputation: 30580

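By default sed uses basic ('old-style') regex, not 'extended' regex, so the default meaning of the special characters is basically the opposite: a capturing group in 'plain' sed is \(...\), not (...). Your unescaped parentheses therefore match literal characters, no group is ever captured, and \1 is an invalid reference. The same goes for the | character: unescaped it is a literal pipe, and escaping it (\|) turns it into alternation in GNU sed. Note also that [:alpha:] is only valid inside a bracket expression, written [[:alpha:]]. A working sed script looks like: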

sed 's#\([^|]\+\)|\(.*\)#<\1>\2</\1>#'
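As a quick check, the command above can be run against a few of the records from the question. This is only a sketch: the function name to_xml is made up for illustration, and it assumes GNU sed, since \+ is a GNU extension in basic-regex mode.

```shell
# to_xml is a hypothetical wrapper name, not part of sed.
# Assumes GNU sed: \+ is a GNU extension to basic regular expressions.
to_xml() {
  # \([^|]\+\)  captures everything before the first pipe (the tag name)
  # |           is a literal pipe in basic-regex mode
  # \(.*\)      captures the rest of the line (the text data)
  sed 's#\([^|]\+\)|\(.*\)#<\1>\2</\1>#'
}

# Feed three sample records through the filter.
printf '%s\n' 'BOOK|100004' 'TRAN|A' "PBLS|King's Printer" | to_xml
```

Lines without a pipe pass through unchanged, because the substitution simply does not match them.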

If you want to use extended regex, you can use the -r flag (GNU sed also accepts -E). There the parentheses group without escaping, and \| is a literal pipe:

sed -r 's#([^|]+)\|(.*)#<\1>\2</\1>#'
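If you want something closer to the "first 4 chars" idea from the question, a stricter variant might look like the sketch below. It is an assumption-laden illustration, not the only way to do it: the name to_xml_strict is hypothetical, it assumes GNU sed with -E, it uses [[:alnum:]] rather than [[:alpha:]] because the sample tag SUB1 contains a digit, and it strips trailing whitespace such as the space after "Alberta" in the ANAM record.

```shell
# to_xml_strict is a hypothetical name used only for this illustration.
# Assumes GNU sed with -E (extended regular expressions).
to_xml_strict() {
  # First expression: drop trailing whitespace from each line.
  # Second expression: require exactly four alphanumeric characters,
  # then a literal pipe (\| in extended mode), then the text data.
  sed -E \
    -e 's#[[:space:]]+$##' \
    -e 's#^([[:alnum:]]{4})\|(.*)$#<\1>\2</\1>#'
}

printf '%s\n' 'ANAM|Alberta ' 'SUB1|Alberta, Canada' | to_xml_strict
```

Anchoring the tag with ^ and {4} means a line that does not start with a four-character code followed by a pipe is left as it is, which makes malformed records easier to spot in the output.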

Upvotes: 2
