Reputation: 59
I have a text file with records that look like this:
BOOK|100004
TRAN|A
ANAM|Alberta
TNAM|The School Act; the School Assessment Act. The Tax Recovery Act. The School Grants Act. The School Attendance Act and General Regulations of the Department of Education
PBLS|King's Printer
SUB1|Alberta, Canada, Canadian Prairies, NOISBN
I need to create an XML file that has this format:
<BOOK>100004</BOOK>
<TRAN>A</TRAN>
<first 4 chars> text data </ first 4 chars again>
I think I am almost there with a sed command like this:
$sed 's#([:alpha:]\{4\})\|(*)#\<\1\>\2<\/\1\>#g'
except I get this error:
sed: -e expression #1, char 41: invalid reference \1 on `s' command's RHS
Any sed experts want to push me onto an enlightened path?
Upvotes: 1
Views: 530
Reputation: 30580
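Sed uses old-style (basic) regex by default, not 'extended' regex, so the default meaning of several special characters is basically the opposite: a capturing group in 'plain' sed is \(...\), not (...). Your unescaped parentheses are therefore literal characters, the pattern defines no groups, and \1 has nothing to refer to, which is exactly what the error says. The same goes for the | character: unescaped it is a literal pipe, and escaping it (\|) turns it into alternation in GNU sed. A working sed script looks like: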
sed 's#\([^|]\+\)|\(.*\)#<\1>\2</\1>#'
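If you would rather stay close to your original idea of matching exactly four characters, a minimal sketch in basic regex could look like the following. Two things differ from your attempt: a character class has to sit inside a bracket expression (`[[:alnum:]]`, not `[:alnum:]`), and I am assuming `alnum` rather than `alpha` because your sample contains the tag `SUB1`, which has a digit in it. The input lines are a shortened copy of your records.

```shell
# Feed three sample records to sed; each record is "TAG|data".
# \( ... \)          capturing group (basic regex syntax)
# [[:alnum:]]\{4\}   exactly four letters or digits (covers SUB1)
# |                  a literal pipe when left unescaped in basic regex
output=$(printf '%s\n' 'BOOK|100004' 'TRAN|A' 'SUB1|Alberta, Canada' |
  sed 's#^\([[:alnum:]]\{4\}\)|\(.*\)$#<\1>\2</\1>#')

printf '%s\n' "$output"
```

Lines that do not start with a four-character tag followed by a pipe are passed through unchanged. Note that this does not escape characters such as & or < in the data, so the result is only well-formed XML if the records do not contain them.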
If you want to use extended regex, you can use the -r flag (GNU sed; -E is the more portable spelling):
sed -r 's#([^|]+)\|(.*)#<\1>\2</\1>#'
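As a quick check, here is a sketch that runs the extended-regex version on a few records. I am using -E here, which both GNU and BSD sed accept, on the assumption that portability matters to you; with GNU sed, -r as in the command above behaves the same. The first two records are from your sample; the `NOTE|a|b` line is a made-up record, added only to show that the split happens at the first pipe.

```shell
# In extended regex the roles flip compared to basic regex:
# ( ... )  is a capturing group, and \| is a literal pipe.
# [^|]+    takes everything up to the first pipe as the tag name.
output=$(printf '%s\n' 'BOOK|100004' 'TRAN|A' 'NOTE|a|b' |
  sed -E 's#([^|]+)\|(.*)#<\1>\2</\1>#')

printf '%s\n' "$output"
```

Because `[^|]+` cannot cross a pipe, any later pipes in a record stay in the data part, so the last line comes out as `<NOTE>a|b</NOTE>`.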
Upvotes: 2