html2text command line breaking html

Question

I'm trying to figure out why html2text is breaking my HTML:

    About   •   Contact   •   Maths Games Order   •   FAQ  
 
 
s Broadbent Maths Ltd
 3 High Street, Welbourn, Lincoln, LN5 0NH

Processing it with:

cat "/home/spider/original-file.txt" | html2text -utf8 -nobs -style pretty

When I run that, I get:

Input recoding failed due to invalid input sequence. Unconverted part of text follows. ▒Contact ▒Maths Games Order ▒FAQ

s Broadbent Maths Ltd 3 High Street, Welbourn, Lincoln, LN5 0NH

When I run Devel::Peek::Dump() (Perl), I see the string as:

SV = PV(0x564c0a72c860) at 0x564c09967c80
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x564c0a58bc60 "
    About   •   Contact   •   Maths Games Order   •   FAQ  
 
 
s Broadbent Maths Ltd
 3 High Street, Welbourn, Lincoln, LN5 0NH 
 
"\0 [UTF8 "
    About   •   Contact   •   Maths Games Order   •   FAQ  
 
 
s Broadbent Maths Ltd
 3 High Street, Welbourn, Lincoln, LN5 0NH 
 
"]
  CUR = 725
  LEN = 736
  COW_REFCNT = 1
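
The dump above describes the string as Perl holds it in memory, which is not necessarily the same as the bytes written to the file that html2text reads. A minimal sketch for checking the file on disk, assuming `iconv` is installed; the helper name `check_utf8` is hypothetical and not part of html2text or Perl:

```shell
# check_utf8 is a hypothetical helper name; it assumes iconv is installed.
check_utf8() {
    # iconv exits non-zero when it meets a byte sequence that is not
    # valid UTF-8 (or when the file cannot be read at all).
    if iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; then
        echo "valid"
    else
        echo "invalid"
    fi
}

# Usage, with the path from the command above:
# check_utf8 "/home/spider/original-file.txt"
```

If this prints "invalid", the bullet or space characters were probably written in a different encoding than the one `-utf8` expects, though that is only a guess until the bytes are inspected.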

If I remove the first bit:

It works fine! I don't get why it's breaking there, though. It all seems OK to me.
