String Manipulation - Is it possible to remove unpredictable extra spaces in output text from Google Cloud Vision OCR?

Question

The OCR output currently contains unpredictable extra spaces, as the two samples below show (OCR result), and Google has not fully fixed this yet.
We are therefore looking for a post-processing step: string manipulation that turns the OCR result into the expected result.
However, in my experience, I cannot find logic that covers all of the unpredictable extra spaces.
Could you suggest an approach, or correct me if I am wrong? Thank you very much.

Sample 1:
OCR result: NATURAL VITAMIN E 400 MEGA CAPSULE 400 IU รับ ประทาน ครั้ง ละ 1 เม็ด วัน ละ 1 ครั้ง หลัง อาหาร เช้า ยา นี้ หมด อายุ ภายใน 1 ปี นับ จาก วัน ที่ ได้ รับ
Expected result (as a human would read it): NATURAL VITAMIN E 400 MEGA CAPSULE 400 IU รับประทาน ครั้งละ 1 เม็ด วันละ 1 ครั้ง หลังอาหารเช้า ยานี้หมดอายุภายใน 1 ปีนับจากวันที่ได้รับ
Sample 2:
OCR result: MOLAX - M TABLET 10 MG รับ ประทาน ครั้ง ละ 1 เม็ด วัน ละ 3 ครั้ง ก่อน อาหาร เช้า กลาง วัน เย็น ยา แก้ คลื่นไส้ อาเจียน ปรับ การ บีบ ตัว ของ ทาง เดิน อาหาร ควร รับ ประทาน ยา นี้ ก่อน อาหาร ครึ่ง ชั่วโมง
Expected result (as a human would read it): MOLAX - M TABLET 10 MG รับประทาน ครั้งละ 1 เม็ด วันละ 3 ครั้ง ก่อนอาหารเช้า กลางวัน เย็น ยาแก้คลื่นไส้อาเจียน ปรับการบีบตัวของทางเดินอาหาร ควรรับประทานยานี้ก่อนอาหารครึ่งชั่วโมง
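
For reference, one possible starting point is a regex heuristic that drops any space sitting between two Thai characters while leaving spaces next to Latin letters and digits alone. This is only a minimal sketch in Python: the function name `collapse_thai_spaces` is hypothetical, and it assumes all Thai text falls in the Unicode Thai block (U+0E00 to U+0E7F). It does not reproduce the expected results exactly, because it also removes the phrase-level spaces a human would keep (for example between "รับประทาน" and "ครั้งละ"); telling those apart would likely need a Thai word or phrase segmenter rather than pure string manipulation.

```python
import re

# Assumption: every Thai character (consonants, vowels, tone marks) lies in
# the Unicode Thai block U+0E00..U+0E7F.
_THAI = r"\u0E00-\u0E7F"

# One or more spaces/tabs with a Thai character on both sides.
_SPACE_BETWEEN_THAI = re.compile(rf"(?<=[{_THAI}])[ \t]+(?=[{_THAI}])")


def collapse_thai_spaces(text: str) -> str:
    """Remove whitespace between adjacent Thai characters (hypothetical helper).

    Spaces next to Latin letters, digits, or punctuation are kept as-is.
    """
    return _SPACE_BETWEEN_THAI.sub("", text)


if __name__ == "__main__":
    sample = "400 IU รับ ประทาน ครั้ง ละ 1 เม็ด"
    print(collapse_thai_spaces(sample))
```

Because the rule only looks at the two characters around each space, the Latin drug name and the digits keep their spacing, while every run of Thai text is joined into one unbroken string.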
