How to handle new lines within a single paragraph in python-docx

Question

I have a problem when extracting text from a .docx file created in Microsoft Word. I want to process the extracted data with the python-docx library, but when I scrape the file, a single paragraph is split into many separate lines, even though in my document it is set as a single paragraph with bold text.

This is my script:

# Requires python-docx (pip install python-docx)
from docx import Document

industri_minuman = "test.docx"
# Load the .docx file with python-docx
doc_industri_minuman = Document(industri_minuman)

# Print the text of every run in every paragraph, one run per line
for para in doc_industri_minuman.paragraphs:
    for run in para.runs:
        print(run.text)

This is the output of the script:

AGA, AMDK / INTAN MULIA, CV
Produk Utama: Air Mineral Aga
Tenaga Kerja: 20 - 99 Orang
Telepon: 82242000000
Email: -
Alamat Pabrik: 
Ds
 
Gunungsari
 
Ds
 
Sumbergondo
,
Kec
 
Glenmore
 Kab Banyuwangi

Expected Result:

AGA, AMDK / INTAN MULIA, CV
Produk Utama: Air Mineral Aga
Tenaga Kerja: 20 - 99 Orang
Telepon: 82242000000
Email: -
Alamat Pabrik: Ds Gunungsari Ds Sumbergondo,
Kec Glenmore Kab Banyuwangi
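To see why the output is split, and where the line break in the expected result could come from, it can help to look at the XML Word stores in `word/document.xml`. This sketch swaps in a different technique from python-docx: it uses only the standard library (`zipfile` and `xml.etree.ElementTree`), treats each `<w:p>` as a paragraph and each `<w:r>` inside it as a run, and joins the runs' text back together. It assumes the break after "Sumbergondo," in the expected result is a manual line break (Shift+Enter), which Word stores as `<w:br/>` inside the same paragraph; if it is a separate paragraph or just wrapping, the result will differ. The function name `docx_paragraphs` and the demo XML are hypothetical stand-ins for the real file, and text boxes are not handled specially.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + NS + "}"


def docx_paragraphs(source):
    """Yield the text of each <w:p> paragraph in a .docx file.

    `source` may be a path or a binary file-like object. The text of every run
    (<w:r>) in a paragraph is joined into one string. A <w:tab/> becomes a tab
    and a <w:br/> or <w:cr/> becomes a newline, because those breaks stay
    inside the same paragraph. Text boxes are not special-cased.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for para in root.iter(W + "p"):
        parts = []
        for run in para.iter(W + "r"):  # also finds runs nested in hyperlinks
            for child in run:  # direct children; <w:rPr> matches no branch
                if child.tag == W + "t":
                    parts.append(child.text or "")
                elif child.tag == W + "tab":
                    parts.append("\t")
                elif child.tag in (W + "br", W + "cr"):
                    parts.append("\n")
        yield "".join(parts)


# Hypothetical demo: a tiny in-memory .docx with one paragraph made of a bold
# run, a plain run, and a run containing a manual line break.
DEMO_XML = (
    f'<w:document xmlns:w="{NS}"><w:body><w:p>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Alamat Pabrik: </w:t></w:r>'
    '<w:r><w:t>Ds Gunungsari Ds Sumbergondo,</w:t></w:r>'
    '<w:r><w:br/><w:t>Kec Glenmore Kab Banyuwangi</w:t></w:r>'
    '</w:p></w:body></w:document>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", DEMO_XML)
buffer.seek(0)

# For the real file: docx_paragraphs("test.docx")
for text in docx_paragraphs(buffer):
    print(text)
```

With this demo the single paragraph prints as the two lines shown in the expected result, because the `<w:br/>` becomes a newline inside the joined string rather than starting a new paragraph.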

I have attached my document file:

Link to Docs File
