Reputation: 685
Good day SO,
I am trying to copy a part of a .docx file into another .docx file, while keeping the formatting of the copied part, as well as any images, using python.
I have tried python-docx, but I can't find anything about handling images. Link to my previous question: Extracting .docx data, images and structure
Is there a way to copy a part of one document, let's say DocA, and insert it at the end of DocB, including images and formatting (basically a clean copy-and-paste)?
Thanks a lot!
EDIT: I have managed to find the paragraphs in DocA that contain images using the following code. I know it is a very hackish approach, as I am a complete beginner with python-docx, but here it is:
for x in document.paragraphs:
    # crude check: w:pict wraps legacy (VML) graphics in the paragraph XML
    if "<w:pict" in x._p.xml:
        print(x._p.xml)
Using this code, I can find the paragraphs containing the images. However, I still can't copy the images over to DocB (they appear as blanks there), which I believe is because I never extracted the image data from DocA's .docx file.
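That diagnosis matches how .docx files store pictures: the paragraph XML only holds a relationship ID (r:id="rId8" on v:imagedata in the XML below, or r:embed on a:blip for DrawingML pictures), while the image bytes live in a separate part, usually under word/media/, and word/_rels/document.xml.rels maps each ID to that part. Copying the paragraph XML alone therefore copies a dangling ID. A minimal sketch using only the standard library to resolve an ID to its bytes; the function name is hypothetical, and it assumes the picture sits in the main document body (headers and footers have their own .rels parts):

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of package relationship parts such as word/_rels/document.xml.rels
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def image_part_for_rid(docx, rid):
    """Return (part_name, image_bytes) for relationship `rid` of word/document.xml.

    `docx` may be a path or a binary file-like object. Raises KeyError if the
    ID is unknown and ValueError if it points to an external (linked) file.
    """
    with zipfile.ZipFile(docx) as zf:
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        for rel in rels.iter(REL_NS + "Relationship"):
            if rel.get("Id") != rid:
                continue
            if rel.get("TargetMode") == "External":
                raise ValueError(f"{rid} links to an external file: {rel.get('Target')}")
            target = rel.get("Target")
            # Targets are normally relative to word/, e.g. "media/image1.png"
            if target.startswith("/"):
                part_name = target.lstrip("/")
            else:
                part_name = posixpath.normpath(posixpath.join("word", target))
            return part_name, zf.read(part_name)
    raise KeyError(rid)
```

For example, image_part_for_rid("DocA.docx", "rId8") should return the bytes behind the first picture in the XML below. Those bytes can be wrapped in io.BytesIO and passed to python-docx's Run.add_picture, which accepts a path or a file-like stream; note that this inserts a plain inline picture, so the VML position, wrapping and text boxes are not carried over.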
EDIT 2:
Here is the XML of the Paragraph object containing the images:
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" w14:paraId="18A83B04" w14:textId="77777777" w:rsidR="00200C54" w:rsidRDefault="00051C61" w:rsidP="00200C54">
<w:pPr>
<w:jc w:val="center"/>
</w:pPr>
<w:r>
<w:rPr>
<w:noProof/>
</w:rPr>
<w:pict w14:anchorId="30C19523">
<v:shapetype id="_x0000_t202" coordsize="21600,21600" o:spt="202" path="m,l,21600r21600,l21600,xe">
<v:stroke joinstyle="miter"/>
<v:path gradientshapeok="t" o:connecttype="rect"/>
</v:shapetype>
<v:shape id="Text Box 2" o:spid="_x0000_s1029" type="#_x0000_t202" style="position:absolute;left:0;text-align:left;margin-left:305.1pt;margin-top:112.75pt;width:86.25pt;height:19.5pt;z-index:1;visibility:visible;mso-wrap-distance-top:3.6pt;mso-wrap-distance-bottom:3.6pt;mso-width-relative:margin;mso-height-relative:margin" o:gfxdata="UEsDBBQABgAIAAAAIQC2gziS/gAAAOEBAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbJSRQU7DMBBF 90jcwfIWJU67QAgl6YK0S0CoHGBkTxKLZGx5TGhvj5O2G0SRWNoz/78nu9wcxkFMGNg6quQqL6RA 0s5Y6ir5vt9lD1JwBDIwOMJKHpHlpr69KfdHjyxSmriSfYz+USnWPY7AufNIadK6MEJMx9ApD/oD OlTrorhX2lFEilmcO2RdNtjC5xDF9pCuTyYBB5bi6bQ4syoJ3g9WQ0ymaiLzg5KdCXlKLjvcW893 SUOqXwnz5DrgnHtJTxOsQfEKIT7DmDSUCaxw7Rqn8787ZsmRM9e2VmPeBN4uqYvTtW7jvijg9N/y JsXecLq0q+WD6m8AAAD//wMAUEsDBBQABgAIAAAAIQA4/SH/1gAAAJQBAAALAAAAX3JlbHMvLnJl bHOkkMFqwzAMhu+DvYPRfXGawxijTi+j0GvpHsDYimMaW0Yy2fr2M4PBMnrbUb/Q94l/f/hMi1qR JVI2sOt6UJgd+ZiDgffL8ekFlFSbvV0oo4EbChzGx4f9GRdb25HMsYhqlCwG5lrLq9biZkxWOiqY 22YiTra2kYMu1l1tQD30/bPm3wwYN0x18gb45AdQl1tp5j/sFB2T0FQ7R0nTNEV3j6o9feQzro1i OWA14Fm+Q8a1a8+Bvu/d/dMb2JY5uiPbhG/ktn4cqGU/er3pcvwCAAD//wMAUEsDBBQABgAIAAAA IQBxa2hSIQIAAB0EAAAOAAAAZHJzL2Uyb0RvYy54bWysU11v2yAUfZ+0/4B4X+x4cdNYcaouXaZJ 3YfU7gdgjGM04DIgsbtfvwtO06h7m+YHxPW9HM4997C+GbUiR+G8BFPT+SynRBgOrTT7mv543L27 psQHZlqmwIiaPglPbzZv36wHW4kCelCtcARBjK8GW9M+BFtlmee90MzPwAqDyQ6cZgFDt89axwZE 1yor8vwqG8C11gEX3uPfuylJNwm/6wQP37rOi0BUTZFbSKtLaxPXbLNm1d4x20t+osH+gYVm0uCl Z6g7Fhg5OPkXlJbcgYcuzDjoDLpOcpF6wG7m+atuHnpmReoFxfH2LJP/f7D86/G7I7KtaTFfUmKY xiE9ijGQDzCSIuozWF9h2YPFwjDib5xz6tXbe+A/PTGw7ZnZi1vnYOgFa5HfPJ7MLo5OOD6CNMMX aPEadgiQgMbO6SgeykEQHef0dJ5NpMLjlfmqfL8sKeGYKxbLqzINL2PV82nrfPgkQJO4qanD2Sd0 drz3IbJh1XNJvMyDku1OKpUCt2+2ypEjQ5/s0pcaeFWmDBlquiqLMiEbiOeThbQM6GMldU2v8/hN zopqfDRtKglMqmmPTJQ5yRMVmbQJYzNiYdSsgfYJhXIw+RXfF256cL8pGdCrNfW/DswJStRng2Kv 5otFNHcKFuWywMBdZprLDDMcoWoaKJm225AeRNTBwC0OpZNJrxcmJ67owSTj6b1Ek1/GqerlVW/+ AAAA//8DAFBLAwQUAAYACAAAACEAiK7BRuMAAAAQAQAADwAAAGRycy9kb3ducmV2LnhtbExPy26D 
MBC8V+o/WFupl6oxQQESgon6UKtek+YDDN4ACl4j7ATy992e2stKuzM7j2I3215ccfSdIwXLRQQC qXamo0bB8fvjeQ3CB01G945QwQ097Mr7u0Lnxk20x+shNIJFyOdaQRvCkEvp6xat9gs3IDF2cqPV gdexkWbUE4vbXsZRlEqrO2KHVg/41mJ9PlysgtPX9JRspuozHLP9Kn3VXVa5m1KPD/P7lsfLFkTA Ofx9wG8Hzg8lB6vchYwXvYJ0GcVMVRDHSQKCGdk6zkBUfElXCciykP+LlD8AAAD//wMAUEsBAi0A FAAGAAgAAAAhALaDOJL+AAAA4QEAABMAAAAAAAAAAAAAAAAAAAAAAFtDb250ZW50X1R5cGVzXS54 bWxQSwECLQAUAAYACAAAACEAOP0h/9YAAACUAQAACwAAAAAAAAAAAAAAAAAvAQAAX3JlbHMvLnJl bHNQSwECLQAUAAYACAAAACEAcWtoUiECAAAdBAAADgAAAAAAAAAAAAAAAAAuAgAAZHJzL2Uyb0Rv Yy54bWxQSwECLQAUAAYACAAAACEAiK7BRuMAAAAQAQAADwAAAAAAAAAAAAAAAAB7BAAAZHJzL2Rv d25yZXYueG1sUEsFBgAAAAAEAAQA8wAAAIsFAAAAAA== " stroked="f">
<v:textbox>
<w:txbxContent>
<w:p w14:paraId="467DC1DB" w14:textId="77777777" w:rsidR="00200C54" w:rsidRDefault="00200C54" w:rsidP="00200C54">
<w:pPr>
<w:jc w:val="center"/>
</w:pPr>
<w:r>
<w:t>tLSTM</w:t>
</w:r>
</w:p>
</w:txbxContent>
</v:textbox>
</v:shape>
</w:pict>
</w:r>
<w:r>
<w:rPr>
<w:noProof/>
</w:rPr>
<w:pict w14:anchorId="0D832600">
<v:line id="Straight Connector 8" o:spid="_x0000_s1028" style="position:absolute;left:0;text-align:left;flip:y;z-index:2;visibility:visible;mso-width-relative:margin;mso-height-relative:margin" from="205.4pt,44.35pt" to="249.05pt,45.55pt" o:gfxdata="UEsDBBQABgAIAAAAIQC2gziS/gAAAOEBAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbJSRQU7DMBBF 90jcwfIWJU67QAgl6YK0S0CoHGBkTxKLZGx5TGhvj5O2G0SRWNoz/78nu9wcxkFMGNg6quQqL6RA 0s5Y6ir5vt9lD1JwBDIwOMJKHpHlpr69KfdHjyxSmriSfYz+USnWPY7AufNIadK6MEJMx9ApD/oD OlTrorhX2lFEilmcO2RdNtjC5xDF9pCuTyYBB5bi6bQ4syoJ3g9WQ0ymaiLzg5KdCXlKLjvcW893 SUOqXwnz5DrgnHtJTxOsQfEKIT7DmDSUCaxw7Rqn8787ZsmRM9e2VmPeBN4uqYvTtW7jvijg9N/y JsXecLq0q+WD6m8AAAD//wMAUEsDBBQABgAIAAAAIQA4/SH/1gAAAJQBAAALAAAAX3JlbHMvLnJl bHOkkMFqwzAMhu+DvYPRfXGawxijTi+j0GvpHsDYimMaW0Yy2fr2M4PBMnrbUb/Q94l/f/hMi1qR JVI2sOt6UJgd+ZiDgffL8ekFlFSbvV0oo4EbChzGx4f9GRdb25HMsYhqlCwG5lrLq9biZkxWOiqY 22YiTra2kYMu1l1tQD30/bPm3wwYN0x18gb45AdQl1tp5j/sFB2T0FQ7R0nTNEV3j6o9feQzro1i OWA14Fm+Q8a1a8+Bvu/d/dMb2JY5uiPbhG/ktn4cqGU/er3pcvwCAAD//wMAUEsDBBQABgAIAAAA IQBOa9I08gEAAMcDAAAOAAAAZHJzL2Uyb0RvYy54bWysU02P0zAQvSPxHyzfadqyQUvUdA+tlssK KrVwn3XsxMJf8pim+feM3dIWuCFysGzPvJeZN8+rp5M17Cgjau9avpjNOZNO+E67vuVfD8/vHjnD BK4D451s+SSRP63fvlmNoZFLP3jTyciIxGEzhpYPKYWmqlAM0gLOfJCOgspHC4mOsa+6CCOxW1Mt 5/MP1ehjF6IXEpFut+cgXxd+paRIX5RCmZhpOdWWyhrL+prXar2Cpo8QBi0uZcA/VGFBO/rplWoL CdiPqP+islpEj16lmfC28kppIUsP1M1i/kc3+wGCLL2QOBiuMuH/oxWfj7vIdNdyGpQDSyPapwi6 HxLbeOdIQB/ZY9ZpDNhQ+sbtYu5UnNw+vHjxHZnzmwFcL0u9hykQySIjqt8g+YDhDD6paJkyOnzL qZmOpGCnMpfpOhd5SkzQZV0/vK9rzgSFFvXyoYytgiazZGyImD5Jb1netNxol1WDBo4vmHIdt5R8 7fyzNqZM3jg2EufHeU3mEEAGVAYSbW0gSdD1nIHpydkixUKJ3uguwzMRTrgxkR2BzEWe7Px4oJI5 M4CJAtRH+YoUlH0PzZVuAYczuKPd2YpWJ3oPRlsayD3YuPxDWRx9aeqmZ969+m7axV+ik1tK2xdn Zzven8tobu9v/RMAAP//AwBQSwMEFAAGAAgAAAAhAMrmE8fjAAAADgEAAA8AAABkcnMvZG93bnJl di54bWxMj8FOwzAQRO9I/IO1SNyoY1TaJI1TIapyRLRw4ebGJomw15HtNIGvZzmVy0qj3Z15U21n Z9nZhNh7lCAWGTCDjdc9thLe3/Z3ObCYFGplPRoJ3ybCtr6+qlSp/YQHcz6mlpEJxlJJ6FIaSs5j 
0xmn4sIPBmn36YNTiWRouQ5qInNn+X2WrbhTPVJCpwbz1Jnm6zg6CZN9Xj3oYjh87HkQ69efUePu Rcrbm3m3ofG4AZbMnC4f8NeB+KEmsJMfUUdmJSxFRvxJQp6vgdHBssgFsJOEQgjgdcX/16h/AQAA //8DAFBLAQItABQABgAIAAAAIQC2gziS/gAAAOEBAAATAAAAAAAAAAAAAAAAAAAAAABbQ29udGVu dF9UeXBlc10ueG1sUEsBAi0AFAAGAAgAAAAhADj9If/WAAAAlAEAAAsAAAAAAAAAAAAAAAAALwEA AF9yZWxzLy5yZWxzUEsBAi0AFAAGAAgAAAAhAE5r0jTyAQAAxwMAAA4AAAAAAAAAAAAAAAAALgIA AGRycy9lMm9Eb2MueG1sUEsBAi0AFAAGAAgAAAAhAMrmE8fjAAAADgEAAA8AAAAAAAAAAAAAAAAA TAQAAGRycy9kb3ducmV2LnhtbFBLBQYAAAAABAAEAPMAAABcBQAAAAA= " strokecolor="windowText" strokeweight="1.5pt">
<v:stroke dashstyle="dash" joinstyle="miter"/>
</v:line>
</w:pict>
</w:r>
<w:r>
<w:rPr>
<w:noProof/>
</w:rPr>
<w:pict w14:anchorId="7B559002">
<v:line id="Straight Connector 9" o:spid="_x0000_s1027" style="position:absolute;left:0;text-align:left;z-index:3;visibility:visible;mso-width-relative:margin;mso-height-relative:margin" from="203.6pt,47.3pt" to="249.65pt,114pt" o:gfxdata="UEsDBBQABgAIAAAAIQC2gziS/gAAAOEBAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbJSRQU7DMBBF 90jcwfIWJU67QAgl6YK0S0CoHGBkTxKLZGx5TGhvj5O2G0SRWNoz/78nu9wcxkFMGNg6quQqL6RA 0s5Y6ir5vt9lD1JwBDIwOMJKHpHlpr69KfdHjyxSmriSfYz+USnWPY7AufNIadK6MEJMx9ApD/oD OlTrorhX2lFEilmcO2RdNtjC5xDF9pCuTyYBB5bi6bQ4syoJ3g9WQ0ymaiLzg5KdCXlKLjvcW893 SUOqXwnz5DrgnHtJTxOsQfEKIT7DmDSUCaxw7Rqn8787ZsmRM9e2VmPeBN4uqYvTtW7jvijg9N/y JsXecLq0q+WD6m8AAAD//wMAUEsDBBQABgAIAAAAIQA4/SH/1gAAAJQBAAALAAAAX3JlbHMvLnJl bHOkkMFqwzAMhu+DvYPRfXGawxijTi+j0GvpHsDYimMaW0Yy2fr2M4PBMnrbUb/Q94l/f/hMi1qR JVI2sOt6UJgd+ZiDgffL8ekFlFSbvV0oo4EbChzGx4f9GRdb25HMsYhqlCwG5lrLq9biZkxWOiqY 22YiTra2kYMu1l1tQD30/bPm3wwYN0x18gb45AdQl1tp5j/sFB2T0FQ7R0nTNEV3j6o9feQzro1i OWA14Fm+Q8a1a8+Bvu/d/dMb2JY5uiPbhG/ktn4cqGU/er3pcvwCAAD//wMAUEsDBBQABgAIAAAA IQDiYn9F7wEAAL4DAAAOAAAAZHJzL2Uyb0RvYy54bWysU02P0zAQvSPxHyzfadJlC23UdA+tlssK KrX8gFnHSSz8JY9pkn/P2P3YAjdEDtbY43me9+Zl/TQazU4yoHK25vNZyZm0wjXKdjX/fnz+sOQM I9gGtLOy5pNE/rR5/249+Eo+uN7pRgZGIBarwde8j9FXRYGilwZw5ry0lGxdMBBpG7qiCTAQutHF Q1l+KgYXGh+ckIh0ujsn+Sbjt60U8VvbooxM15x6i3kNeX1Na7FZQ9UF8L0SlzbgH7owoCw9eoPa QQT2M6i/oIwSwaFr40w4U7i2VUJmDsRmXv7B5tCDl5kLiYP+JhP+P1jx9bQPTDU1X3FmwdCIDjGA 6vrIts5aEtAFtko6DR4rur61+5CYitEe/IsTP5BZt+3BdjL3e5w8gcxTRfFbSdqgPxePbTAJhARg Y57GdJuGHCMTdLhYPi4/LjgTlFo+fi5XeVoFVNdiHzB+kc6wFNRcK5vEggpOLxjT81Bdr6Rj656V 1nng2rKBelyVC/KEAPJdqyFSaDwpgbbjDHRHhhYxZEh0WjWpPAHhhFsd2AnIU2TFxg1H6pkzDRgp QUTylxWg2/elqZ8dYH8ubig6O9CoSL+BVoao3hdrmx6U2cgXUm8ypujVNdM+XLUmk2TaF0MnF97v 80TefrvNLwAAAP//AwBQSwMEFAAGAAgAAAAhAMoUjjTgAAAADwEAAA8AAABkcnMvZG93bnJldi54 bWxMT0tOwzAQ3SNxB2sqsaN2TVSaNE6FIFRiSekBpvGQRI3tKHY+vT1mBZuRnuZ988NiOjbR4Ftn FWzWAhjZyunW1grOX++PO2A+oNXYOUsKbuThUNzf5ZhpN9tPmk6hZtHE+gwVNCH0Gee+asigX7ue 
bPx9u8FgiHCouR5wjuam41KILTfY2pjQYE+vDVXX02gUmCo9jjSV5VGeb3zm/fWjwVKph9Xyto/n ZQ8s0BL+FPC7IfaHIha7uNFqzzoFiXiWkaogTbbAIiFJ0ydgFwVS7gTwIuf/dxQ/AAAA//8DAFBL AQItABQABgAIAAAAIQC2gziS/gAAAOEBAAATAAAAAAAAAAAAAAAAAAAAAABbQ29udGVudF9UeXBl c10ueG1sUEsBAi0AFAAGAAgAAAAhADj9If/WAAAAlAEAAAsAAAAAAAAAAAAAAAAALwEAAF9yZWxz Ly5yZWxzUEsBAi0AFAAGAAgAAAAhAOJif0XvAQAAvgMAAA4AAAAAAAAAAAAAAAAALgIAAGRycy9l Mm9Eb2MueG1sUEsBAi0AFAAGAAgAAAAhAMoUjjTgAAAADwEAAA8AAAAAAAAAAAAAAAAASQQAAGRy cy9kb3ducmV2LnhtbFBLBQYAAAAABAAEAPMAAABWBQAAAAA= " strokecolor="windowText" strokeweight="1.5pt">
<v:stroke dashstyle="dash" joinstyle="miter"/>
</v:line>
</w:pict>
</w:r>
<w:r>
<w:rPr>
<w:noProof/>
</w:rPr>
<w:pict w14:anchorId="1C829DE8">
<v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype>
<v:shape id="Picture 6" o:spid="_x0000_s1026" type="#_x0000_t75" style="position:absolute;left:0;text-align:left;margin-left:247.8pt;margin-top:10pt;width:186pt;height:112.15pt;z-index:-1;visibility:visible" wrapcoords="17332 576 17332 2880 3571 3024 1742 3312 1742 5184 348 5328 348 6192 1742 7488 1742 9792 871 12096 871 12816 1481 14400 1742 16704 261 16848 261 18000 2439 19008 2613 21456 3135 21456 5661 21312 18726 19440 19945 19008 21426 17712 21339 16704 19510 14400 19510 7488 20816 7200 20816 5472 19510 4752 18639 3456 17855 2880 18639 2016 18639 864 17768 576 17332 576">
<v:imagedata r:id="rId8" o:title=""/>
<w10:wrap type="tight"/>
</v:shape>
</w:pict>
</w:r>
<w:r w:rsidR="00524183">
<w:rPr>
<w:noProof/>
<w:lang w:val="en-US"/>
</w:rPr>
<w:pict w14:anchorId="63A496C5">
<v:shape id="Picture 5" o:spid="_x0000_i1025" type="#_x0000_t75" style="width:191.7pt;height:128.1pt;visibility:visible">
<v:imagedata r:id="rId9" o:title=""/>
</v:shape>
</w:pict>
</w:r>
</w:p>
The images are in the .docx file, but they do not show up in document.inline_shapes (python-docx), so I have no idea how to continue. Any help appreciated :)
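The empty inline_shapes is explained by the XML itself: python-docx's document.inline_shapes only collects wp:inline elements inside w:drawing (DrawingML), whereas every picture above is legacy VML (w:pict, then v:shape, then v:imagedata r:id="..."), and "Picture 6" is also absolutely positioned, i.e. floating rather than inline. A minimal sketch (stdlib only; the function name is hypothetical) that lists every picture reference in one paragraph, in document order, tagged by kind:

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
WP = "{http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
V = "{urn:schemas-microsoft-com:vml}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"


def picture_refs(paragraph_xml):
    """Return [(kind, rId), ...] for the pictures in one <w:p> XML string.

    kind is "inline" (wp:inline, the only kind inline_shapes reports),
    "anchor" (floating DrawingML) or "vml" (legacy v:imagedata).
    """
    refs = []

    def walk(el, kind):
        if el.tag in (WP + "inline", WP + "anchor"):
            # Remember which DrawingML container the blip below belongs to
            kind = "inline" if el.tag == WP + "inline" else "anchor"
        elif el.tag == A + "blip" and kind:
            refs.append((kind, el.get(R + "embed")))
        elif el.tag == V + "imagedata":
            refs.append(("vml", el.get(R + "id")))
        for child in el:
            walk(child, kind)

    walk(ET.fromstring(paragraph_xml), None)
    return refs
```

With the loop from the first edit, picture_refs(x._p.xml) should report the two VML IDs (rId8 and rId9) for the paragraph above. If Word wraps a DrawingML picture and its VML fallback together in mc:AlternateContent, the same picture may appear twice.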
Upvotes: 3
Views: 3250
Reputation: 718
Check this code. It extracts each image's name and its position relative to the surrounding text:
# Fully qualified tag names (namespace URI in braces, as lxml reports them)
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PIC = '{http://schemas.openxmlformats.org/drawingml/2006/picture}'

tags = []
text = []
for t in doc.element.iter():
    if t.tag in [W + 'r', W + 't', PIC + 'cNvPr', W + 'drawing']:
        if t.tag == PIC + 'cNvPr':
            # pic:cNvPr carries the picture's name (DrawingML pictures only)
            print('Picture Found: ', t.attrib['name'])
            tags.append('Picture')
            text.append(t.attrib['name'])
        elif t.text:
            tags.append('text')
            text.append(t.text)
You can then look up the text before and after each picture in the text list; the entry at the same index in the tags list tells you whether an item is text or a picture.
If you have extracted the image's location and file, you can add the image to your .docx file with this code:
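Since tags and text are parallel lists, pairing each picture with its nearest text neighbours is a short scan in both directions. A minimal sketch; the helper name is hypothetical, and note that each text entry is a single w:t fragment, so a "neighbour" may be only part of a sentence when Word splits text across runs:

```python
def picture_context(tags, text):
    """For each 'Picture' entry, return (text_before, picture_name, text_after).

    Expects the parallel `tags`/`text` lists built by the loop above;
    None marks a side with no text.
    """
    result = []
    for i, tag in enumerate(tags):
        if tag != "Picture":
            continue
        # Nearest 'text' entry scanning backwards, then forwards
        before = next((text[j] for j in range(i - 1, -1, -1) if tags[j] == "text"), None)
        after = next((text[j] for j in range(i + 1, len(tags)) if tags[j] == "text"), None)
        result.append((before, text[i], after))
    return result
```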
from docx import Document
document = Document()
p = document.add_paragraph()
r = p.add_run()
r.add_text('Good Morning every body,This is my ')
r.add_picture('/tmp/foo.jpg')
r.add_text(' do you like it?')
document.save('demo.docx')
You can get at the images themselves by unzipping the .docx file: it unzips into several folders, and all the images in the document are in the word/media folder.
See this article for how to unzip a .docx file: https://towardsdatascience.com/how-to-extract-data-from-ms-word-documents-using-python-ed3fbb48c122
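A .docx file is an ordinary ZIP archive, so the word/media folder can also be read directly with Python's zipfile module instead of unzipping by hand. A minimal sketch; the function name and output folder are hypothetical:

```python
import os
import zipfile


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as zf:
        for name in zf.namelist():
            if name.startswith("word/media/") and not name.endswith("/"):
                # basename() keeps archive paths from escaping out_dir
                dest = os.path.join(out_dir, os.path.basename(name))
                with open(dest, "wb") as fh:
                    fh.write(zf.read(name))
                written.append(dest)
    return written
```

Each returned path can then be handed to r.add_picture(...) as in the snippet above. Which media file belongs to which picture is recorded in word/_rels/document.xml.rels, not in the file names.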
Upvotes: 1
Reputation: 718
Check this code; it identifies the location of an image after a specific text:
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

for t in document.element.iter():
    if t.tag in [W + 'r', W + 't', W + 'drawing']:
        if t.tag == W + 'drawing':
            # w:drawing wraps DrawingML pictures; legacy VML images use w:pict instead
            print('Picture Found')
        elif t.text:
            print(t.text)
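The same idea can be turned into a lookup that returns which paragraph holds the first picture after a given piece of text, checking for both w:drawing and the legacy w:pict used in the question. A minimal sketch on the raw document XML (stdlib only; the function name is hypothetical). It only walks the body's top-level paragraphs, which is the same order python-docx uses for document.paragraphs, so tables are skipped:

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def first_picture_after(document_xml, marker):
    """Index of the first top-level <w:p> after the paragraph containing
    `marker` that holds a picture (w:drawing or w:pict), or None."""
    body = ET.fromstring(document_xml).find(W + "body")
    seen = False
    for i, p in enumerate(body.findall(W + "p")):
        if seen and (p.find(".//" + W + "drawing") is not None
                     or p.find(".//" + W + "pict") is not None):
            return i
        # Join the w:t fragments, since Word may split one phrase across runs
        if marker in "".join(t.text or "" for t in p.iter(W + "t")):
            seen = True
    return None
```

With python-docx, something like first_picture_after(document.element.xml, "Some heading") should give an index usable with document.paragraphs[...].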
Upvotes: 1
Reputation: 926
This may not be a direct answer to your question, but it is worth considering.
If you have control over DocA, have you considered using a .docx template? In my case I needed to generate reports from a template, so I had to copy information from Python variables into a document. I found this library, which does the replacement: https://github.com/elapouya/python-docx-template
You can then fill in the template from your variables like this:
from docxtpl import DocxTemplate
doc = DocxTemplate("my_word_template.docx")
context = { 'company_name' : "World company" }
doc.render(context)
doc.save("generated_doc.docx")
I have not checked, but I believe this preserves formatting. Here is an example of what my template looked like before replacing variables:
Upvotes: 1