methuselah

Reputation: 13216

Call variable across different files in Python and dealing with circular imports

I am currently building a docx/pdf converter in Python and have decided to split the program into two files:

main.py - controls the program's flow;

convert_to_text.py - contains the function that converts pdf/docx files to txt.

At the moment, I am having difficulty sharing the global variable cv between the two files and importing all of the functions in convert_to_text.py for use in main.py. This is the error I get:

C:\Python27\python.exe D:/cv-parser/main.py
Traceback (most recent call last):
  File "D:/cv-parser/main.py", line 1, in <module>
    from convert_to_text import *
  File "D:\cv-parser\convert_to_text.py", line 1, in <module>
    from main import cv
  File "D:\cv-parser\main.py", line 5, in <module>
    document_to_text("resources\CV.pdf")
NameError: name 'document_to_text' is not defined

Process finished with exit code 1

How do I fix it?

This is my code so far:

In convert_to_text.py

from main import cv
import docx
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO


def document_to_text(filename):
    if filename[-5:] == ".docx":
        doc = docx.Document(filename)
        full_text = []
        for para in doc.paragraphs:
            full_text.append(para.text)
        global cv
        cv = '\n'.join(full_text)
        return cv
    elif filename[-4:] == ".pdf":
        return pdf_to_txt(filename)


def pdf_to_txt(file_path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = file(file_path, 'rb')
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos = set()
    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching,
                                  check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    str = retstr.getvalue()
    retstr.close()
    global cv
    cv = str
    return cv

In main.py

from convert_to_text import *

cv = 0

document_to_text("resources\CV.pdf")
print cv

Upvotes: 1

Views: 1386

Answers (2)

skrrgwasme

Reputation: 9633

You have a circular import that needs to be resolved. This is your script's execution path:

  1. main.py starts
  2. main.py attempts to import everything from the convert_to_text module, triggering evaluation of convert_to_text.py
  3. convert_to_text.py attempts to import the main module, triggering main.py's evaluation again

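That second evaluation of main.py runs its own from convert_to_text import *, but convert_to_text is still stuck on its first line and has not defined any functions yet, so the star import brings in nothing. main.py then sets cv and calls document_to_text, which does not exist at that point, and the interpreter raises the NameError shown in your traceback.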
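The execution path above can be reproduced with a stripped-down pair of files. The following is only a sketch under some assumptions: the file contents are minimal stand-ins for the ones in the question (the dummy document_to_text does not touch docx or pdfminer), and run_circular_demo is a helper name invented for this illustration. It writes both files to a temporary directory, runs main.py in a child interpreter, and captures whatever ends up on stderr.

```python
import os
import subprocess
import sys
import tempfile

# Minimal stand-in for main.py from the question.
MAIN_SRC = (
    "from convert_to_text import *\n"
    "\n"
    "cv = 0\n"
    "\n"
    "document_to_text('CV.pdf')\n"
)

# Minimal stand-in for convert_to_text.py; the conversion itself is a dummy.
CONVERT_SRC = (
    "from main import cv\n"
    "\n"
    "\n"
    "def document_to_text(filename):\n"
    "    return 'text of ' + filename\n"
)


def run_circular_demo():
    """Write both files to a temp dir, run main.py, return (exit code, stderr)."""
    workdir = tempfile.mkdtemp()
    for name, source in (("main.py", MAIN_SRC), ("convert_to_text.py", CONVERT_SRC)):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write(source)
    proc = subprocess.Popen(
        [sys.executable, "main.py"],
        cwd=workdir,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    _, stderr = proc.communicate()
    return proc.returncode, stderr


if __name__ == "__main__":
    exit_code, error_text = run_circular_demo()
    print(exit_code)
    print(error_text)
```

The traceback in the captured stderr should list main.py twice: once as the script being run, and once as the module main that convert_to_text.py imports.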

In general, you want to avoid having two scripts import each other. That's called a circular import. There are ways to make circular imports work, but it's more often an indicator of a poorly designed program. I have yet to see an example of a circular import that cannot be refactored into a better organized program.

In this case, the variable cv is used by both of your modules. You have at least two possible solutions:

  1. Fix your function call structure to correctly pass and return arguments. This code doesn't look like it has a real need for cv to be referenced in both modules. See Bryan Oakley's answer for an example. He pointed this out before me.

  2. If you have an unhealthy attachment to the use of a global variable, you can refactor your code to move cv into a third module that both main and convert_to_text can import, resolving the circular import.
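A rough sketch of the second option, assuming a third module with the invented name shared_state.py that holds nothing but cv. The file contents are simplified stand-ins (the conversion is a dummy string), and run_shared_state_demo is a helper made up for this illustration that writes the three files to a temporary directory and runs main.py. Note that both modules use import shared_state and access shared_state.cv as an attribute; from shared_state import cv would copy the binding at import time and miss later assignments.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical file layout; "shared_state" is an invented module name.
FILES = {
    "shared_state.py": "cv = None\n",
    "convert_to_text.py": (
        "import shared_state\n"
        "\n"
        "\n"
        "def document_to_text(filename):\n"
        "    # Dummy conversion; the real code would call docx/pdfminer here.\n"
        "    shared_state.cv = 'text of ' + filename\n"
        "    return shared_state.cv\n"
    ),
    "main.py": (
        "import shared_state\n"
        "from convert_to_text import document_to_text\n"
        "\n"
        "document_to_text('CV.pdf')\n"
        "print(shared_state.cv)\n"
    ),
}


def run_shared_state_demo():
    """Write the three files to a temp dir, run main.py, return (exit code, stdout)."""
    workdir = tempfile.mkdtemp()
    for name, source in FILES.items():
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write(source)
    proc = subprocess.Popen(
        [sys.executable, "main.py"],
        cwd=workdir,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    stdout, _ = proc.communicate()
    return proc.returncode, stdout


if __name__ == "__main__":
    print(run_shared_state_demo())
```

Neither main.py nor convert_to_text.py imports the other's state here, so the import graph has no cycle.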

Upvotes: 3

Bryan Oakley

Reputation: 386342

If the only thing you need from main.py is cv, you should simply pass cv to the functions that need it. And since the functions don't actually use cv (other than to set it), there's no reason to do even that.

Your whole problem can be solved by simply removing the import of main, and using the return value of the function to set cv:

cv = document_to_text(r"resources\CV.pdf")
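To make that concrete, here is a minimal single-file sketch of the return-value approach. Some assumptions: docx_to_txt is a helper name invented here, both converters are placeholders for the python-docx and pdfminer code in the question, and raising ValueError for an unknown extension is a choice of this sketch (the original function silently returns None in that case).

```python
import os


def docx_to_txt(file_path):
    # Placeholder for the python-docx logic in the question.
    return "docx text from " + file_path


def pdf_to_txt(file_path):
    # Placeholder for the pdfminer logic in the question.
    return "pdf text from " + file_path


def document_to_text(filename):
    """Return the extracted text instead of assigning to a global."""
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".docx":
        return docx_to_txt(filename)
    if extension == ".pdf":
        return pdf_to_txt(filename)
    raise ValueError("unsupported file type: " + filename)


# What main.py would do: the caller owns cv, so convert_to_text.py
# no longer needs "from main import cv" or any "global cv" statement.
cv = document_to_text(r"resources\CV.pdf")
print(cv)
```

With the functions split across the two original files, main.py would keep only the import of convert_to_text and the last two lines.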

Upvotes: 3
