Joshua Park

Reputation: 31

Is there a way to use OCR to extract specific data from a CAD technical drawing?

I'm trying to use OCR to extract only the base dimensions of a CAD model, but the drawings also contain associative dimensions that I don't need (angles, the length from a baseline to a hole, etc.). Here is an example of a technical drawing. (The numbers in red circles are the base dimensions; the ones highlighted in purple should be ignored.) How can I tell my program to extract only the base dimensions (the height, length, and width of a block before it goes through the CNC)?

The issue is that the drawings I get are not in a specific format, so I can't tell the OCR where the dimensions are. It has to figure that out on its own from context.

Should I train the program through machine learning by running several iterations and correcting it? If so, what methods are there? The only thing I can think of is OpenCV cascade classifiers. Or are there other ways to solve this problem? Sorry for the long post. Thanks.

Upvotes: 0

Views: 1904

Answers (2)

danywigglebutt

Reputation: 269

Mixpeek is a managed offering, but it is one free option:

pip install mixpeek

from mixpeek import Mixpeek  
  
mix = Mixpeek(  
    api_key="my-api-key"  
)  
  
mix.upload(file_name="design_spec.dwg", file_path="s3://design_spec_1.dwg")

This /upload endpoint extracts the contents of your DWG file. When you later search for terms, each result includes the file_path so you can render the file in your HTML.

Behind the scenes it uses the open source LibreDWG library to run a number of AutoCAD native commands such as DATAEXTRACTION.

Now you can search for a term and the relevant DWG file (in addition to the context in which it exists) will be returned:

mix.search(query="retainer", include_context=True)

[  
    {  
        "file_id": "6377c98b3c4f239f17663d79",  
        "filename": "design_spec.dwg",  
        "context": [  
            {  
                "texts": [  
                    {  
                        "type": "text",  
                        "value": "DV-34-"  
                    },  
                    {  
                        "type": "hit",  
                        "value": "RETAINER"  
                    },  
                    {  
                        "type": "text",  
                        "value": "."  
                    }  
                ]  
            }  
        ],  
        "importance": "100%",  
        "static_file_url": "s3://design_spec_1.dwg"  
    }  
]
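
To make the response above easier to work with, here is a minimal sketch that rebuilds a readable snippet from each context entry. It is plain Python operating on a dict shaped like the sample response; the helper name snippet_from_context and the bracket markers around hits are my own illustration, not part of the Mixpeek client.

```python
def snippet_from_context(result: dict) -> list[str]:
    """Join the text/hit fragments of each context entry into one string.

    Hypothetical helper: it only assumes the response shape shown above.
    """
    snippets = []
    for entry in result.get("context", []):
        parts = []
        for fragment in entry.get("texts", []):
            value = fragment.get("value", "")
            # Wrap matched terms in brackets so they stand out in plain text.
            parts.append(f"[{value}]" if fragment.get("type") == "hit" else value)
        snippets.append("".join(parts))
    return snippets


# Sample data copied from the search response above (trimmed to the used keys).
results = [
    {
        "filename": "design_spec.dwg",
        "context": [
            {
                "texts": [
                    {"type": "text", "value": "DV-34-"},
                    {"type": "hit", "value": "RETAINER"},
                    {"type": "text", "value": "."},
                ]
            }
        ],
    }
]

for result in results:
    print(result["filename"], snippet_from_context(result))
```

The same loop works if a result carries several context entries; each one becomes its own snippet string.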

More documentation here: https://docs.mixpeek.com/

Upvotes: 0

jmattes

Reputation: 36

I feel you... it's a very tricky problem, and we spent the last 3 years working on a solution for it. Forgive me for mentioning our own solution, but it should solve your problem:

pip install werk24


from werk24 import Hook, W24AskVariantMeasures
from werk24.models.techread import W24TechreadMessage
from werk24.utils import w24_read_sync
    
from . import get_drawing_bytes  # define your own function that returns the drawing as bytes
    
    
def recv_measures(message: W24TechreadMessage) -> None:
    # fall back to an empty list so a missing 'measures' key does not raise
    for cur_measure in message.payload_dict.get('measures', []):
        print(cur_measure)
    
if __name__ == "__main__":
    # define what information you want to receive from the API
    # and what shall be done when the info is available.
    hooks = [Hook(ask=W24AskVariantMeasures(), function=recv_measures)]
    
    # submit the request to the Werk24 API
    w24_read_sync(get_drawing_bytes(), hooks)

For your example drawing, it returns measures such as the following:

    {
        "position": <STRIPPED>,
        "label": {
            "blurb": "ø30 H7 +0.0210/0",
            "quantity": 1,
            "size": {
                "blurb": "30",
                "size_type": "DIAMETER",
                "nominal_size": "30.0"
            },
            "unit": "MILLIMETER",
            "size_tolerance": {
                "toleration_type": "FIT_SIZE_ISO",
                "blurb": "H7",
                "deviation_lower": "0.0",
                "deviation_upper": "0.0210",
                "fundamental_deviation": "H",
                "tolerance_grade": {
                    "grade": 7,
                    "warnings": []
                }
            },
            "thread": null,
            "chamfer": null,
            "depth": null,
            "test_dimension": null
        },
        "warnings": [],
        "confidence": 0.98810
    }

or, for a GD&T frame:

{
    "position": <STRIPPED>,
    "frame": {
        "blurb": "[⟂|0.05|A]",
        "characteristic": "⟂",
        "zone_shape": null,
        "zone_value": {
            "blurb": "0.05",
            "width_min": 0.05,
            "width_max": null,
            "extend_quantity": null,
            "extend_shape": null,
            "extend": null,
            "extend_angle": null
        },
        "zone_combinations": [],
        "zone_offset": null,
        "zone_constraint": null,
        "feature_filter": null,
        "feature_associated": null,
        "feature_derived": null,
        "reference_association": null,
        "reference_parameter": null,
        "material_condition": null,
        "state": null,
        "data": [
            {
                "blurb": "A"
            }
        ]
    }
}
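
Once you have the measures as dicts, narrowing them down to the base dimensions is still up to you. Below is a rough sketch of one possible heuristic, not something the API does for you: drop diameters and low-confidence reads, then take the three largest remaining nominal sizes as candidates for length, width, and height. The function name, the thresholds, and the "LINEAR" value in the sample data are assumptions for illustration; only "DIAMETER" appears in the output above, so check the documentation for the real enum values and for how angles are reported.

```python
from decimal import Decimal


def guess_base_dimensions(measures, min_confidence=0.9, count=3):
    """Heuristic: the largest non-diameter sizes are likely the block envelope."""
    candidates = []
    for measure in measures:
        label = measure.get("label") or {}
        size = label.get("size") or {}
        # Hole diameters are never base dimensions of a rectangular block.
        if size.get("size_type") == "DIAMETER":
            continue
        # Skip reads the service itself is unsure about.
        if measure.get("confidence", 0.0) < min_confidence:
            continue
        nominal = size.get("nominal_size")
        if nominal is None:
            continue
        # nominal_size arrives as a string in the sample output above.
        candidates.append(Decimal(nominal))
    return sorted(candidates, reverse=True)[:count]


# Made-up sample data in the shape shown above; "LINEAR" is a placeholder value.
measures = [
    {"label": {"size": {"size_type": "DIAMETER", "nominal_size": "30.0"}}, "confidence": 0.988},
    {"label": {"size": {"size_type": "LINEAR", "nominal_size": "120.0"}}, "confidence": 0.97},
    {"label": {"size": {"size_type": "LINEAR", "nominal_size": "80.0"}}, "confidence": 0.95},
    {"label": {"size": {"size_type": "LINEAR", "nominal_size": "12.5"}}, "confidence": 0.96},
    {"label": {"size": {"size_type": "LINEAR", "nominal_size": "45.0"}}, "confidence": 0.99},
    {"label": {"size": {"size_type": "LINEAR", "nominal_size": "200.0"}}, "confidence": 0.40},
]

print(guess_base_dimensions(measures))
```

This will misfire on drawings where an overall dimension is split into chained dimensions, so treat the result as a starting point and verify it against a few of your own drawings.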

Check the documentation on Werk24 for details.

Upvotes: 1
