Automated PMI Extraction from Technical Drawings

📐

AI-Powered Digitalization of Engineering Documentation

Project Introduction

In manufacturing and engineering, PMI (Product and Manufacturing Information) is the critical metadata embedded in technical drawings. PMI includes dimensions, tolerances, geometric dimensioning and tolerancing (GD&T) symbols, surface finish specifications, material callouts, and manufacturing notes: essentially all the information needed to manufacture a part correctly.

The Business Problem: Large manufacturing companies often outsource CAD (Computer-Aided Design) work to third-party vendors. When a tender is awarded, these CAD companies receive legacy technical drawings (often scanned PDFs or images) and must recreate digital 3D models with accurate PMI information for subparts. This process is traditionally:

• Extremely time-consuming: Engineers manually interpret drawings and re-enter PMI data
• Error-prone: Human mistakes in reading dimensions or symbols lead to costly manufacturing defects
• Expensive: Requires highly skilled CAD engineers for routine data entry work
• Bottleneck: Delays entire product development and manufacturing cycles
• Non-scalable: Cannot handle large volumes of legacy drawings efficiently

The solution: An AI-powered PMI extraction system that automatically reads technical drawings, detects and extracts PMI symbols and labels using machine learning, post-processes with intelligent regex and database matching, and provides a vendor-friendly SDK for seamless integration into existing CAD workflows.

• AI Detection Accuracy: ~80%
• Cloud Training: Azure ML
• Post-Processed Accuracy: Near 100%
• Vendor Integration: SDK

Understanding PMI & Technical Drawings

PMI (Product and Manufacturing Information) is the collection of annotations and symbols that communicate design intent and manufacturing requirements:

📏

Dimensional Information

Linear dimensions, angular dimensions, radii, diameters—precise measurements required for manufacturing.

⊥

Geometric Tolerancing (GD&T)

Symbols like perpendicularity, flatness, position, concentricity that define allowable variation in geometry.

〰️

Surface Finish

Roughness specifications (Ra, Rz values) indicating required surface quality for functional surfaces.

🔩

Material Specifications

Material callouts (e.g., "Steel AISI 316," "Aluminum 6061-T6") defining what material to use.

📝

Manufacturing Notes

Text annotations like "HEAT TREAT," "ANODIZE," "BREAK SHARP EDGES" providing process instructions.

📋

Standards Compliance

References to DIN, ISO, ANSI, ASME standards that govern interpretation of symbols and tolerances.

DIN Standards: DIN (Deutsches Institut für Normung - German Institute for Standardization) standards are widely used in Europe for technical drawing conventions. Our system was trained to recognize DIN symbology, dimensioning practices, and annotation standards.

The Solution: Three-Stage AI Pipeline

Stage 1: Azure ML Training Pipeline

Automated machine learning with Python SDK and MLflow

Data Preparation and Labeling

Building a high-quality training dataset was the foundation of the project:

✓ Collected thousands of technical drawings from various sources (automotive, aerospace, machinery)
✓ Ensured diversity: different drawing standards (DIN, ISO, ANSI), different industries, varying quality
✓ Manual labeling: Expert annotators labeled PMI elements with bounding boxes and classifications
✓ Label categories: Dimension lines, dimension text, GD&T symbols, surface finish symbols, leaders/arrows, text annotations, material callouts
✓ Annotation format: COCO JSON format for object detection compatibility
✓ Quality control: Multiple reviewers validated labels for accuracy
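
For illustration, one labeled drawing in COCO-style JSON might be assembled as in the sketch below. The file name, image size, box coordinates, and numeric category IDs are invented for the example; only the overall COCO layout (images, annotations, categories, with `bbox` given as `[x, y, width, height]`) is standard.

```python
import json

# Category names follow the label list above; the numeric IDs are arbitrary.
CATEGORIES = [
    "dimension_line", "dimension_text", "gdt_symbol", "surface_finish_symbol",
    "leader_arrow", "text_annotation", "material_callout",
]

def make_coco_dataset(image_name, width, height, boxes):
    """Build a minimal COCO-style dict for a single image.

    boxes is a list of (category_name, x, y, w, h) tuples in pixels.
    """
    cat_ids = {name: i + 1 for i, name in enumerate(CATEGORIES)}
    annotations = []
    for ann_id, (cat, x, y, w, h) in enumerate(boxes, start=1):
        annotations.append({
            "id": ann_id,
            "image_id": 1,
            "category_id": cat_ids[cat],
            "bbox": [x, y, w, h],  # COCO convention: top-left corner plus size
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [{"id": 1, "file_name": image_name, "width": width, "height": height}],
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }

# Hypothetical drawing with two labeled elements.
dataset = make_coco_dataset(
    "drawing_0001.png", 2480, 3508,
    [("dimension_text", 412, 230, 64, 22), ("gdt_symbol", 880, 1105, 18, 18)],
)
print(json.dumps(dataset["annotations"][0], indent=2))
```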

Image Preprocessing Pipeline

Technical drawings require specialized preprocessing to improve model performance:

Image Enhancement:

• Deskewing: Correct rotated scanned images
• Binarization: Convert to black/white for clarity
• Noise removal: Eliminate scanning artifacts
• Contrast enhancement: Improve symbol visibility
• Line thinning: Normalize line widths

Data Augmentation:

• Random rotation (±15°)
• Random scaling (0.8x - 1.2x)
• Brightness/contrast variation
• Random cropping for different zoom levels
• Gaussian blur to simulate scan quality
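
Geometric augmentations move the labeled elements, so the bounding boxes have to go through the same transform as the pixels. The sketch below shows this for the random scaling step only; the function names are hypothetical, and the actual image resize (for example with OpenCV or Pillow) is left out.

```python
import random

def scale_boxes(boxes, factor):
    """Scale COCO-style [x, y, w, h] boxes by the same factor as the image."""
    return [[round(v * factor, 2) for v in box] for box in boxes]

def random_scale(width, height, boxes, rng, low=0.8, high=1.2):
    """Pick a scale factor in [low, high] and return the new size and boxes.

    Only the label geometry is handled here; resizing the pixels is left to
    an image library.
    """
    factor = rng.uniform(low, high)
    new_size = (round(width * factor), round(height * factor))
    return factor, new_size, scale_boxes(boxes, factor)

rng = random.Random(42)  # fixed seed so an augmentation run can be repeated
factor, size, boxes = random_scale(1000, 800, [[100, 200, 50, 20]], rng)
print(f"factor={factor:.3f} size={size} boxes={boxes}")
```

Rotation needs more care than scaling, because an axis-aligned box around a rotated symbol is generally larger than the rotated original.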

Azure ML Pipeline with Python SDK

Programmatic training pipeline built with Azure ML Python SDK for reproducibility and automation:

Pipeline Components:

• Data Ingestion: Load labeled images from Azure Blob Storage
• Preprocessing Step: Execute image enhancement transformations
• Training Step: Train object detection model (Faster R-CNN / YOLOv5)
• Evaluation Step: Calculate metrics (mAP, precision, recall) on validation set
• Model Registration: Register successful models in Azure ML Model Registry

MLflow Integration:

MLflow provides experiment tracking, model versioning, and reproducibility:

• Experiment Tracking: Log hyperparameters, metrics, and artifacts for each training run
• Model Versioning: Track model lineage and compare versions
• Model Registry: Centralized model store with stage transitions (Staging → Production)
• Artifact Logging: Store trained weights, confusion matrices, sample predictions
• Reproducibility: Record exact environment, data version, and code version

Model Architecture and Training

State-of-the-art object detection model optimized for PMI symbol recognition:

Model Choice:

Faster R-CNN with ResNet-50 backbone was selected for accuracy, though YOLOv5 was also tested for speed.

• Two-stage detector: Region proposals + classification
• High accuracy for small symbols (GD&T often <20 pixels)
• Pre-trained on COCO, fine-tuned on technical drawings

Training Configuration:

• Compute: Azure ML GPU clusters (NC6s_v3 with NVIDIA V100)
• Training time: 12-24 hours depending on dataset size
• Optimizer: SGD with momentum (0.9) and learning rate scheduling
• Loss function: Multi-task loss (classification + bounding box regression)
• Early stopping: Monitor validation mAP, stop if no improvement for 10 epochs
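
The early-stopping rule can be sketched independently of any training framework. The class below is a minimal illustration, not the project's training code; the mAP values are made up to show a plateau.

```python
class EarlyStopping:
    """Stop training when validation mAP has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_map):
        """Record one epoch's validation mAP; return True if training should stop."""
        if val_map > self.best + self.min_delta:
            self.best = val_map
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up mAP curve that plateaus after epoch 3.
history = [0.42, 0.55, 0.61, 0.64] + [0.63] * 12
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch, val_map in enumerate(history):
    if stopper.step(epoch, val_map):
        stopped_at = epoch
        break
print(stopper.best_epoch, stopper.best, stopped_at)
```

In practice the weights from `best_epoch` would be the ones kept, since the last ten epochs brought no improvement.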

Achieved Result: ~80% accuracy (mAP@0.5) in detecting and classifying PMI symbols and labels on validation set. While not perfect, this provided a strong foundation for the post-processing stage.
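
For context on the metric: at mAP@0.5 a detection counts as correct when its box overlaps a ground-truth box of the same class with an intersection over union (IoU) of at least 0.5. The sketch below computes IoU and a simple precision/recall for one class using greedy matching; full mAP additionally averages precision across recall levels and classes, which is not shown. The boxes and scores are invented.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predictions, ground_truth, threshold=0.5):
    """Greedy matching for one class; predictions are (score, box) pairs."""
    unmatched = list(ground_truth)
    true_pos = 0
    # Highest-scoring predictions claim ground-truth boxes first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall

gt = [[10, 10, 30, 30], [100, 100, 120, 120]]
preds = [(0.9, [12, 12, 30, 30]), (0.6, [200, 200, 220, 220])]
print(precision_recall(preds, gt))
```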

Stage 2: EKS-Based Intelligent Post-Processing

Refining AI outputs to near-perfect accuracy

The 20% Gap Problem

While 80% AI accuracy was impressive, the remaining 20% included critical issues:

• Misread dimension values (e.g., "8" read as "3")
• Missed or incorrectly classified GD&T symbols
• Incomplete text extraction from annotations
• Confusion between similar symbols (diameter vs. radius)
• Errors in associating dimensions with correct geometry

Solution: Deploy sophisticated post-processing pipelines on Amazon EKS (Elastic Kubernetes Service) to clean, validate, and correct AI outputs.

Regex-Based Text Extraction and Validation

Regular expressions (regex) are powerful for parsing structured text in technical drawings:

Dimension Value Validation:

Dimensions follow predictable patterns. Regex validates and corrects OCR errors:

• Pattern: ^\d+\.?\d*\s?(mm|in|cm)?$ (number + optional unit)
• Tolerance format: symmetric (50±0.1) or asymmetric with upper and lower deviations (50 +0.1/-0.05)
• Reject invalid extractions (letters in numeric fields, etc.)
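
A minimal sketch of this validation step follows. It uses the dimension pattern quoted above; the symmetric-tolerance pattern, the OCR substitution table, and the function names are illustrative assumptions rather than the project's actual rules.

```python
import re

# Pattern quoted in the text: number with optional decimal part and unit.
DIMENSION_RE = re.compile(r"^\d+\.?\d*\s?(mm|in|cm)?$")
# Symmetric tolerance such as "50±0.1" (hypothetical helper pattern).
SYMMETRIC_TOL_RE = re.compile(r"^(\d+\.?\d*)\s?±\s?(\d+\.?\d*)$")

# Letter-for-digit confusions that OCR often produces in numeric fields.
# This table is an assumption for the example.
OCR_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

def clean_dimension(raw):
    """Return a validated dimension string, or None if it cannot be repaired."""
    text = raw.strip()
    if DIMENSION_RE.match(text):
        return text
    # Split off a trailing unit so the letters in "mm"/"in"/"cm" are not rewritten.
    unit_match = re.search(r"\s?(mm|in|cm)$", text)
    unit = unit_match.group(0) if unit_match else ""
    number = text[: len(text) - len(unit)]
    repaired = number.translate(OCR_FIXES) + unit
    return repaired if DIMENSION_RE.match(repaired) else None

def parse_symmetric_tolerance(raw):
    """Parse "50±0.1" into (nominal, lower_limit, upper_limit), or None."""
    m = SYMMETRIC_TOL_RE.match(raw.strip())
    if not m:
        return None
    nominal, tol = float(m.group(1)), float(m.group(2))
    return nominal, nominal - tol, nominal + tol

print(clean_dimension("12.5 mm"))
print(clean_dimension("1O.5mm"))
print(clean_dimension("abc"))
print(parse_symmetric_tolerance("50±0.1"))
```

A digit misread as another digit ("8" read as "3") still matches the pattern, so regex checks of this kind catch malformed values but not plausible-looking wrong ones.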

GD&T Symbol Standardization:

GD&T symbols have specific Unicode or ASCII representations:

• Regex patterns match common symbol variations
• Map to standardized symbol codes (e.g., ⊥ → "Perpendicularity")
• Validate symbol usage context (e.g., perpendicularity requires a datum reference, whereas flatness is a form tolerance and takes none)
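
As a sketch of symbol standardization, the table below maps a few Unicode characters to standardized names and records whether a datum reference is expected. Under ASME Y14.5 and ISO 1101, form tolerances such as flatness and straightness are stated without a datum, while orientation tolerances such as perpendicularity and parallelism need one. The table covers only four characteristics, and the function name is hypothetical.

```python
# Unicode code points for a few GD&T characteristic symbols, mapped to
# (standardized name, whether a datum reference is required).
GDT_SYMBOLS = {
    "\u27C2": ("Perpendicularity", True),   # PERPENDICULAR
    "\u22A5": ("Perpendicularity", True),   # UP TACK, a common look-alike
    "\u2225": ("Parallelism", True),        # PARALLEL TO
    "\u23E5": ("Flatness", False),          # FLATNESS
    "\u23E4": ("Straightness", False),      # STRAIGHTNESS
}

def check_control_frame(symbol, datums):
    """Return (name, issues) for a feature control frame's symbol and datum list."""
    if symbol not in GDT_SYMBOLS:
        return None, ["unknown symbol"]
    name, needs_datum = GDT_SYMBOLS[symbol]
    issues = []
    if needs_datum and not datums:
        issues.append(f"{name} requires a datum reference")
    if not needs_datum and datums:
        issues.append(f"{name} should not reference a datum")
    return name, issues

print(check_control_frame("\u22A5", ["A"]))
print(check_control_frame("\u27C2", []))
print(check_control_frame("\u23E5", ["A"]))
```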

Material Callout Extraction:

• Regex: (Steel|Aluminum|Brass)\s+(?:(AISI|SAE|DIN)\s+)?[A-Z0-9\-]+ (the optional standard prefix lets "Steel AISI 316" be captured in full)
• Extract standardized material codes (AISI, SAE, DIN designations)
• Validate against known material database
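
The sketch below shows one way to extract and check a material callout. It uses a pattern that allows an optional standard prefix (AISI, SAE, DIN) before the grade, so that a callout like "Steel AISI 316" is captured whole. The small `KNOWN_MATERIALS` set stands in for the material database and contains only the two examples mentioned earlier in this document.

```python
import re

MATERIAL_RE = re.compile(
    r"(Steel|Aluminum|Brass)\s+(?:(AISI|SAE|DIN)\s+)?([A-Z0-9\-]+)"
)

# Stand-in for the material database (assumption for the example).
KNOWN_MATERIALS = {("Steel", "316"), ("Aluminum", "6061-T6")}

def extract_material(note):
    """Return a dict for the first material callout in the text, or None."""
    m = MATERIAL_RE.search(note)
    if not m:
        return None
    family, standard, grade = m.groups()
    return {
        "family": family,
        "standard": standard,  # None when the callout has no standard prefix
        "grade": grade,
        "known": (family, grade) in KNOWN_MATERIALS,
    }

print(extract_material("MATERIAL: Steel AISI 316"))
print(extract_material("Aluminum 6061-T6 ANODIZE"))
print(extract_material("Steel XY-99"))
```

Callouts whose grade is not in the database come back with `known` set to False and could be routed to human review rather than rejected outright.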

Database Matching and Knowledge Base

A comprehensive database of known PMI elements improves accuracy through intelligent matching:

Symbol Database:

• Database contains all DIN, ISO, ANSI, ASME symbols with metadata
• Each symbol: visual representation, meaning, usage context, standard reference
• Fuzzy matching: If the AI extracts an ambiguous symbol, match it to the closest known symbol
• Confidence scoring: High confidence if close match, low if ambiguous
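
Fuzzy matching with confidence scoring can be sketched on text labels using Python's standard `difflib`. This assumes the ambiguous extraction is available as a text string; matching on the visual shape of a symbol would need image features instead. The label list and both thresholds are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Small stand-in for the symbol database's label column.
KNOWN_LABELS = ["PERPENDICULARITY", "FLATNESS", "POSITION", "CONCENTRICITY", "PARALLELISM"]

def match_label(extracted, known=KNOWN_LABELS, high=0.85, low=0.6):
    """Match OCR text to the closest known label and grade the confidence.

    Returns (label or None, similarity score, "high" | "low" | "no_match").
    """
    text = extracted.strip().upper()
    scored = [(SequenceMatcher(None, text, k).ratio(), k) for k in known]
    score, best = max(scored)
    if score >= high:
        return best, score, "high"
    if score >= low:
        return best, score, "low"
    return None, score, "no_match"

for raw in ["FLATNES", "P0SITION", "XYZ"]:
    print(raw, "->", match_label(raw))
```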

Context-Aware Validation:

PMI elements have logical relationships. Use these to detect and correct errors:

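• If a dimension arrow points to a full circle, the value is likely a diameter (Ø), not a radius (R)
• GD&T feature control frames for orientation, location, and runout tolerances must reference datums (A, B, C, etc.); form tolerances such as flatness do not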
• Surface finish symbols must be near surface edges, not in open space
• Dimension values should be reasonable (flag 0.001mm hole diameter as error)
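
Two of these rules can be sketched as simple checks that attach flags to a dimension instead of silently changing it. The `Dimension` fields and the plausibility limits are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    prefix: str        # "Ø", "R", or "" as read by the detector
    value_mm: float
    target_shape: str  # geometry the leader points at, e.g. "circle", "arc", "edge"
    flags: list = field(default_factory=list)

# Plausibility window for feature sizes; the limits are illustrative.
MIN_MM, MAX_MM = 0.05, 10_000.0

def validate(dim):
    """Apply two context rules and record anything suspicious on the dimension."""
    if dim.target_shape == "circle" and dim.prefix == "R":
        dim.flags.append("radius on full circle: possibly Ø misread as R")
    if not (MIN_MM <= dim.value_mm <= MAX_MM):
        dim.flags.append(f"value {dim.value_mm} mm outside plausible range")
    return dim

for d in [
    Dimension("Ø", 12.0, "circle"),
    Dimension("R", 12.0, "circle"),
    Dimension("Ø", 0.001, "circle"),
]:
    print(d.prefix, d.value_mm, validate(d).flags)
```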

Cross-Reference with CAD Standards:

• Maintain database of common dimension formats per industry
• Automotive drawings typically use metric units (mm), while aerospace drawings often use imperial units (inches)
• Validate extracted dimensions against expected ranges for part type
• Flag unusual values for human review

EKS Deployment Architecture

Post-processing deployed as microservices on Amazon EKS for scalability and reliability:

Microservices:

• Text Extraction Service (OCR + Regex)
• Symbol Matching Service (Database lookup)
• Validation Service (Context-aware checks)
• Correction Service (Auto-fix common errors)
• Quality Assurance Service (Confidence scoring)

Infrastructure:

• Kubernetes orchestration on EKS
• Horizontal pod autoscaling for load
• Redis for caching symbol lookups
• PostgreSQL for symbol database
• Message queue (RabbitMQ) for async processing

Result: Post-processing pipeline improved accuracy from ~80% (AI alone) to near 100% for digitalized drawings. The remaining edge cases were flagged for human review with clear confidence scores.

Stage 3: Vendor SDK and API Integration

Seamless integration into existing CAD workflows

SDK Design Philosophy

CAD vendors needed an easy way to integrate PMI extraction into their existing software pipelines:

✓ Simple API: RESTful API + Python/Java SDK wrappers
✓ Minimal integration effort: Drop-in library with 5-10 lines of code
✓ Flexible output formats: JSON, XML, or direct CAD format (STEP, IGES)
✓ Batch processing: Handle multiple drawings in parallel
✓ Real-time feedback: Progress updates and confidence scores
✓ Human-in-the-loop: Flag low-confidence extractions for review

API Endpoints

POST /api/v1/extract

Upload technical drawing image, receive extracted PMI data.

Input: Image file (PDF, PNG, JPG, TIFF)
Output: JSON with PMI elements, bounding boxes, confidence scores

GET /api/v1/status/:job_id

Check processing status for async jobs.

Returns: Processing progress, estimated completion time

POST /api/v1/batch

Submit multiple drawings for batch processing.

Returns: Batch job ID, webhook for completion notification
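
A client would typically poll the status endpoint until an async job finishes. The sketch below shows such a loop with the HTTP call injected as a function, so it runs without a server. The response fields (`state`, `progress`) and their values are assumptions; the response schema is not specified here.

```python
import time

def wait_for_job(fetch_status, job_id, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll GET /api/v1/status/:job_id until the job finishes or times out.

    fetch_status(job_id) must return a dict; in a real client it would issue
    the HTTP request. The "state" key and its values are hypothetical.
    """
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        if status.get("state") in ("completed", "failed"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still running after {timeout_s} s")
        sleep(interval_s)
        waited += interval_s

# Stand-in for the HTTP call: three canned responses.
_responses = iter([
    {"state": "processing", "progress": 40},
    {"state": "processing", "progress": 85},
    {"state": "completed", "progress": 100},
])
final = wait_for_job(lambda job_id: next(_responses), "job-123", sleep=lambda s: None)
print(final)
```

For batch jobs, the webhook returned by the batch endpoint avoids polling altogether.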

Example SDK Usage (Python)

from pmi_extractor import PMIClient

# Initialize client
client = PMIClient(api_key="your_api_key")

# Extract PMI from drawing
result = client.extract_pmi(
    image_path="technical_drawing.pdf",
    standard="DIN",  # or "ISO", "ANSI"
    output_format="json"
)

# Access extracted data
for element in result.pmi_elements:
    print(f"Type: {element.type}")
    print(f"Value: {element.value}")
    print(f"Confidence: {element.confidence}")
    print(f"Position: {element.bbox}")

# Export to CAD format
result.export_to_step("output.step")
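
The human-in-the-loop step can build on the per-element confidence scores. The sketch below uses a stand-in `PMIElement` class with the same attributes the example above reads; the 0.9 threshold and the `split_for_review` helper are assumptions, not part of the SDK.

```python
from dataclasses import dataclass

@dataclass
class PMIElement:
    """Stand-in with the attributes read in the SDK usage example."""
    type: str
    value: str
    confidence: float
    bbox: tuple

def split_for_review(elements, threshold=0.9):
    """Separate elements accepted automatically from those needing review.

    The review list is sorted so the least certain extractions come first.
    """
    accepted = [e for e in elements if e.confidence >= threshold]
    review = sorted(
        (e for e in elements if e.confidence < threshold),
        key=lambda e: e.confidence,
    )
    return accepted, review

elements = [
    PMIElement("dimension", "50±0.1", 0.98, (412, 230, 64, 22)),
    PMIElement("gdt_symbol", "Perpendicularity", 0.71, (880, 1105, 18, 18)),
    PMIElement("material", "Steel AISI 316", 0.55, (120, 3300, 210, 24)),
]
accepted, review = split_for_review(elements)
print(len(accepted), [e.value for e in review])
```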

Vendor Integration Benefits

10x Faster Turnaround

Automated extraction reduces days of manual work to minutes of review.

Reduced Errors

Consistent automated extraction reduces human transcription mistakes.

Cost Savings

Engineers focus on design, not data entry.

Scalability

Handle 100x more drawings with same team size.

Production Deployment: Azure DevOps & AKS

Enterprise-grade CI/CD with blue-green deployment

Blue-Green Deployment Strategy

Zero-downtime deployment ensuring continuous service availability:

How Blue-Green Works:

Blue Environment: Current production (serving live traffic)
Green Environment: New version deployed in parallel
Testing: Validate green environment with smoke tests
Traffic Switch: Azure Load Balancer redirects traffic to green
Monitoring: Watch metrics; rollback to blue if issues detected
Cleanup: Once stable, green becomes new blue
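
The decision flow of these steps can be sketched as a small function. This models only the control logic; in the real pipeline the deployment, smoke tests, traffic switch, and monitoring are carried out by the CI/CD tooling, and the callables below are hypothetical placeholders for those checks.

```python
def other(color):
    """Return the idle environment for the given live one."""
    return "green" if color == "blue" else "blue"

def release(live, smoke_test, monitor):
    """Attempt a blue-green release and return (new_live, events).

    smoke_test(env) and monitor(env) return True when the environment is healthy.
    """
    candidate = other(live)
    events = [f"deployed new version to {candidate}"]
    if not smoke_test(candidate):
        events.append(f"smoke tests failed; traffic stays on {live}")
        return live, events
    events.append(f"traffic switched to {candidate}")
    if not monitor(candidate):
        events.append(f"issues detected; rolled back to {live}")
        return live, events
    events.append(f"{candidate} is stable and is now production")
    return candidate, events

live, events = release("blue", smoke_test=lambda env: True, monitor=lambda env: True)
print(live, events)
```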

Helm Charts for Kubernetes Deployment

Helm provides templated, versioned Kubernetes deployments:

Helm Chart Structure:

• Deployment manifests for API service
• Service definitions for load balancing
• ConfigMaps for application configuration
• Secrets for API keys and credentials
• Ingress rules for external access
• HorizontalPodAutoscaler for scaling

Benefits:

• Version control: Track deployment history
• Rollback: Instantly revert to previous version
• Consistency: Same deployment across dev/staging/prod
• Parameterization: Environment-specific values

Azure DevOps CI/CD Pipeline

Automated pipeline from code commit to production deployment:

Build Stage

Run unit tests → Build Docker image → Push to Azure Container Registry

Test Stage

Deploy to test AKS cluster → Run integration tests → Validate API endpoints

Staging Stage

Deploy to staging (green) → Load testing → Manual approval gate

Prod Stage

Deploy to production (green) → Health checks → Traffic switch → Monitor

Complete Technology Stack

Machine Learning

Azure ML, MLflow, Faster R-CNN, YOLOv5, PyTorch

Cloud Infrastructure

Azure, AWS EKS, AKS, Azure Blob, Container Registry

Post-Processing

Regex, PostgreSQL, Redis, RabbitMQ, Microservices

DevOps

Azure DevOps, Helm, Kubernetes, Docker, Blue-Green

Languages

Python, SQL, YAML, Bash

Libraries

OpenCV, Tesseract OCR, Pillow, NumPy, Pandas

API/SDK

REST API, Python SDK, Java SDK, JSON/XML, Webhooks

Standards

DIN, ISO, ANSI, ASME, GD&T

Project Impact & Achievements

• Accuracy Pipeline: ~80% → near 100% (AI + post-processing)
• 10x Faster: Than manual entry
• Zero Downtime: Blue-green deployment

Key Achievements

Built end-to-end PMI extraction pipeline from raw drawings to structured digital data

Achieved 80% AI detection accuracy with Azure ML, MLflow, and state-of-the-art object detection models

Improved accuracy to near 100% with intelligent regex-based validation and database matching

Deployed scalable post-processing microservices on Amazon EKS with auto-scaling

Created vendor-friendly SDK with simple API integration for seamless CAD workflow adoption

Implemented enterprise-grade CI/CD with Azure DevOps, Helm charts, and blue-green deployment on AKS

Reduced CAD vendor turnaround time from days to minutes, enabling 10x productivity gains

Supported multiple technical drawing standards (DIN, ISO, ANSI) for global applicability

This AI-powered PMI extraction system digitalizes technical drawings for CAD companies and removes the manual transcription bottleneck that delays manufacturing projects. By combining Azure ML training, EKS-based post-processing with regex and database matching, and a developer-friendly SDK, the system reaches near-100% accuracy in extracting dimensions, tolerances, GD&T symbols, and manufacturing notes, with remaining edge cases flagged for human review. CAD vendors can process legacy drawings 10x faster and keep engineers on high-value design work rather than data entry.
