Autonomous Drone Inventory Counting System
Real-time Warehouse Management with Drone Swarms and Computer Vision
Project Introduction
Large-scale storage warehouses face a critical challenge in the modern e-commerce era: maintaining real-time inventory accuracy. With the explosive growth of e-commerce and of services like Amazon FBA (Fulfillment by Amazon), through which small-scale vendors outsource storage and logistics, warehouse inventory management has become increasingly complex and demanding.
The Problem: Traditional inventory management relies on human workers—often forklift drivers—to manually count and update inventory as items move in and out of the warehouse. This approach has several critical flaws:
- • Human errors are extremely common, leading to inventory discrepancies
- • High frequency of items moving in/out makes real-time updates nearly impossible
- • Shortage of human resources: forklift drivers are expensive and in short supply
- • Manual counting is slow, creating bottlenecks in warehouse operations
- • Safety concerns: humans working at heights or in busy warehouse environments
The Solution: Deploy a swarm of autonomous drones that fly through the warehouse environment, automatically mapping the space, identifying inventory locations, counting items in real-time using computer vision, and updating the inventory management system via API—all without human intervention.
System Overview
This project delivers a complete end-to-end autonomous drone inventory system encompassing robotics, computer vision, deep learning, real-time control systems, and MLOps infrastructure:
Phase 1: Warehouse Mapping
Autonomous navigation and SLAM-based 3D mapping of warehouse environment using ROS2, Nav2, Rtabmap, and Octomap to create detailed spatial understanding.
Phase 2: Feature Identification
3D point cloud segmentation using Open3D and DBSCAN to identify shelving racks, zones, columns, and spatial features. Automated labeling system for inventory locations.
Phase 3: Object Detection & Counting
Live video streaming with custom YOLO models for real-time object detection. Intelligent counting algorithms for stacked cartons, pallets, tools, motors, and drums.
Phase 4: System Integration
Next.js control panel, live location tracking, drone fleet management, API integration for inventory updates, and MLOps/GitOps pipeline for continuous model improvements.
Phase 1: Warehouse Mapping with SLAM
ROS2 and Nav2 Navigation Stack
Modern robotics framework for autonomous navigation
Why ROS2?
ROS2 (Robot Operating System 2) was chosen over ROS1 for several critical advantages in production environments:
Real-time Performance
DDS middleware provides deterministic real-time communication essential for drone control
Multi-Robot Support
Native support for multiple drones (swarm) without complex workarounds
Security & Production-Ready
Built-in security features and enterprise-grade reliability
Nav2 Navigation Stack
Nav2 is the next-generation navigation framework for ROS2, providing sophisticated autonomous navigation capabilities:
- ✓ Behavior Trees: Flexible mission planning and execution
- ✓ Multiple navigation algorithms: DWB, TEB, Regulated Pure Pursuit controllers
- ✓ Recovery behaviors: Automatic handling of stuck situations
- ✓ Waypoint following: Sequential navigation through inventory locations
- ✓ Dynamic obstacle avoidance: Real-time path replanning
- ✓ Costmap layers: Static map, inflation, obstacle, and voxel layers
SLAM with Rtabmap (RGB-D Graph-Based SLAM)
Real-Time Appearance-Based Mapping for 3D environment reconstruction
What is Rtabmap and How Does It Work?
Rtabmap (Real-Time Appearance-Based Mapping) is an RGB-D Graph-Based SLAM algorithm designed for large-scale and long-term online operation. It is particularly well-suited for warehouse environments with repetitive structures.
Core Concept: Graph-Based SLAM
Rtabmap represents the environment as a graph where:
- • Nodes: Keyframes containing RGB images, depth images, and camera poses
- • Edges: Spatial constraints between nodes (odometry, loop closures)
- • Goal: Optimize the graph to find the most consistent map and trajectory
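To make the graph idea concrete, here is a toy pose-graph optimization using the Python bindings of GTSAM, one of the optimization backends Rtabmap supports. The square trajectory, noise values, and deliberately drifted initial guesses are made up for illustration, and the exact Python API can vary slightly between GTSAM versions:

```python
import numpy as np
import gtsam

# Toy 2D pose graph: four odometry edges around a 5 m square plus one loop closure.
graph = gtsam.NonlinearFactorGraph()
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05, 0.05, 0.02]))
loop_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02, 0.02, 0.01]))

# Anchor the first keyframe at the origin
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0),
          gtsam.noiseModel.Diagonal.Sigmas(np.array([1e-3, 1e-3, 1e-3]))))

# Odometry edges between consecutive keyframes: go 5 m forward, then turn left 90°
for i in range(4):
    graph.add(gtsam.BetweenFactorPose2(i, i + 1,
              gtsam.Pose2(5.0, 0.0, np.pi / 2), odom_noise))

# Loop closure: keyframe 4 re-observes the place where keyframe 0 was captured
graph.add(gtsam.BetweenFactorPose2(4, 0, gtsam.Pose2(0, 0, 0), loop_noise))

# Initial guesses are drifted, as raw odometry would be
initial = gtsam.Values()
for i, (x, y, th) in enumerate([(0, 0, 0), (5.2, 0.1, 1.6), (5.4, 5.3, 3.1),
                                (0.3, 5.6, -1.5), (0.4, 0.5, 0.1)]):
    initial.insert(i, gtsam.Pose2(x, y, th))

# Optimizing the graph pulls the trajectory back into a consistent square
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(4))
```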
Rtabmap Pipeline:
1. Sensor Input (RGB-D Camera: Intel RealSense)
2. Feature Extraction (SURF/ORB/SIFT keypoints from RGB)
3. Odometry Estimation (Visual Odometry + IMU fusion)
4. Loop Closure Detection (Bag-of-Words for place recognition)
5. Graph Optimization (g2o/GTSAM backend)
6. 3D Point Cloud Generation (from depth + optimized poses)
7. Occupancy Grid / Octomap Output
Key Rtabmap Features Used:
Loop Closure Detection
Recognizes previously visited locations to correct drift and improve map consistency
Memory Management
Transfers old data to long-term memory to maintain real-time performance in large warehouses
Multi-Session Mapping
Can resume mapping from previous sessions, building incrementally over time
RGB-D + Lidar Fusion
Combines RealSense depth and RP Lidar 2D scans for robust 3D mapping
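To give a rough idea of how this is wired up in ROS2, the minimal launch sketch below starts the rtabmap SLAM node subscribing to RealSense RGB-D topics and the 2D laser scan. The package, executable, parameter, and topic names follow common rtabmap_ros and realsense2_camera conventions and are assumptions that may need adjusting for a specific release:

```python
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    rtabmap = Node(
        package="rtabmap_slam",          # "rtabmap_ros" on older ROS2 distros
        executable="rtabmap",
        parameters=[{
            "frame_id": "base_link",
            "subscribe_depth": True,
            "subscribe_scan": True,      # fuse the 2D RP Lidar scan as well
            "approx_sync": True,
        }],
        remappings=[
            ("rgb/image", "/camera/color/image_raw"),
            ("depth/image", "/camera/aligned_depth_to_color/image_raw"),
            ("rgb/camera_info", "/camera/color/camera_info"),
            ("scan", "/scan"),
        ],
        arguments=["-d"],                # start with a fresh map database
    )
    return LaunchDescription([rtabmap])
```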
Odometry and Triangulation Methods
Accurate odometry is crucial for drone localization during mapping:
Visual Odometry (VO):
- • Tracks feature points across consecutive RGB-D frames
- • Uses triangulation to estimate 3D coordinates of features
- • Calculates camera motion (rotation + translation) via PnP (Perspective-n-Point)
- • Rtabmap uses F2M (Frame-to-Map) and F2F (Frame-to-Frame) VO
Triangulation Process:
Triangulation estimates 3D point positions from 2D image observations:
- • Detect same feature in two or more camera views (stereo/temporal)
- • Knowing camera poses and intrinsic parameters, project rays from cameras through feature pixels
- • 3D point is where rays intersect (with noise, use least-squares optimization)
- • Depth from RealSense validates and improves triangulation accuracy
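A minimal two-view triangulation sketch with OpenCV, assuming the camera intrinsics and the relative pose between the two frames are already known from calibration and odometry; all numbers below are illustrative:

```python
import numpy as np
import cv2

# Camera intrinsics (illustrative RealSense-like values)
K = np.array([[615.0, 0.0, 320.0],
              [0.0, 615.0, 240.0],
              [0.0, 0.0, 1.0]])

# Two camera poses from odometry: identity, and a 10 cm sideways translation
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.10], [0.0], [0.0]])])

# The same two features observed in both frames (2xN arrays: row 0 = u, row 1 = v)
pts1 = np.array([[422.5, 227.75],
                 [281.0, 209.25]])
pts2 = np.array([[402.0, 197.0],
                 [281.0, 209.25]])

# Linear (DLT) least-squares triangulation; result is homogeneous 4xN
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).T
print(X)   # ~[[0.5, 0.2, 3.0], [-0.3, -0.1, 2.0]] metres in the first camera frame
```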
Sensor Fusion: The system fuses Visual Odometry, IMU (Inertial Measurement Unit), and flight controller data using an Extended Kalman Filter (EKF) to provide robust pose estimation even when visual features are temporarily lost.
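As a simplified illustration of the fusion step, the sketch below runs a linear Kalman filter on a single axis (altitude), predicting with IMU acceleration and correcting with a visual-odometry position fix. The real system estimates the full 6-DoF pose with an EKF; the rates and noise values here are assumptions:

```python
import numpy as np

dt = 0.01                                   # 100 Hz IMU rate (assumed)
F = np.array([[1.0, dt], [0.0, 1.0]])       # state = [position, velocity]
B = np.array([[0.5 * dt**2], [dt]])         # effect of measured acceleration
H = np.array([[1.0, 0.0]])                  # VO observes position only
Q = np.diag([1e-4, 1e-3])                   # process noise (tuned per vehicle)
R = np.array([[4e-2]])                      # VO measurement noise

def predict(x, P, accel):
    """Propagate the state with the latest IMU acceleration."""
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, z):
    """Correct the state with a visual-odometry position fix."""
    y = np.array([[z]]) - H @ x             # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    return x + K @ y, (np.eye(2) - K @ H) @ P

x, P = np.zeros((2, 1)), np.eye(2)
for _ in range(100):                        # 1 s of IMU-only prediction at 0.2 m/s^2
    x, P = predict(x, P, accel=0.2)
x, P = update(x, P, z=0.08)                 # VO fix corrects the accumulated drift
print(x.ravel())
```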
Octomap: 3D Occupancy Mapping
Efficient 3D representation for navigation and obstacle avoidance
What is Octomap and How Does It Work?
Octomap is a probabilistic 3D occupancy mapping framework based on octrees—a hierarchical tree data structure that efficiently represents 3D space by recursively subdividing it into octants (8 child nodes per parent).
Octree Data Structure:
- • Root Node: Represents entire mapped space
- • Subdivision: Each node can be divided into 8 children (octants)
- • Leaf Nodes: Store occupancy probability (free, occupied, unknown)
- • Pruning: Homogeneous regions collapsed to single nodes, saving memory
- • Resolution: Configurable voxel size (e.g., 10cm cubes)
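A bare-bones octree sketch showing the recursive subdivision down to a 10 cm leaf; this is illustration only, as the real Octomap library additionally prunes homogeneous nodes, casts rays, and serializes the tree:

```python
import numpy as np

class OctreeNode:
    """Minimal octree: recursively subdivide a cube down to leaf resolution."""
    def __init__(self, center, size):
        self.center = np.asarray(center, dtype=float)
        self.size = size                 # edge length of this cube (metres)
        self.children = None             # list of 8 children once subdivided
        self.log_odds = 0.0              # occupancy estimate stored at leaves

    def insert(self, point, leaf_size=0.10):
        """Return the leaf voxel containing `point`, subdividing on the way down."""
        if self.size <= leaf_size:       # reached leaf resolution (e.g. 10 cm)
            return self
        if self.children is None:
            offsets = np.array([[x, y, z] for x in (-1, 1)
                                          for y in (-1, 1)
                                          for z in (-1, 1)]) * (self.size / 4)
            self.children = [OctreeNode(self.center + o, self.size / 2)
                             for o in offsets]
        dx, dy, dz = (np.asarray(point) >= self.center).astype(int)
        return self.children[(dx << 2) | (dy << 1) | dz].insert(point, leaf_size)

root = OctreeNode(center=[0.0, 0.0, 0.0], size=40.0)   # 40 m warehouse bounding cube
leaf = root.insert([3.2, -7.9, 1.4])                   # a single occupied observation
print(leaf.center, leaf.size)
```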
Occupancy Probability Updates:
Octomap uses a Bayesian update scheme to handle sensor uncertainty:
- • Each voxel has a log-odds occupancy value
- • When a sensor (Lidar/Depth camera) observes a voxel as occupied, increase probability
- • When a ray passes through a voxel (free space), decrease probability
- • Multiple observations integrated over time → confident occupancy map
- • Handles dynamic environments by allowing probability updates
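The update itself reduces to adding a sensor-model term in log-odds space and clamping, roughly as sketched below; the hit/miss probabilities and clamping bounds are typical tuning values rather than verified library defaults:

```python
import math

P_HIT, P_MISS = 0.7, 0.4                     # assumed sensor model
L_HIT = math.log(P_HIT / (1 - P_HIT))        # ≈ +0.85 added on an occupied observation
L_MISS = math.log(P_MISS / (1 - P_MISS))     # ≈ -0.41 added when a ray passes through
L_MIN, L_MAX = -2.0, 3.5                     # clamping keeps voxels updatable later

def update_voxel(log_odds, hit):
    """Bayesian update in log-odds form: add the sensor-model term, then clamp."""
    log_odds += L_HIT if hit else L_MISS
    return max(L_MIN, min(L_MAX, log_odds))

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

l = 0.0                                      # unknown voxel starts at p = 0.5
for hit in [True, True, False, True]:        # three hits, one pass-through ray
    l = update_voxel(l, hit)
print(round(probability(l), 3))              # confidently occupied after repeated hits
```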
Memory Efficiency
Octree compression reduces memory usage by 10-100x compared to dense 3D grids
Fast Queries
O(log n) lookup time for collision checking and path planning
Integration: Rtabmap generates point clouds → Octomap converts them into 3D occupancy grid → Nav2 uses Octomap for collision avoidance and path planning in 3D warehouse space.
Phase 2: 3D Point Cloud Segmentation & Feature Identification
Point Cloud Processing with Open3D
Extracting meaningful structure from 3D warehouse scans
From Point Cloud to Warehouse Structure
After mapping, we have a massive 3D point cloud representing the entire warehouse. The challenge: automatically identify shelving racks, zones, aisles, and inventory locations.
Open3D Processing Pipeline:
- ▸ Point cloud downsampling with voxel grid filter (reduce density while preserving structure)
- ▸ Statistical outlier removal to clean noise from sensor data
- ▸ Normal estimation for each point (surface orientation)
- ▸ Plane segmentation using RANSAC to identify floors, walls, shelves
- ▸ Clustering to group points into distinct objects/structures
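A condensed version of the pipeline above in Open3D might look like the following; the input file name and the thresholds (voxel size, RANSAC distance, DBSCAN eps/min_points) are placeholder values to be tuned per warehouse:

```python
import open3d as o3d
import numpy as np

# Load the point cloud exported from the mapping session (path is illustrative)
pcd = o3d.io.read_point_cloud("warehouse_map.ply")

# 1. Downsample with a voxel grid to a manageable density
pcd = pcd.voxel_down_sample(voxel_size=0.05)

# 2. Remove statistical outliers left by sensor noise
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# 3. Estimate per-point normals (surface orientation)
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30))

# 4. RANSAC plane segmentation: the dominant plane is the warehouse floor
plane_model, floor_idx = pcd.segment_plane(distance_threshold=0.03,
                                           ransac_n=3, num_iterations=1000)
structures = pcd.select_by_index(floor_idx, invert=True)   # everything but the floor

# 5. DBSCAN clustering groups the remaining points into racks and other structures
labels = np.array(structures.cluster_dbscan(eps=0.3, min_points=50))
print(f"{labels.max() + 1} clusters found")
```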
DBSCAN Clustering for Feature Identification
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm perfect for identifying shelving structures in point clouds.
Why DBSCAN for Warehouses?
- • No predefined cluster count: Automatically discovers number of shelves/racks
- • Handles arbitrary shapes: Shelves can be L-shaped, curved, or irregular
- • Noise robust: Ignores scattered points (forklift, pallets, people)
- • Density-based: Groups dense point regions (shelves) while rejecting sparse areas (aisles)
DBSCAN Process:
- Define ε (epsilon): neighborhood radius around each point
- Define MinPts: minimum points to form dense region
- Classify points: Core (≥MinPts neighbors), Border, Noise
- Connect core points within ε distance → forms clusters
- Each cluster = one rack/shelf structure
Result: Each identified cluster represents a distinct warehouse feature (rack, pallet position, forklift area, etc.). Extracted cluster centroids and bounding boxes provide x,y,z coordinates for navigation.
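Continuing from the Open3D sketch above (reusing its structures point cloud and labels array), the per-cluster centroids and axis-aligned bounding boxes can be extracted as shown below; the field names of the returned records are illustrative:

```python
import numpy as np

def cluster_features(structures, labels):
    """For every DBSCAN cluster, compute the centroid and axis-aligned bounding
    box that Phase 2 stores as a navigable warehouse feature."""
    features = []
    for cluster_id in range(labels.max() + 1):
        idx = np.where(labels == cluster_id)[0].tolist()
        cluster = structures.select_by_index(idx)
        box = cluster.get_axis_aligned_bounding_box()
        features.append({
            "cluster_id": cluster_id,
            "centroid": np.asarray(cluster.points).mean(axis=0),
            "extent": box.get_extent(),        # width, depth, height in metres
            "min_bound": box.min_bound,
            "max_bound": box.max_bound,
        })
    return features
```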
Automated Labeling System
Once features are identified, the system automatically assigns hierarchical labels:
Spatial Hierarchy:
- • Zones: Large areas (Zone 1, Zone 2, Zone 3, etc.)
- • Aisles: Pathways between racks (Aisle A, B, C...)
- • Racks: Individual shelving units (Rack R1, R2, R3...)
- • Columns: Vertical divisions (Column 1, 2, 3...)
- • Shelves: Height levels (Shelf A, B, C, D...)
Location Format:
Example: Zone2-Aisle-B-R5-Col3-ShelfC
This location code uniquely identifies every inventory position in the warehouse, stored with (x, y, z) coordinates for drone navigation.
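A small helper showing how such a code could be assembled and stored alongside its coordinates; the function name and record layout are hypothetical:

```python
def location_code(zone, aisle, rack, column, shelf):
    """Build the hierarchical location code for an inventory position,
    e.g. Zone2-Aisle-B-R5-Col3-ShelfC."""
    return f"Zone{zone}-Aisle-{aisle}-R{rack}-Col{column}-Shelf{shelf}"

# Hypothetical record tying the code to the coordinates extracted in Phase 2
location = {
    "code": location_code(2, "B", 5, 3, "C"),
    "xyz": (14.2, 6.8, 3.1),     # metres, warehouse map frame
}
print(location["code"])          # Zone2-Aisle-B-R5-Col3-ShelfC
```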
Phase 3: Live Object Detection & Intelligent Counting
Custom YOLO Object Detection
Real-time inventory item detection with live video streaming
Live Video Streaming Architecture
As drones navigate through the warehouse, they stream live video to a ground station for real-time object detection:
Video Pipeline:
- • Intel RealSense RGB stream (1920x1080 @ 30fps)
- • H.264 video encoding on drone
- • Low-latency streaming (GStreamer/WebRTC)
- • GPU-accelerated decoding on ground station
- • Frame buffer for YOLO inference
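One possible on-drone sender uses OpenCV's GStreamer backend to encode H.264 and push RTP over UDP to the ground station. The device index, host address, and encoder settings below are placeholders, and OpenCV must be built with GStreamer support:

```python
import cv2

# appsrc -> H.264 (zero-latency) -> RTP -> UDP to the ground station
GST_OUT = (
    "appsrc ! videoconvert ! x264enc tune=zerolatency bitrate=4000 speed-preset=ultrafast "
    "! rtph264pay config-interval=1 pt=96 ! udpsink host=192.168.1.50 port=5600"
)

cap = cv2.VideoCapture(0)                      # RealSense RGB as a V4L2 device (index assumed)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
writer = cv2.VideoWriter(GST_OUT, cv2.CAP_GSTREAMER, 0, 30.0, (1920, 1080), True)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(frame)                        # encoded RTP packets leave the drone here
```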
Custom YOLO Training:
- • YOLOv5/v8 architecture optimized for warehouse items
- • Training dataset: cartons, pallets, tools, motors, drums, etc.
- • Data augmentation for varying lighting/angles
- • TensorRT optimization for real-time inference
- • Class confidence thresholding (>0.6)
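With the Ultralytics library, the training, export, and thresholded inference steps reduce to a few calls; the dataset file warehouse.yaml, the model size, and the hyperparameters below are assumptions for illustration:

```python
from ultralytics import YOLO

# Fine-tune a pretrained YOLOv8 checkpoint on the warehouse dataset.
# "warehouse.yaml" (classes: carton, pallet, tool, motor, drum) is assumed to exist.
model = YOLO("yolov8s.pt")
model.train(data="warehouse.yaml", epochs=100, imgsz=640, batch=16,
            hsv_v=0.4, degrees=10.0)          # augment for lighting/angle variation

# Export a TensorRT engine for real-time inference on the Jetson
model.export(format="engine", half=True)

# Inference with the confidence threshold used at counting time
results = model.predict("shelf_frame.jpg", conf=0.6)
```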
Intelligent Counting Mechanisms
Different inventory items require different counting strategies. The system adapts based on object type:
Scenario 1: Stacked Cartons on Pallets
Most challenging due to occlusion and stacking patterns.
- • YOLO detects visible cartons in current frame
- • Multiple viewing angles: drone circles pallet
- • 3D spatial tracking: associate detections across frames using depth + pose
- • Counting algorithm: track unique carton positions in 3D space
- • Stacking estimation: measure pallet height with depth camera, estimate layers
- • Confidence scoring: multiple observations → higher confidence
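One way to realize the 3D spatial tracking and unique-position counting is to back-project each detection into the map frame using its depth and the drone pose, then merge detections that land close together across frames. The sketch below is illustrative; the helper names, the 25 cm merge radius, and the coordinates are assumptions:

```python
import numpy as np

def pixel_to_world(u, v, depth, K, T_world_cam):
    """Back-project a detection's pixel centre with its measured depth
    into warehouse map coordinates using the drone's current pose."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = np.append(ray * depth, 1.0)            # homogeneous point in camera frame
    return (T_world_cam @ p_cam)[:3]

def merge_detections(cartons, new_points, min_separation=0.25):
    """Count a detection as a new carton only if it is farther than
    min_separation (m) from every carton tracked so far; otherwise treat it
    as a repeat observation and refine that carton's position."""
    for p in map(np.asarray, new_points):
        if cartons:
            dists = np.linalg.norm(np.asarray(cartons) - p, axis=1)
            i = int(dists.argmin())
            if dists[i] < min_separation:
                cartons[i] = (cartons[i] + p) / 2  # average repeat observations
                continue
        cartons.append(p)
    return cartons

cartons = []
# Two frames viewing the same pallet from different angles (coordinates illustrative)
cartons = merge_detections(cartons, [(4.0, 2.0, 1.2), (4.0, 2.4, 1.2)])
cartons = merge_detections(cartons, [(4.02, 2.01, 1.21), (4.0, 2.8, 1.2)])
print(len(cartons))   # 3 unique cartons despite 4 detections
```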
Scenario 2: Individual Items (Tools, Motors, Drums)
Simpler counting for discrete, non-stacked items.
- • YOLO directly detects and counts individual items
- • Bounding box filtering to avoid double-counting
- • Non-maximum suppression (NMS) for overlapping detections
- • Simple summation across frames with deduplication
- • Fast and reliable for clearly visible objects
Scenario 3: Partial Occlusion Handling
When items are partially hidden behind others.
- • Instance segmentation (YOLOv8 segmentation) for precise boundaries
- • Depth-based occlusion reasoning: closer items occlude farther ones
- • Multi-view fusion: combine counts from different angles
- • Probabilistic counting: assign confidence to partially visible items
Real-time Inventory Updates via API
As drones count items, the system immediately updates the inventory management system:
- ✓ RESTful API integration with the warehouse management system (WMS)
- ✓ Structured data payload: location_code, item_type, quantity, confidence, timestamp (sketched below)
- ✓ Batch updates to minimize API calls (aggregate multiple counts)
- ✓ Conflict resolution: if a human-entered count differs, flag the location for verification
- ✓ Audit trail: all counts logged with drone ID and video frame reference
- ✓ Real-time dashboard updates visible to warehouse managers
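A sketch of such a batched update call; the endpoint URL and field values are placeholders that would come from the actual WMS integration:

```python
import requests

# Hypothetical WMS endpoint; payload fields mirror those listed above
WMS_URL = "https://wms.example.com/api/v1/inventory/counts"

payload = {
    "counts": [
        {
            "location_code": "Zone2-Aisle-B-R5-Col3-ShelfC",
            "item_type": "carton",
            "quantity": 48,
            "confidence": 0.92,
            "timestamp": "2024-05-14T09:32:11Z",
            "drone_id": "drone-03",
            "frame_ref": "mission-117/frame-002451",
        }
    ]
}

# Batched POST so several shelf counts go out in a single API call
response = requests.post(WMS_URL, json=payload, timeout=5)
response.raise_for_status()
```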
Phase 4: Control Panel & MLOps Infrastructure
Next.js Drone Control Panel
Real-time fleet management and monitoring dashboard
Control Panel Features
Live Location Tracking
Real-time 3D visualization of all drone positions overlaid on warehouse map. Shows current mission, battery level, and status.
Mission Planning
Drag-and-drop waypoint editor for creating inventory counting routes. Optimize coverage paths automatically.
Fleet Management
Monitor battery levels, flight hours, maintenance schedules. Assign missions to available drones intelligently.
Video Streaming
Live video feed from selected drone with real-time YOLO detection overlays showing bounding boxes and counts.
Inventory Dashboard
Real-time inventory counts updated as drones complete missions. Historical trends and discrepancy alerts.
Alert System
Notifications for low battery, obstacle detection failures, counting discrepancies, or mission completion.
MLOps & GitOps Pipeline
Continuous improvement of object detection models
Continuous Model Improvement
As drones collect more data, the YOLO models continuously improve through an automated MLOps pipeline:
Data Collection Loop
- • Drones capture images of inventory items during missions
- • Low-confidence detections flagged for human review
- • Warehouse staff label uncertain images via web interface
- • New labeled data added to training dataset automatically
Automated Training Pipeline
- • GitLab CI/CD triggers training when dataset reaches threshold
- • GPU cluster trains new YOLO model version
- • Automated validation on held-out test set
- • If accuracy improves, promote to staging environment
- • A/B testing: compare new model vs. old on live drone fleet
- • If metrics improve, blue-green deployment to production
GitOps Deployment
- • Model artifacts stored in Git LFS (Large File Storage)
- • Kubernetes deployment manifests define model serving config
- • ArgoCD/Flux monitors Git repo for changes
- • Automated rollout to drone fleet with health checks
- • Rollback capability if new model degrades performance
Hardware Components
Intel RealSense Depth Camera
RGB-D camera providing synchronized color and depth streams. Used for visual odometry, 3D mapping, and object detection. D435/D455 models with up to 90 FPS.
RP Lidar 2D Laser Scanner
360° laser rangefinder for obstacle detection and 2D mapping. Provides long-range obstacle detection beyond RealSense depth range. Essential for safe navigation.
High-End Flight Controller
Pixhawk 4 or similar flight controller running PX4/ArduPilot firmware. Handles low-level flight stabilization, receives waypoints from ROS2 Nav2.
Onboard Computer
NVIDIA Jetson Xavier NX or similar edge AI computer. Runs ROS2, Rtabmap, YOLO inference, video streaming. Low power consumption for extended flight time.
Dynamic Obstacle Avoidance
The system handles dynamic warehouse environments with moving forklifts, people, and changing inventory:
- • Costmap layers: Static map (walls/racks) + dynamic obstacles (Lidar/depth) + inflation layer (safety margin)
- • Real-time replanning: Nav2 DWB controller replans path every 100ms if obstacles detected
- • Sensor fusion: Combines Lidar (long-range, 2D) and RealSense (short-range, 3D) for comprehensive awareness
- • Emergency stop: If obstacle too close, hover in place until path clears
Complete Technology Stack
- Robotics Framework: ROS2, Nav2
- SLAM & Mapping: Rtabmap, Octomap
- Computer Vision: custom YOLOv5/v8, TensorRT, GStreamer/WebRTC streaming
- Point Cloud: Open3D, DBSCAN
- Frontend: Next.js
- MLOps/DevOps: GitLab CI/CD, Git LFS, Kubernetes, ArgoCD/Flux
- Hardware: Intel RealSense D435/D455, RP Lidar, Pixhawk 4 (PX4/ArduPilot), NVIDIA Jetson Xavier NX
- Languages
Project Impact & Achievements
Key Achievements
This autonomous drone inventory system addresses the critical challenge facing modern warehouses: maintaining real-time inventory accuracy in high-velocity e-commerce environments. By combining the ROS2 robotics framework, SLAM-based 3D mapping, computer vision with custom YOLO models, intelligent point cloud processing, and an MLOps infrastructure, it delivers fully autonomous inventory counting that eliminates human error, reduces labor costs, and provides instant inventory visibility, reshaping warehouse operations for the Amazon FBA era.