Implementing Monocular Depth Estimation for Obstacle Avoidance: Following the TU Delft NanoDepth Approach

With the simulation environment set up and basic wall-following behaviour validated, the next critical component for autonomous obstacle avoidance is depth perception. Since the Crazyflie AI deck provides only a single monocular camera, traditional stereo vision approaches are out of the question. This is the classic challenge of extracting 3D depth information from 2D images, a problem that has driven significant research in computer vision.

The Challenge of Monocular Depth Estimation

Estimating depth from a single camera is inherently an ill-posed problem. Unlike stereo vision systems that use disparity between two cameras to triangulate distance, monocular systems must rely on visual cues like object size, perspective, occlusion, and texture gradients. For a resource-constrained platform like the Crazyflie, this becomes even more challenging due to:

  • Limited computational power of the GAP8 processor (8+1 core RISC-V, 512 KB L2 RAM; see the memory-budget sketch after this list)
  • Low-resolution camera (324x324 pixels)
  • Real-time processing requirements for obstacle avoidance
  • Strict payload and power consumption constraints
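
Before looking at solutions, it's worth quantifying that memory constraint. The back-of-envelope sketch below is illustrative only: it counts weight storage for a network the size of the one discussed later (391,793 parameters) and ignores activations, code, and runtime buffers, which compete for the same RAM.

```python
# Rough check: do a model's weights alone fit in the GAP8's 512 KB of L2 RAM?
L2_RAM_BYTES = 512 * 1024

def weight_footprint_bytes(num_params: int, bytes_per_weight: int) -> int:
    """Weight storage only; activations and buffers are ignored."""
    return num_params * bytes_per_weight

NUM_PARAMS = 391_793  # parameter count reported for the TU Delft network

for label, width in [("float32", 4), ("int8 (quantized)", 1)]:
    footprint = weight_footprint_bytes(NUM_PARAMS, width)
    verdict = "fits" if footprint <= L2_RAM_BYTES else "does NOT fit"
    print(f"{label}: {footprint / 1024:.0f} KB -> {verdict} in 512 KB L2")
```

At float32 precision even this tiny network needs roughly 1.5 MB for its weights alone, about three times the available L2 RAM; quantized to int8 it drops to about 383 KB. This is why aggressive quantization is essentially mandatory if the weights are to stay resident on the GAP8.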

Recent research has shown that existing depth-estimation networks, even those designed for efficiency, face significant computational challenges on nano-UAV platforms. Lightweight models such as FastDepth and PyD-Net still exceed the Crazyflie's computing power and storage capacity, highlighting the need for purpose-built solutions.

Discovering the TU Delft NanoDepth Solution

Rather than reinventing the wheel, I found that researchers at TU Delft had already tackled this exact problem in their paper "Nano Quadcopter Obstacle Avoidance with a Lightweight Monocular Depth Network" (IFAC 2023). This work represents the first successful implementation of a depth-estimation CNN deployed directly on Crazyflie hardware.

Key innovations include:

  • A lightweight CNN with only 391,793 trainable parameters, among the smallest in the literature (a toy network at this scale is sketched after this list)
  • Self-supervised learning combined with knowledge distillation from larger networks (a generic loss sketch follows the repository link below)
  • Dense depth map prediction optimized for the GAP8's computational constraints
  • Integration with a sophisticated behaviour state machine for robust control
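
To make that parameter budget tangible, here is a minimal PyTorch sketch of an encoder-decoder depth network at roughly the same scale. To be clear, this is a toy network of my own for illustration, not the TU Delft architecture (their NanoDepth.py defines the real one), and the 320x320 input is an assumed preprocessing crop of the 324x324 camera frame.

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Illustrative encoder-decoder at roughly NanoDepth's parameter scale.
    NOT the TU Delft architecture; see their repository for the real one."""

    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions downsample the grayscale input 16x.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: transposed convolutions recover a dense map at input size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),  # normalized inverse depth in [0, 1], monodepth-style
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = TinyDepthNet()
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f"trainable parameters: {n_params:,}")  # ~270K, same order as 391,793

# Dummy frame: 324x324 camera image assumed cropped/resized to 320x320.
depth = net(torch.randn(1, 1, 320, 320))
print(depth.shape)  # torch.Size([1, 1, 320, 320])
```

Even at this scale, the weights only fit within the GAP8's L2 budget after quantization, per the earlier calculation.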

The TU Delft team open-sourced their implementation in the https://github.com/tudelft/depth_avoider_crazyflie repository, providing the complete NanoDepth.py CNN framework written in PyTorch.
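
Before digging into that code, the training idea from the list above is worth sketching. The snippet below is a generic illustration of distillation plus an edge-aware smoothness regularizer, both standard in the monocular depth literature; it is not the paper's actual loss, and it omits the self-supervised photometric term, which requires view synthesis with estimated camera motion.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_depth, teacher_depth):
    """L1 gap between the small network's depth map and a larger teacher's."""
    return F.l1_loss(student_depth, teacher_depth)

def edge_aware_smoothness(depth, image):
    """Penalize depth gradients except where the image itself has edges,
    a standard regularizer in self-supervised monocular depth work."""
    d_dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    d_dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    i_dx = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    i_dy = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()

def total_loss(student_depth, teacher_depth, image, alpha=1.0, beta=0.1):
    # alpha and beta are placeholder weights, not values from the paper.
    return (alpha * distillation_loss(student_depth, teacher_depth)
            + beta * edge_aware_smoothness(student_depth, image))
```

The appeal of distillation here is that a large teacher network, run offline on desktop hardware, can provide dense depth targets that a model this small would struggle to learn from self-supervision alone.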

In the next post, I'll show the actual depth maps generated from my simulation environment.