Redefining Hybrid Meetings With AI-powered 360° Videoconferencing

The global pandemic catalyzed a boom in videoconferencing that continues to grow as companies embrace hybrid work models and seek more sustainable approaches to business communication with less travel. Now, with videoconferencing becoming a cornerstone of modern business practices, the task of systems developers is to improve the user experience while delivering a higher level of functionality and performance. Moving beyond standard webcams, the need for innovative business communications solutions is driving demand for technologies like 360° videoconferencing cameras that create immersive hybrid meeting experiences.

The latest 360° cameras provide a panoramic view of conference rooms, capturing all in-person attendees. Viewers are also able to digitally pan, tilt, and zoom around the room as if they were actually present. This gives remote participants an immersive experience, facilitating natural collaboration by creating organic, face-to-face interactions.
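To make the digital pan, tilt, and zoom idea concrete, here is a minimal sketch of how a viewer's controls might map to a crop rectangle inside a 360° panoramic frame. It is not how any particular camera implements the feature. The `ptz_window` function, the `CropWindow` type, the 90° base field of view, and the assumption that the panorama uses the same pixels-per-degree scale on both axes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CropWindow:
    """Pixel rectangle inside the panorama (hypothetical helper type)."""
    x: int        # left edge, always in [0, pano_width)
    y: int        # top edge
    width: int
    height: int
    wraps: bool   # True when the window crosses the 360°/0° seam


def ptz_window(pan_deg, tilt_deg, zoom, pano_width, pano_height,
               base_hfov_deg=90.0, aspect=16 / 9):
    """Map pan/tilt/zoom controls to a crop window in a 360° panorama.

    Assumes the panorama spans 360° horizontally and uses the same
    pixels-per-degree scale vertically (an illustrative simplification).
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0")
    px_per_deg = pano_width / 360.0

    # Zooming in narrows the field of view, so the window shrinks.
    width = min(round(base_hfov_deg / zoom * px_per_deg), pano_width)
    height = min(round(width / aspect), pano_height)

    # Pan selects the horizontal center; positive tilt looks upward.
    cx = (pan_deg % 360.0) * px_per_deg
    cy = pano_height / 2 - tilt_deg * px_per_deg

    # Horizontally the panorama is circular, so the left edge wraps around.
    x = round(cx - width / 2) % pano_width
    # Vertically the window is clamped to stay inside the frame.
    y = max(0, min(round(cy - height / 2), pano_height - height))

    return CropWindow(x, y, width, height, wraps=x + width > pano_width)
```

When `wraps` is true, a renderer would need to join two slices of the panorama (the right-hand end and the left-hand start) to fill the window, which is what lets a remote viewer pan continuously around the room.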

Utilizing the power of AI, developers can create the next generation of videoconferencing systems with enhanced capabilities and minimized hardware requirements. Let’s explore one such innovative videoconferencing implementation, where four 4K cameras are connected to a single, high-performance AI vision processor. This creates a system that offers a wide-panorama, 360° view, as well as views of individual conference participants, alongside functions such as participant tracking and automated stitching. Additionally, these panoramic views can be combined with high-resolution, 360° dewarping to provide a cleaner, undistorted view of the entire room.

At the heart of this next-generation system is the CVflow® advanced AI engine inside Ambarella’s systems-on-chip (SoCs), which is designed for high-efficiency, high-performance, low-latency applications. With just a single chip, the CVflow engine lets developers implement a suite of AI-powered features that can run concurrently on multiple regions of interest, including:

  • Face Recognition (Face ID): enables multi-participant auto-framing, tracking, and re-identification across different cameras
  • Background Removal: offers clean, professional-looking video feeds
  • Hand Gesture Detection and Classification: enables presenters to control the camera with simple hand motions
  • Whiteboard Content Extraction: enhances collaboration by recognizing whiteboard content, optimizing its appearance, and improving its legibility
  • Vivid HDR (AI-assisted tone mapping): improves image quality over traditional processing by using AI to provide a wider dynamic range for participants who are in starkly contrasting lighting conditions within the same room (e.g., one is close to the window, and another is in a corner with low light)
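As an illustration of what multi-participant auto-framing involves once faces have been detected, the sketch below computes a single crop that contains every detected face with some padding, at a fixed output aspect ratio. It is a generic geometric approach, not Ambarella's algorithm. The `auto_frame` function, the `(x, y, width, height)` box format, and the 50% margin are assumptions made for this example, and the face boxes are expected to come from a separate detector that is not shown.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def auto_frame(faces: List[Box], frame_w: int, frame_h: int,
               margin: float = 0.5, aspect: float = 16 / 9) -> Box:
    """Return a crop that frames all faces (assumes boxes have positive size)."""
    if not faces:
        # Nobody detected: fall back to the full frame.
        return (0, 0, frame_w, frame_h)

    # Union of all face boxes.
    left = min(x for x, _, _, _ in faces)
    top = min(y for _, y, _, _ in faces)
    right = max(x + w for x, _, w, _ in faces)
    bottom = max(y + h for _, y, _, h in faces)

    # Pad the union so faces do not touch the crop edges.
    w = (right - left) * (1 + margin)
    h = (bottom - top) * (1 + margin)

    # Grow the shorter dimension until the crop has the output aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # Shrink uniformly if the crop would be larger than the frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale

    # Center on the faces, then clamp so the crop stays inside the frame.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x = min(max(cx - w / 2, 0), frame_w - w)
    y = min(max(cy - h / 2, 0), frame_h - h)
    return (round(x), round(y), round(w), round(h))
```

A real system would typically also smooth the crop over time so the framing does not jump whenever a participant moves; that step is omitted here.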

Alongside vision-related features, the CVflow AI engine inside our SoCs also supports the implementation of AI audio features such as:

  • Voice ID: recognizes and targets individual voices in combination with Face ID, eliminating non-target voices for clearer audio
  • AI-based Noise Classification and Suppression: identifies unwanted sounds—such as coughing, barking, or lawnmowers—in real time and suppresses them, eliminating distracting noises

Going beyond these AI features, an implementation utilizing an Ambarella CVflow SoC offers several key technical advantages:

  • Industry-leading image signal processing performance for multi- and single-camera solutions
  • 8K video encoding to support multiple high-resolution regions of interest, allowing the view to zoom in on participants located farther from the camera without loss of video quality
  • Dynamic stitching to minimize artifacts when combining images from multiple cameras
  • Ultra-low latency (80 ms) from camera to display, well below the Microsoft Teams specification
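Two of the figures in this list lend themselves to quick back-of-the-envelope arithmetic. The sketch below assumes that "8K" means the 7680×4320 UHD raster, that a region of interest is cropped from that raster and delivered at 1080p or 720p, and that "without quality loss" means the crop never has to be upscaled. It also converts the 80 ms latency figure into frame periods at assumed frame rates of 30 and 60 fps. None of these assumptions come from the product specification, and the function names are hypothetical.

```python
# Assumed 8K UHD raster (7680x4320) and common delivery resolutions.
SOURCE_W, SOURCE_H = 7680, 4320
OUTPUTS = {"1080p": (1920, 1080), "720p": (1280, 720)}


def max_zoom_without_upscaling(src_w, src_h, out_w, out_h):
    """Largest zoom factor at which the crop still has >= output pixels."""
    return min(src_w / out_w, src_h / out_h)


def latency_in_frames(latency_ms, fps):
    """Express a latency in milliseconds as a number of frame periods."""
    return latency_ms * fps / 1000.0


if __name__ == "__main__":
    for name, (out_w, out_h) in OUTPUTS.items():
        zoom = max_zoom_without_upscaling(SOURCE_W, SOURCE_H, out_w, out_h)
        print(f"{name}: up to {zoom:.1f}x digital zoom before upscaling")
    for fps in (30, 60):
        frames = latency_in_frames(80, fps)
        print(f"80 ms at {fps} fps is about {frames:.1f} frame periods")
```

Under these assumptions, a 1080p view can be zoomed up to 4x and a 720p view up to 6x before any upscaling is needed, and 80 ms corresponds to 2.4 frame periods at 30 fps.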

The comprehensive set of features and technical advantages in the four-camera implementation described above significantly enhances the videoconferencing experience. The implementation is also more cost-effective and energy-efficient than previous FPGA-based systems. In addition, utilizing a single SoC with the integrated CVflow AI engine, along with Ambarella’s tools, provides a robust development environment capable of supporting a wide range of algorithms, and offers a higher degree of design flexibility and simplicity without compromising on functionality or performance.

As we look to the future, the integration of AI in videoconferencing systems will require more powerful and efficient processors capable of both supporting a rich set of AI-based features and delivering exceptional image quality. By harnessing the power of AI, we’re not just enhancing video calls; we’re reimagining the very nature of hybrid work and remote communication.