Beyond Text-to-Image: ‘Visual Engine’ is a New Architecture for Screen Augmentation Built with NVIDIA NIM

BRELYON
Dec 19, 2024 · 6 min read


Evolving generative AI to give real-time consciousness to video streams.

What if pixels had a mind of their own, and you could interact with your display such that it understood your visual experience and enhanced it in real time in ways you weren’t even aware of? Brelyon has been working at the intersection of display tech and deep learning to understand this possibility. It’s a new leap into “generative displays” and builds on the long and rich history of AI.

The Past and the Present

Frank Rosenblatt’s 1950s Mark I Perceptron, the world’s first neural network image classifier, laid the groundwork for today’s deep learning menagerie. AI is in business models, education platforms, healthcare diagnostics, autonomous vehicles, content creation, entertainment, widgets, gadgets, and edge computations. GANs, VAEs, CNNs, LSTMs…the list goes on, and seemingly any inference can be made given enough training data. Each instance offers its own benefits, but friction and opportunity cost can be formidable, and improvements are sometimes marginal. How does one extract the signal from the noise to identify truly valuable innovation?

Rosenblatt’s newest descendant is a brand-new display technology that uses deep learning as a platform for intuitive, customized interactivity. Brelyon, the pioneer of immersive display technologies, today announced a proprietary rendering architecture called Visual Engine, built with NVIDIA NIM microservices, to help users re-render or augment screen interfaces in real time, more effectively and quickly. It achieves faster inferencing and provides the user with real-time updates when generating interactive interfaces, all computed at the shader level.

Visual Engine is a universal software tool that understands your video stream, recognizes what’s going on, and annotates it in new and useful ways (see below). A video call that adds new functions or buttons in the white space. A simulator program that assesses your actions and provides real-time feedback. A multiviewer analyzer to help you focus on a pressing task among multiple high priorities. An on-the-fly generative graphical interface for familiarizing yourself with new and old machinery. Visual Engine provides you with more options, features, and tools, even some that you maybe weren’t aware of. All of this points to new opportunities in immersive entertainment, productivity, and upskilling. So, how does it work?

Real-time annotations computed by Visual Engine, giving augmented-reality life to your existing video stream. (Note: content is being shown on Brelyon Ultra Reality Extend, which displays content in depth.)

Brelyon Visual Engine: From Photos to Photons

First, the importance of visual content can’t be overstated. It’s the primary way we interact with the world. In his 1893 lecture before the National Electric Light Association, Nikola Tesla commented on vision and the human eye:

It is the most precious, the most indispensable of our perceptive or directive organs, it is the great gateway through which all knowledge enters the mind. Of all our organs, it is the one, which is in the most intimate relation with that which we call intellect. So intimate is this relation, that it is often said, the very soul shows itself in the eye.

In support of this claim, and in providing a compelling visual experience, photorealism is front and center. GPUs have done Herculean work here, but we are starting to see diminishing returns and are seeking new innovation.

Visual Engine takes the next leap, from photorealism to photon-realism, adding an entirely new layer of image-based programming. It’s a reinvention of the GPU-display communication layer — and uses GPU compute for content-aware visual effect augmentation: Visual Engine converts any display into an AI-powered augmented reality experience. It increases visual bandwidth productively and dynamically, and it allows frictionless integration into all activities: enterprise and productivity (streamlining tasks, suggesting efficient actions, summarizing complex data), gaming and entertainment (enhancing immersion with live and adaptable graphics, making gaming recommendations), and content creation (automatically annotating live streams, providing on-the-fly comments).

Concept of Visual Engine: it recognizes the graphical content of your video stream, automatically generates content using powerful deep learning methods, and pushes it into your view to assist you with your tasks.

In other words, instead of just producing content based on crafty prompt engineering, it fluidly observes your input stream and auto-generates new features in real time as a rendered layer of visual content. It’s an evolution of multimodal generative content, from text-to-image and image-to-video, into video-stream-to-visual-environment.

For example, Visual Engine is a training ground for upskilling to enhance your existing skill set and meet changing job demands or technical requirements. This customization better mimics one-on-one coaching and offers a real-time, adaptive learning environment tailored to your objectives and skill gaps. And when those skills are needed for mission-critical visualizations, it improves and quickens your decision making. Visual Engine transforms any standard, passive screen into an active one that understands what you’re trying to accomplish and harmoniously assists you.

Technically, Visual Engine operates like an intelligent graphics shader. (A “shader” is a program that runs on the GPU and computes how each pixel in a scene is rendered, typically for photorealism.) In many ways, it’s a major generalization of, and advancement over, contextual writing assistants that help you write an email. The key difference is that Visual Engine recognizes what’s on your screen and how you’re interacting with its content, and then it adds graphical content based on that context. The rendering pipeline below shows the three main elements of Visual Engine: the vision engine to see the input stream, the inference engine to understand the context, and the generative engine to produce new content for the resulting output stream.

Rendering pipeline for Visual Engine
ARGEN Heart (A.H.) uses Visual Engine for generative depth content in real time.
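To make the three-stage pipeline concrete, here is a minimal sketch of how a frame might flow through a vision stage, an inference stage, and a generative stage. All names, features, and overlay choices are illustrative assumptions, not Brelyon’s actual API:

```python
# Hypothetical sketch of the Visual Engine pipeline described above:
# see the input stream, understand the context, generate new content.
from dataclasses import dataclass

@dataclass
class Frame:
    pixels: list   # flattened grayscale values, 0-255
    width: int
    height: int

def vision_engine(frame: Frame) -> dict:
    """See the input stream: extract simple features from the frame."""
    mean = sum(frame.pixels) / len(frame.pixels)
    return {"mean_luma": mean, "is_dark": mean < 64}

def inference_engine(features: dict) -> str:
    """Understand the context: map features to a scene label."""
    return "night_scene" if features["is_dark"] else "day_scene"

def generative_engine(context: str) -> dict:
    """Produce new content: choose an overlay for the output stream."""
    overlays = {"night_scene": "brighten_hud", "day_scene": "standard_hud"}
    return {"overlay": overlays[context]}

def visual_engine(frame: Frame) -> dict:
    """Run one frame through all three stages."""
    return generative_engine(inference_engine(vision_engine(frame)))

frame = Frame(pixels=[30] * 16, width=4, height=4)
print(visual_engine(frame))  # {'overlay': 'brighten_hud'}
```

In the real system these stages run at the shader level on the GPU; this toy version only shows how each stage’s output becomes the next stage’s input.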

Now, keep in mind that Visual Engine can run on any monitor, but it’s also worth emphasizing that the content layer Visual Engine generates is enhanced even further by Brelyon Ultra Reality Extend, a true-depth multifocal display system. Visual Engine and Ultra Reality Extend reinforce each other to enhance the visual experience, albeit in different ways (software and hardware). Ultra Reality Extend’s ARGEN Heart is the computational module that uses Visual Engine for dynamic depth modulation and immersive visualizations.

Brelyon Ultra Reality Extend uses ARGEN Heart to generate new content based on your input video stream.
Multiple use cases for Brelyon Ultra Reality Extend + ARGEN Heart.

Imagine the possibilities, ranging from productivity enhancement (above) to immersive entertainment, where content-enhancing effects are automatically generated to bring more life and depth to your experience. ARGEN Heart uses a real-time depth decision model that lets you play any PC game or video content with one extra computational bit of depth per frame. When objects or characters are close, they are rendered on the near layer; when they’re on the horizon, they’re shown in the background. See below for a graphical demonstration of ARGEN Heart’s real-time depth rendering. Imagine seeing Spider-Man jumping toward you as if you were really there!

Visual Engine’s real-time, per-frame depth discrimination for true-depth effects. This lets you convert your videos and games into true-depth immersive experiences.
Generative effects of your existing content are rendered with ARGEN Heart. Here, ARGEN Heart identifies the background animation’s environment and auto-generates particle effects to enhance immersion.
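The “one extra bit of depth per frame” idea can be sketched as a simple per-pixel decision: given an estimated depth map, each pixel gets a single near/far bit that routes it to the near layer or the background layer. The threshold and depth values below are made-up illustrations, not ARGEN Heart’s actual model:

```python
# Toy sketch of a one-bit depth decision: each pixel's estimated depth
# is thresholded into a near/far bit, splitting the frame into two layers.

def split_layers(depth_map, threshold=0.5):
    """Return (near_mask, far_mask): one bit of depth per pixel."""
    near = [d < threshold for d in depth_map]  # close objects -> near layer
    far = [not n for n in near]                # everything else -> background
    return near, far

depths = [0.1, 0.9, 0.4, 0.7]  # normalized: 0 = closest, 1 = horizon
near, far = split_layers(depths)
print(near)  # [True, False, True, False]
print(far)   # [False, True, False, True]
```

A multifocal display like Ultra Reality Extend can then present the near layer at a closer focal plane than the background, which is what produces the true-depth effect.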

The Future

Tesla later mused in his lecture about the brain-eye-light connection: “at times, when a sudden idea or image presents itself to the intellect, there is a distinct and sometimes painful sensation of luminosity produced in the eye, observable even in broad daylight.” That is, he argues that visual content is imprinted onto the mind, and the mind creates visual content. Brelyon studies this connection between light, ideas, and vision, and uses generative techniques like Visual Engine to build interfaces that enhance the human-computer experience. In this way, technological tools can serve not just as convenient gadgets, but as aids that help us flourish and reach our goals.

A new generation of deep learning and computer vision: generative displays transform visual content into a brand new, interactive user experience.


Written by BRELYON

We are a team of scientists and entrepreneurs focused on the future of human-computer evolution. Our expertise is in display technology and computer science.
