Streamline Development of AI-Powered Apps with NVIDIA RTX AI Toolkit for Windows RTX PCs | NVIDIA Technical Blog (2024)

NVIDIA today launched the NVIDIA RTX AI Toolkit, a collection of tools and SDKs for Windows application developers to customize, optimize, and deploy AI models for Windows applications. It’s free to use, doesn’t require prior experience with AI frameworks and development tools, and delivers the best AI performance for both local and cloud deployments.

The wide availability of generative pretrained transformer (GPT) models creates big opportunities for Windows developers to integrate AI capabilities into apps. Yet shipping these features can still involve significant challenges. First, you need to customize the models to meet the specific needs of the application. Second, you need to optimize the models to fit on a wide range of hardware while still delivering the best performance. And third, you need an easy deployment path that works for both cloud and local AI.

The NVIDIA RTX AI Toolkit provides an end-to-end workflow for Windows app developers. You can leverage pretrained models from Hugging Face, customize them with popular fine-tuning techniques to meet application-specific requirements, and quantize them to fit on consumer PCs. You can then optimize them for the best performance across the full range of NVIDIA GeForce RTX GPUs, as well as NVIDIA GPUs in the cloud.

When it comes time to deploy, the RTX AI Toolkit enables several paths to match the needs of your applications, whether you choose to bundle optimized models with the application, download them at app install/update time, or stand up a cloud microservice. The toolkit also includes the NVIDIA AI Inference Manager (AIM) SDK that enables an app to run AI locally or in the cloud, depending on the user’s system configuration or even the current workload.

Powerful, customized AI for every application

Today’s generative models are trained on huge datasets, a process that can take several weeks on hundreds of the world’s most powerful GPUs. While these computing resources are out of reach for most developers, open-source pretrained models give you access to powerful AI capabilities.

Pretrained foundation models, available as open source, are typically trained on generalized datasets. This enables them to deliver decent results across a wide range of tasks. However, applications often need specialized behavior: for example, a game character needs to speak in a particular way, or a scientific writing assistant needs to understand industry-specific terms.

Fine-tuning is a technique that further trains a pretrained model on additional data matching the needs of the application, such as dialogue samples for a game character.
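
Fine-tuning data of this kind is usually a set of prompt/response pairs. As a rough illustration (the exact schema and field names depend on the training tool, so everything below is hypothetical), dialogue samples for a game character might be assembled into the common JSON Lines format like this:

```python
import json

# Hypothetical dialogue samples for fine-tuning a game character.
samples = [
    {"prompt": "Greet the player.",
     "response": "Hail, traveler! The road has been kind to thee, I hope."},
    {"prompt": "Describe the weather.",
     "response": "The mists roll thick off the moor this morn."},
]

# Many fine-tuning tools consume JSON Lines: one JSON object per line.
jsonl = "\n".join(json.dumps(s) for s in samples)
print(jsonl.splitlines()[0])
```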

The RTX AI Toolkit includes tools to support fine-tuning, such as NVIDIA AI Workbench. Released earlier this year, AI Workbench is a tool for organizing and running model training, tuning, and optimization projects both on local RTX GPUs and in the cloud. RTX AI Toolkit also includes AI Workbench projects for fine-tuning using QLoRA, one of today’s most popular and effective techniques.

For parameter-efficient fine-tuning, the toolkit leverages QLoRA through the Hugging Face Transformers library, enabling customization with less memory so that it can run efficiently on client devices with RTX GPUs.
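
The core idea behind LoRA (and its quantized variant QLoRA) can be sketched in a few lines of NumPy: instead of updating the full weight matrix, training learns two small matrices whose product forms a low-rank update. This is a minimal illustration of the math, not the Transformers API; all names and dimensions here are illustrative.

```python
import numpy as np

def lora_forward(x, W, A, B, scale=1.0):
    """Forward pass with a LoRA adapter: y = x @ (W + scale * B @ A).T.

    W is the frozen pretrained weight (d_out x d_in); only the small
    matrices A (r x d_in) and B (d_out x r) are trained.
    """
    return x @ (W + scale * (B @ A)).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4               # rank r is much smaller than d_in
W = rng.standard_normal((d_out, d_in))   # frozen base weights
A = rng.standard_normal((r, d_in))       # trainable, tiny
B = np.zeros((d_out, r))                 # init to zero: adapter starts as a no-op

x = rng.standard_normal((1, d_in))
# With B = 0, the adapted model matches the base model exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

# Trainable parameters shrink from d_out*d_in to r*(d_in + d_out).
full, lora = d_out * d_in, r * (d_in + d_out)
print(full, lora)  # prints "4096 512"
```

The memory savings are what make client-side fine-tuning practical: only the small A and B matrices need gradients and optimizer state, while the base weights stay frozen (and, in QLoRA, quantized to 4-bit).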

Once fine-tuning is complete, the next step is optimization.

Optimized for PCs and the cloud

Optimizing AI models involves two main challenges. First, PCs have limited memory and compute resources for running AI models. Second, between PC and cloud, there’s a wide range of target hardware with different capabilities.

RTX AI Toolkit includes the following tools for optimizing AI models and preparing them for deployment.

NVIDIA TensorRT Model Optimizer: Even smaller LLMs can require 14 GB or more of RAM. NVIDIA TensorRT Model Optimizer for Windows, with general availability starting today, provides tools to quantize models to be up to 3x smaller without a significant reduction in accuracy. It includes methods such as INT4 AWQ post-training quantization to facilitate running state-of-the-art LLMs on RTX GPUs. As a result, not only do smaller models fit more easily in the GPU memory available on typical systems, performance also improves because memory bandwidth bottlenecks are reduced.
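
AWQ itself is more involved (it uses activation statistics to rescale salient weight channels before quantizing), but the bare mechanics of 4-bit post-training quantization can be sketched with simple symmetric absmax rounding. This is an illustration of the concept, not the Model Optimizer API:

```python
import numpy as np

def quantize_int4(w):
    """Symmetric absmax quantization to 4-bit integers (range -8..7).

    Real schemes like AWQ quantize per group and rescale salient
    channels first; this shows only the basic round-to-grid step.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int4(w)

# 4 bits per weight vs 16 for FP16 is a 4x storage reduction in theory;
# the "up to 3x smaller" figure reflects real-world format overheads.
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean abs error: {err:.4f}")
```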

NVIDIA TensorRT Cloud: To get the best performance on every system, a model can be optimized specially for each GPU. NVIDIA TensorRT Cloud, available in developer preview, is a cloud service for building optimized model engines for RTX GPUs in PCs, as well as GPUs in the cloud. It also provides prebuilt, weight-stripped engines for popular generative AI models, which can be merged with fine-tuned weights to generate optimized engines. Engines that are built with TensorRT Cloud, and run with the TensorRT runtime, can achieve up to 4x faster performance compared to the pretrained model.

Once the fine-tuned models have been optimized, the next step is deployment.

Develop once, deploy anywhere

By giving your applications the ability to perform inference locally or in the cloud, you can deliver the best experience to the most users. Models deployed on-device can achieve lower latency and don’t require calls to the cloud at runtime, but they have certain hardware requirements. Models deployed to the cloud can support an application running on any hardware, but carry an ongoing operating cost for the service provider. Once the model is developed, you can deploy it anywhere with the RTX AI Toolkit, which provides tools for both on-device and cloud paths:

NVIDIA AI Inference Manager (AIM): AIM, available in early access, simplifies AI integration for PC developers and orchestrates AI inference seamlessly across PC and cloud. NVIDIA AIM preconfigures the PC environment with the necessary AI models, engines, and dependencies, and supports all major inference backends (TensorRT, ONNX Runtime, GGUF, PyTorch) across different accelerators, including GPU, NPU, and CPU. It also performs runtime compatibility checks to determine whether the PC can run the model locally or should switch over to the cloud, based on developer policies.

With NVIDIA AIM, developers can leverage NVIDIA NIM to deploy in the cloud, and tools like TensorRT for local on-device deployment.

NVIDIA NIM: NVIDIA NIM is a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations. NIM is available as part of the NVIDIA AI Enterprise suite of software. RTX AI Toolkit provides the tools to package an optimized model with its dependencies, upload it to a staging server, and then launch a NIM, which pulls in the optimized model and creates an endpoint for applications to call.
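
NIM microservices expose OpenAI-compatible APIs, so calling the resulting endpoint looks like a standard chat-completion request. The sketch below builds (but does not send) such a request with only the standard library; the endpoint URL and model name are placeholders, not a real deployment.

```python
import json
from urllib import request

def build_chat_request(endpoint: str, model: str, prompt: str):
    """Build (but do not send) an OpenAI-style chat completion request.

    NIM endpoints follow the OpenAI API convention; the URL and model
    name passed in are assumptions for illustration only.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return request.Request(
        url=f"{endpoint}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8000", "my-fine-tuned-model",
                         "Hello!")
print(req.full_url)  # prints "http://localhost:8000/v1/chat/completions"
```

Sending the request (for example with `request.urlopen`) would return a JSON body containing the model’s reply; because the application only depends on the endpoint URL, the same call works whether the NIM runs on a workstation or in the cloud.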

Models can also be deployed on-device using the NVIDIA AI Inference Manager (AIM) plugin. This helps to manage the details of local and cloud inference, reducing the integration load for the developer.

NVIDIA TensorRT: NVIDIA TensorRT 10.0 and TensorRT-LLM inference backends offer best-in-class performance for NVIDIA GPUs with Tensor Cores. The newly released TensorRT 10.0 simplifies deployment of AI models into Windows applications. Weight-stripped engines enable compression of more than 99% of the compiled engine size, so that an engine can be refitted with model weights directly on end-user devices. Moreover, TensorRT offers software and hardware forward compatibility, so AI models work with newer runtimes or hardware. TensorRT-LLM includes dedicated optimizations that further accelerate inference for generative AI LLMs and SLMs on RTX GPUs.

These tools enable developers to prepare models that are ready at application runtime.

RTX AI acceleration ecosystem

Top creative ISVs, including Adobe, Blackmagic Design, and Topaz Labs, are integrating NVIDIA RTX AI Toolkit into their applications to deliver AI-accelerated apps that run on RTX PCs, enhancing the user experience for millions of creators.

To build accelerated RAG-based and agent-based workflows on RTX PCs, you can now access the capabilities and components of the RTX AI Toolkit (such as TensorRT-LLM) through developer frameworks like LangChain and LlamaIndex. In addition, popular ecosystem tools (such as Automatic1111, Comfy.UI, Jan.AI, OobaBooga, and Sanctum.AI) are now accelerated with the RTX AI Toolkit. Through these integrations, you can easily build optimized AI-accelerated apps, deploy them to on-device and cloud GPUs, and enable hybrid capabilities within the app to run inference across local and cloud environments.

Bringing powerful AI to Windows applications

The NVIDIA RTX AI Toolkit provides an end-to-end workflow for Windows application developers to leverage pretrained models, customize and optimize them, and deploy them to run locally or in the cloud. Fast, powerful, hybrid AI enables AI-powered applications to scale quickly, while delivering the best performance on each system. The RTX AI Toolkit enables you to bring more AI-powered capabilities to more users so they can enjoy the benefits of AI across all of their activities, from gaming to productivity and content creation.

NVIDIA RTX AI Toolkit will be released soon for developer access.
