Setting the Stage with AI Infrastructure (2024)

Setting the Stage with AI Infrastructure (1)

AI infrastructure encompasses the foundational systems, tools, and frameworks that support the development, deployment, and operation of AI applications.

The new age of infra differs from traditional software in that before, software and data were for the most part separated. ML systems, though, are part code, part data, and part artifacts created from the two. The trend in the last decade shows that applications developed with the most/best data win. Instead of focusing on improving ML algorithms, most companies will focus on improving their data—many companies are still sitting upon a treasure trove of data while others are risking lawsuits to train their models on more widely available copyrighted data. Because data can change quickly, ML applications need to be adaptive to the changing environment which might require faster development and deployment cycles.

In traditional SWE, you only need to focus on testing and versioning your code. With ML, we have to test and version our data too, and that’s the hard part. Now, the pros of this revolution are that data is more accessible, AI/ML is advancing rapidly, open source offers new ways to build community and distribute products, and security is now a top-of-mind concern. Now, we just need to build AI systems that are reliable, scalable, maintainable, and adaptable.

AI Hardware

The reason NVIDIA has grown to become a $3 Trillion company is because it has provided a comprehensive solution from the chip all the way to the data centers, also partnering with the main foundation model labs like OpenAI to allow them to set up AI factories. In mid-March, NVIDIA demoed Blackwell, a powerful new GPU designed to run real-time generative AI on trillion-parameter large language models (LLMs), and Nvidia Inference Microservices (NIM), a software package to optimize inference for dozens of popular AI models. Yet, there is still a widespread GPU shortage and AI still has a scaling problem. Throwing more GPUs at an AI workload won’t work in the long run — choosing the right hardware, with the optimal parallelism, memory access, and specialized operations — can be the differentiator in an AI model training for weeks or a few hours.

Thus, there’s opportunity for workload specific AI accelerators - GPUs containing tensor cores which are intended to speed up the training of neural networks, FPGAs which are reconfigurable devices, and ASICs which optimize memory use and the use of lower precision arithmetic to accelerate calculation and increase throughput of computation. New forms of non-volatile memory tend to fall between traditional DRAM for memory (higher density) and traditional NAND flash for storage (better power consumption). Efficient interconnects can also help reduce bottlenecks. Memory and storage tends to be standardized, but compute and AI algorithms vary significantly across verticals, so companies must work with partners to develop specific hardware solutions.

Data Infrastructure and Inference on the Edge
See Also
Convert milliliter [ml] to dessertspoon (US) [cochl. med.] • Volume and Common Cooking Measurement Converter • Common Unit Converters • Compact Calculator • Online Unit Converters

Key infrastructure players like Confluent, Databricks, Mongo, and Snowflake will continue to evolve as new players like Hightouch and Redis also enter the space and we see an increased propensity for bringing in unstructured data to ML systems.

We will also continue to see a trend towards running LLMs on the edge for its many benefits:

Reduced latency - Edge computing minimizes the distance data needs to travel, leading to faster response times and real-time insights. This is crucial for applications like autonomous vehicles, real-time voice applications, robotics, and VR.
Improved privacy and security - Sensitive information doesn't need to be sent to the cloud hence does not involve associated risks.
Offline functionality – can still function even when there's no internet connection, which is ideal for critical infrastructure.

How do we solve the hardware limitations in processing power and memory, energy consumption, and limited update/maintenance though? Model Quantization is one such technique that trades off data retention and processing power. Memory is significantly lacking in edge devices, as the NVIDIA Jetson Orin Nano, with its 8GB DRAM, can’t even accommodate the most compact LLaMA-2 model in half precision. The inference time of edge LLMs is bottlenecked by the generation stage (not summarization), which is bounded by memory bandwidth.Furthermore, cost of loading LLM weights on the edge surpasses that of loading activations. Thus, the goal is to minimize the data transfer between off-chip and on-chip memory, and activation-aware weight-only quantization (AWQ) and SmoothQuant have enabled creations of edge LLMs like TinyChat.

A few days ago, Google also released its AI Edge Torch Generative API that allows developers to bring new capabilities on-device. As we see more optimization improvements that enable mobile genAI applications, we will start to see a boom in applications much like when mobile was first invented.

Developer Tools, APIs, and Open Source

Developer productivity has flourished in the age of Devins and open source packages.

Front end is the first to get automated, as we see entire interfaces generated and optimized by AI i.e. Coframe and their Coffee product. Software testing and red teaming is also increasingly being tackled by companies like TestRigor, Sauce Labs, and Tricentis. First, we start off with copilots like Cursor that is improving by the day based on more and more training data. Then, autopilots like Cognition emerge and research labs teaching LLMs to reason and code allow entire APIs to be generated from thin air and give us and our AI agents the ability to interact with all sorts of applications. Non-technical users can also begin to interact with data using purely natural language—the new CRMs will look quite different as it auto-updates and fetches relevant information automatically.

While open-source projects are hard to grow and monetize, the benefits are abound, and players like Together, Mistral, Runway, Stability, H2O, and OpenNN are making big bets in this space.

Security

Data exfiltration, ransomware, and other threats pose the biggest risk to modern companies, and they know this. It’s a hard space to crack as attackers get more creative by the day and competing in this space requires an ever-evolving strategy and comprehensive solutions. Since the product sold is “security,” new players also have trouble entering since entrusting security to a new startup versus an encumbent with a solid track record makes little sense. Security is one of those noisy, crowded, and confusing spaces for buyers. Having spent the past few months deep in the fintech space where security is a top, top concern (think all the bank account information and KYC data plus noncompliance that could entail millions in fees), it’s apparent that there’s no shortage of players offering security automation to offset risks of bringing in new customers and technology. In the age of LLMs, we now also need to worry about things like prompt injection attacks, prompt leakage, deepfake images/videos/AI voice calls pretending to be someone else. Until secure infrastructure catches up to the pace of innovation, many roles ripe for automation will have to wait even if the productivity gains are phenomenal, for no one wants to take on that amount of risk. We see many more mature players in the security space looking to make a difference—companies like Axonius in asset management, Cyera in data context, Corelight in network visibility, Chainguard in open source safeguarding, and Wiz in cloud security. Although many startups will be cannibalized by the next generation of OpenAI models, security and compliance will continue to be a growing field with the need for more experts and systems built around key problems and vulnerabilities.

Setting the Stage with AI Infrastructure (2024)

AI Hardware

Data Infrastructure and Inference on the EdgeSee AlsoConvert milliliter [ml] to dessertspoon (US) [cochl. med.] • Volume and Common Cooking Measurement Converter • Common Unit Converters • Compact Calculator • Online Unit Converters

Developer Tools, APIs, and Open Source

Security

References

Data Infrastructure and Inference on the Edge
See Also
Convert milliliter [ml] to dessertspoon (US) [cochl. med.] • Volume and Common Cooking Measurement Converter • Common Unit Converters • Compact Calculator • Online Unit Converters