Weekly AI reading and watch list – October 25th, 2024
Fascinating AI articles, papers, and books I discovered this week
Every Friday, I aim to share a list of AI papers, blog articles, books, and videos that I find worth reading or watching. While many will be recent, I'll also include older but equally, if not more, significant works.
AI Snake Oil
The new book “AI Snake Oil” by Arvind Narayanan and Sayash Kapoor critiques exaggerated claims surrounding AI, especially tools marketed for complex societal areas like healthcare and criminal justice. The authors acknowledge that AI has real uses, but argue that many applications are unreliable and often unethical. They caution against "snake oil" AI products and emphasize the importance of understanding AI’s limits, especially as the technology is increasingly controlled by big tech.
Books
Scientific Progress in Artificial Intelligence: History, Status, and Futures
Scientific Progress in Artificial Intelligence: History, Status, and Futures by Eric Horvitz and Tom M. Mitchell provides a comprehensive look at AI’s evolution, exploring its historical origins, breakthrough technological advancements, and the directions it is likely to take in the future.
Book chapter
How to improve transformer architectures in terms of computational cost and performance?
Papers
The paper What Matters in Transformers? Not All Attention is Needed by He, Sun, Shen, and Li (2024) examines the redundancy in large transformer-based models, particularly in attention layers, which are typically considered central to transformer architectures. By using a similarity-based metric to assess layer redundancy, the authors found that many attention layers produce outputs overly similar to their inputs, suggesting they can be pruned without sacrificing model performance. Experimental results demonstrate that models like Llama-3-70B can retain similar performance even after pruning half of the attention layers, offering a promising approach to improve efficiency in real-world applications. They propose an additional method that prunes both attention and MLP layers jointly, which reduces memory and computational costs even further.
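As a rough illustration of the similarity-based redundancy metric (a minimal sketch, not the authors' exact implementation), the score for an attention sublayer can be taken as the cosine similarity between the hidden states entering and leaving it; layers scoring close to 1 change their input very little and are pruning candidates. The function names and the 0.99 threshold below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def layer_redundancy(x_in: torch.Tensor, x_out: torch.Tensor) -> float:
    """Cosine similarity between the hidden states entering (x_in) and leaving
    (x_out) a sublayer, averaged over all tokens in the batch. Both tensors
    have shape (batch, seq_len, hidden). A value near 1.0 means the sublayer
    barely changes its input and is a candidate for pruning."""
    sim = F.cosine_similarity(x_in.flatten(0, 1), x_out.flatten(0, 1), dim=-1)
    return sim.mean().item()

def prunable_attention_layers(scores: dict[int, float],
                              threshold: float = 0.99) -> list[int]:
    """Given per-layer redundancy scores (layer index -> similarity), return
    the indices of attention layers above the threshold, most redundant first."""
    return sorted((i for i, s in scores.items() if s >= threshold),
                  key=lambda i: -scores[i])
```

In practice, the input/output pairs would be collected with forward hooks on each attention sublayer over a small calibration set, and layers would be removed starting from the most redundant while monitoring downstream accuracy.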
The paper Differential Transformer by Ye et al. introduces a novel approach to Transformers called the Differential Transformer (Diff Transformer). This model tackles the challenge of "attention noise," where irrelevant context distracts standard Transformers, reducing accuracy in tasks like key information retrieval and long-context modeling. The Diff Transformer addresses this by calculating attention as the difference between two softmax attention maps, effectively canceling out noise and promoting attention to relevant information.
In experiments, the Diff Transformer showed improvements over traditional Transformers across multiple tasks, including language modeling, hallucination mitigation, and in-context learning. It also performed markedly better on "needle-in-a-haystack" tasks, where a specific piece of information must be retrieved precisely from a long context. Additionally, the Diff Transformer requires fewer resources: it matches standard Transformers with only about 65% of their model size or training tokens, making it more efficient while maintaining high performance on long or complex contexts.
Ye, T. et al.; Differential Transformer; Oct 7th, 2024; https://arxiv.org/abs/2410.05258
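Below is a minimal single-head sketch of the mechanism as described above: two softmax attention maps are computed from split query/key projections and their difference weights the values. The split projections follow the spirit of the paper, but the fixed λ initialization and the omission of per-head normalization are simplifications of my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentialAttention(nn.Module):
    """Single-head sketch of differential attention: attention weights are
    softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d)),
    so noise common to both maps cancels out. The scalar `lambda_init`
    stands in for the paper's re-parameterized, learnable lambda."""

    def __init__(self, d_model: int, d_head: int, lambda_init: float = 0.8):
        super().__init__()
        # Two query/key projections produce the two attention maps.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        self.out_proj = nn.Linear(d_head, d_model, bias=False)
        self.lam = nn.Parameter(torch.tensor(lambda_init))
        self.d_head = d_head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = self.d_head ** -0.5
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        # Subtracting the second map cancels common-mode "attention noise"
        # while keeping attention on the genuinely relevant tokens.
        attn = a1 - self.lam * a2
        return self.out_proj(attn @ v)
```

Compared with standard attention, the only structural change is the second softmax map and the subtraction; the paper additionally re-parameterizes λ and normalizes each head, which is omitted here.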
Further work
Papers
The paper Agentic Information Retrieval, authored by Weinan Zhang and colleagues, introduces the concept of "Agentic IR," a novel approach in information retrieval that leverages the capabilities of large language models (LLMs) to transcend traditional methods. Unlike conventional systems that rely on filtering static sets of content, Agentic IR suggests using LLM agents to actively seek, retrieve, and even create the information that users need, thereby enhancing adaptability and utility in digital ecosystems. The paper discusses applications of this approach, emphasizing its potential to revolutionize information retrieval and to become a key component of future digital products.
Zhang, W. et al.; Agentic Information Retrieval; Oct 13th, 2024; https://arxiv.org/abs/2410.09713
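To make the contrast with static retrieval concrete, here is a hypothetical sketch of such an agent loop: instead of returning a fixed ranked list, the LLM iteratively decides whether to search again or to answer. The `llm` and `search` callables and the SEARCH/ANSWER protocol are illustrative placeholders, not an interface from the paper.

```python
from typing import Callable, List

def agentic_retrieval(question: str,
                      llm: Callable[[str], str],
                      search: Callable[[str], List[str]],
                      max_steps: int = 4) -> str:
    """Hypothetical agentic IR loop: an LLM agent drives retrieval instead of
    the user consuming a static ranked list of documents."""
    evidence: List[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Question: {question}\n"
            "Evidence so far:\n" + "\n".join(evidence) + "\n"
            "Reply with 'SEARCH: <query>' to gather more information, "
            "or 'ANSWER: <final answer>' if the evidence is sufficient."
        )
        decision = llm(prompt)
        if decision.startswith("SEARCH:"):
            query = decision[len("SEARCH:"):].strip()
            evidence.extend(search(query))  # the agent actively pulls in new information
        else:
            return decision.removeprefix("ANSWER:").strip()
    # Budget exhausted: answer with whatever evidence was gathered.
    return llm(f"Question: {question}\nEvidence:\n" + "\n".join(evidence) + "\nAnswer:")
```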
The paper What Emergence Can Possibly Mean by Sean M. Carroll and Achyuth Parola explores the philosophical concept of emergence within dynamic systems. It analyzes how systems composed of multiple components evolve over time, focusing on the relationships between higher-level and lower-level behaviors without strictly defining higher-level phenomena as "novel" or "unexpected." Carroll and Parola offer a classification system for various types of emergence, considering both scenarios that introduce new ontological elements and those that do not. This paper is aimed at deepening the understanding of emergence and how it may help explain complex systems within physics and philosophy.
Carroll, S.M.; Parola, A.; What Emergence Can Possibly Mean; Oct 20th, 2024; https://arxiv.org/abs/2410.15468