How AI Is Learning to Chat Like Us

Sujithra Kathiravan
3 min read · Jul 18, 2024


In recent years, artificial intelligence has made remarkable strides in understanding and generating human language. At the forefront of this revolution are Large Language Models (LLMs), sophisticated AI systems that can engage in human-like text interactions. But how do these models actually work? Let’s unpack the key technologies that make LLMs tick.

The Transformer: A Game-Changing Architecture

[Figure: Architectures and ideas behind transformer-based generative LLMs]

At the heart of modern LLMs lies an innovative architecture called the transformer. Introduced in 2017 in the landmark paper “Attention Is All You Need”, transformers revolutionised how AI processes language.

Unlike earlier models that analysed text sequentially, transformers can process entire sentences at once. This parallel processing is achieved through a mechanism called self-attention, which allows the model to weigh the importance of different words simultaneously, regardless of their position in a sentence.
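To make self-attention concrete, here is a minimal NumPy sketch of scaled dot-product attention. The dimensions, random inputs, and weight matrices are illustrative stand-ins for what a trained model would learn; real transformers add multiple heads, masking, and learned positional information on top of this core idea.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the whole sequence at once.

    X          : (seq_len, d_model) token embeddings
    Wq, Wk, Wv : (d_model, d_k) projection matrices (learned in a real model)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project every token in parallel
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise relevance between all tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: each row sums to 1
    return weights @ V                             # each output mixes all value vectors

# Tiny demo: 4 tokens with 8-dimensional embeddings (random stand-ins for real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one context-aware vector per token
```

Note that every token attends to every other token in a single matrix multiplication; nothing is processed one word at a time, which is exactly what makes the architecture so parallelisable.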

This breakthrough has led to significant improvements in language understanding:

  1. Better grasp of context and nuance
  2. Improved handling of long-range dependencies in text
  3. Faster training and processing times
  4. Ability to scale to massive datasets

These advantages have paved the way for increasingly powerful language models, pushing the boundaries of AI’s linguistic capabilities.

Tokenization: Breaking Language into Buildable Blocks

[Figure: Tokenizers in LLMs]

Before an LLM can work its magic, it needs to break down text into manageable pieces. This process is called tokenization.

Modern LLMs often use subword tokenization methods like Byte Pair Encoding (BPE) or WordPiece. These approaches break words into smaller units, allowing models to handle a vast vocabulary while keeping the overall number of tokens manageable.

For example, “unbelievable” might be split into subword units like “un”, “believ”, and “able”. This approach offers several benefits:

  1. Handling of rare words by breaking them into more common subwords
  2. Ability to understand and generate new words from learned subword units
  3. More efficient processing due to a smaller overall vocabulary
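As a concrete illustration, here is a toy sketch of the core BPE training loop: repeatedly find the most frequent adjacent pair of symbols in the corpus and merge it into a new vocabulary entry. The corpus, frequencies, and number of merge steps are all made up for the example:

```python
from collections import Counter

def most_frequent_pair(corpus):
    """Count adjacent symbol pairs across a corpus of space-separated symbols."""
    pairs = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    """Fuse every occurrence of the chosen pair into a single new symbol.

    (A production BPE implementation tracks token boundaries more carefully;
    plain string replacement is good enough for this toy corpus.)
    """
    old, new = " ".join(pair), "".join(pair)
    return {word.replace(old, new): freq for word, freq in corpus.items()}

# Toy corpus: words pre-split into characters, with invented frequencies
corpus = {"u n b e l i e v a b l e": 2, "b e l i e v e": 5, "a b l e": 3}
for _ in range(6):                    # run a handful of merge steps
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)
    print(pair, "->", "".join(pair))  # frequent fragments like "believ" emerge
```

Running a few merge steps over even this tiny corpus shows why BPE works: the pieces it learns first are the character sequences that recur most often, which tend to be meaningful word fragments.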

The choice of tokenization method can significantly impact an LLM’s performance, influencing its ability to understand context and handle different languages.

Navigating the Unknown: Handling New Words

Despite their vast knowledge, LLMs inevitably encounter words they haven’t seen before. To handle these out-of-vocabulary (OOV) words, LLMs employ several strategies:

  1. Subword tokenization: Breaking unknown words into familiar subwords to infer meaning
  2. Contextual analysis: Using surrounding words to guess the meaning and function of an unknown word
  3. Special tokens: Using a placeholder (such as an “unknown” token like <unk>) to represent unrecognised words while maintaining sentence structure
  4. Dynamic vocabularies: Some experimental systems explore expanding the vocabulary over time, though most deployed LLMs work with a fixed vocabulary set at training time

These approaches allow LLMs to gracefully handle new or rare words, maintaining performance even when faced with unfamiliar language.
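Here is a minimal sketch combining strategies 1 and 3 above: greedy longest-match subword splitting (simplified WordPiece-style), with an <unk> placeholder as the last resort. The mini-vocabulary is invented purely for illustration:

```python
def subword_tokenize(word, vocab, unk="<unk>"):
    """Greedy longest-match subword splitting (simplified WordPiece-style).

    Falls back to the <unk> placeholder only when no known piece fits,
    which is how closed-vocabulary models keep sentence structure intact.
    """
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            return [unk]                   # nothing matched: emit the placeholder
    return tokens

# An invented mini-vocabulary of subword units
vocab = {"un", "believ", "able", "crypt", "o", "graphy"}
print(subword_tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
print(subword_tokenize("cryptography", vocab))  # ['crypt', 'o', 'graphy']
print(subword_tokenize("xyzzy", vocab))         # ['<unk>']
```

Notice that a word the model has never seen, like “cryptography” here, still decomposes into familiar pieces; the placeholder only appears when nothing in the vocabulary matches at all.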

Looking Ahead

As LLMs continue to evolve, they’re pushing the boundaries of what’s possible in natural language processing. From powering more intuitive digital assistants to generating human-like text, these models are rapidly changing how we interact with technology.

Understanding the fundamentals behind LLMs — transformers, tokenization, and OOV word handling — gives us valuable insight into the future of AI and human-machine interaction. As these technologies continue to advance, we can look forward to even more impressive feats of artificial linguistic intelligence in the years to come.
