ICL: In-Context Learning (Part 1)
AI has exploded in the last two years, largely due to a massive leap in the capabilities of Language Models (LMs). We’ve seen models grow from a few hundred million parameters to hundreds of billions, even trillions. AI giants like OpenAI, Anthropic, and Google are constantly pushing boundaries, releasing powerful models pre-trained on vast datasets of text and code.
Because these models are so capable, we can now use them for a wide variety of tasks with very minimal context. This is where In-Context Learning (ICL) comes in. It’s a very useful way to interact with these Large Language Models (LLMs). In fact, you have probably already used ICL when prompting these models, guiding them towards your desired response. Unlike traditional fine-tuning, which involves extensive retraining, ICL lets us shape the model’s behavior by providing instructions and examples directly within the input prompt.
This is the first in a three-part blog series exploring In-Context Learning. We’ll deconstruct ICL in this post, look at its advancements in the second, and finally explore future research directions in the third.
Deconstructing In-Context Learning
At its core, ICL exhibits characteristics of meta-learning [1], where the model appears to have implicitly acquired a learning algorithm during pre-training. This algorithm enables it to adapt to new tasks based on limited in-context examples or instructions. This differs significantly from conventional fine-tuning, where the model’s parameters are explicitly updated through gradient descent to minimize a task-specific loss. In ICL, the model leverages its internal representations to learn from the provided context without parameter updates.
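To make the contrast concrete, here is a minimal toy sketch in PyTorch. The model, data, and dimensions are illustrative assumptions rather than a real language model; the point is only the shape of the two workflows: fine-tuning mutates parameters, while ICL leaves them frozen and adapts purely through the input.

import torch
import torch.nn as nn

# Toy stand-in for a model: fine-tuning updates its weights,
# while ICL leaves them frozen and puts examples in the input.
model = nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(4, 16), torch.randn(4, 16)

# Fine-tuning: explicit parameter updates via gradient descent.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# ICL: no updates; adaptation comes only from conditioning the
# forward pass on in-context examples plus the query.
with torch.no_grad():
    context_and_query = torch.cat([x, torch.randn(1, 16)], dim=0)
    prediction = model(context_and_query)[-1]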
In-context learning hinges on the rich representations learned during pre-training, which encode not only linguistic knowledge but also intricate relationships and patterns in the data. These representations allow the model to recognize task structures and generalize to unseen inputs when presented with a few demonstrations.
Types of Context: More than Just Showing Examples
ICL is more than just providing a few examples [2,3]. It’s about crafting the right kind of context to guide the LLM. Here are a few ways to do that:
Demonstration Context
This is the most common form of ICL, often referred to as “few-shot learning.” We simply provide the LLM with a few examples of input-output pairs, with or without instructions. It’s very similar to how we learn to solve analogy patterns like Toyota : Japan :: Tesla : ?
A straightforward way to do this in a prompt could be:
Translate the following English sentences into French:
English: "This croissant is delicious! Where did you get it?"
French: "Ce croissant est délicieux ! Où l'avez-vous acheté ?"
English: "I would like to order a coffee."
French:
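In code, assembling such a demonstration context is just string construction. Below is a minimal Python sketch; the helper name build_few_shot_prompt and the prompt layout are assumptions for illustration, not a fixed API.

def build_few_shot_prompt(demonstrations, query,
                          instruction="Translate the following English sentences into French:"):
    # Each demonstration is an (english, french) input-output pair.
    lines = [instruction]
    for english, french in demonstrations:
        lines.append(f'English: "{english}"')
        lines.append(f'French: "{french}"')
    # End with the query and an open slot for the model to complete.
    lines.append(f'English: "{query}"')
    lines.append("French:")
    return "\n".join(lines)

demos = [("This croissant is delicious! Where did you get it?",
          "Ce croissant est délicieux ! Où l'avez-vous acheté ?")]
print(build_few_shot_prompt(demos, "I would like to order a coffee."))

The resulting string is exactly the prompt shown above; it can then be sent to any LLM completion endpoint.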
Instructional Context
Instead of just showing examples, you can also instruct the language model: you provide explicit instructions or guidelines on how to perform the task. A lot of recent research has shown that providing reasoning steps through instructions, known as ‘chain-of-thought prompting’ [4, 5], can significantly enhance the reasoning abilities of LLMs. For example, you might provide a set of rules for summarizing an article or generating a poem.
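As a sketch of what an instructional, chain-of-thought-style context might look like, here is a small Python example. The instruction wording and the helper name are illustrative assumptions in the spirit of [4, 5], not a prescribed format.

INSTRUCTIONS = """You are a careful math tutor. Solve the problem step by step:
1. Restate what is being asked.
2. Work through the arithmetic one step at a time.
3. End with a line of the form "Answer: <number>".
"""

def build_instructional_prompt(problem):
    # Prepend explicit reasoning guidelines to the task itself.
    return f"{INSTRUCTIONS}\nProblem: {problem}\nSolution:"

print(build_instructional_prompt(
    "A bakery sells 12 croissants per tray and has 7 trays. "
    "How many croissants are there in total?"))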
Priming Context
This is like setting the mood. You provide text that is related to the task or domain, even if it doesn’t explicitly demonstrate the task. This can help “prime” the model to generate more relevant and coherent responses. For example, you might provide a news article about climate change before asking the model to write an essay on the topic.
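A tiny sketch of priming, again with illustrative text: the passage does not demonstrate the task, it just steers the model toward the right domain before the actual request.

# Both strings below are illustrative assumptions.
priming_passage = (
    "Recent reports show global average temperatures continuing to rise, "
    "with mounting pressure on governments to cut emissions."
)
task = "Write a short essay on the policy implications of climate change."

# The priming text is simply prepended to the request.
prompt = f"{priming_passage}\n\n{task}"
print(prompt)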
Hybrid Context
Hybrid context combines several of the types above, such as demonstrations and instructions, to provide a richer learning signal for the model. This could involve providing examples of product descriptions along with guidelines on what information to include.
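A hybrid context can likewise be assembled by concatenation, as in this Python sketch; the guidelines, demonstration, and product names are all made up for illustration.

# Explicit guidelines (instructional context).
guidelines = (
    "Write a product description that mentions the product name, "
    "one key feature, and the price, in at most two sentences."
)

# One worked example (demonstration context). Hypothetical product.
demonstration = (
    "Product: SolarBrew Kettle | Feature: boils water with solar power | Price: $49\n"
    "Description: The SolarBrew Kettle boils water using nothing but sunlight. Yours for $49."
)

# The query the model should complete. Hypothetical product.
query = "Product: CloudDesk Lamp | Feature: adjustable warm light | Price: $29\nDescription:"

prompt = f"{guidelines}\n\n{demonstration}\n\n{query}"
print(prompt)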
Next Steps
ICL has evolved well beyond these prompting strategies and is an active research topic in machine learning. In the next blog post, we will look at how ICL can enhance zero-shot classification using LLMs.
References:
[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners.
[2] Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the role of demonstrations: What makes in-context learning work?
[3] Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2023). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing.
[4] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models.
[5] Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners.