Prompt Engineering

Notes on Prompt Engineering


Starting from scratch

What is it?

Strategies and tactics for developing and optimizing prompts to use LLMs efficiently. They also help in understanding the capabilities and limitations of LLMs.

## Introduction

## LLM Settings

By setting different parameters on the LLM interaction we can obtain different results from the same prompt. Tuning these settings also improves the reliability and desirability of responses. Common settings are:

  • Temperature: TL;DR the lower the temperature, the more deterministic the results; the higher the temperature, the more random and diverse the outputs. Increasing the temperature effectively increases the weights of the other possible tokens. In other words, use lower temperatures for deterministic tasks like fact-based QA and higher temperatures for creative ones, like writing a poem.

  • Top P: A sampling technique used together with temperature. For more exact (deterministic) answers keep top P low; for creative ones, keep it high. Top P restricts sampling to the smallest set of tokens whose cumulative probability reaches P, so a low value yields more confident responses than a high one. Consideration: alter temperature or Top P, but not both.

  • Max Length: Controls the number of tokens the model generates. Prevents long or irrelevant responses and keeps costs under control.

  • Stop Sequences: String that stops the model from generating tokens. Another way to control length and structure of a model's response.

  • Frequency Penalty: The penalty applied to the next token is proportional to how many times that token has already appeared in the response. The higher the penalty, the less likely a word is to appear again.

  • Presence Penalty: Also penalizes repeated tokens, but unlike the Frequency Penalty the penalty is the same for all repeated tokens; it does not matter whether they appear 2 times or 10 times.
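
As a rough illustration of how these settings map onto an API call, here is a minimal sketch assuming the OpenAI Python SDK; the model name and parameter values are illustrative, not recommendations.

```python
# Minimal sketch of the settings above, assuming the OpenAI Python SDK (>= 1.0).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize photosynthesis in two sentences."}],
    temperature=0.2,        # low -> more deterministic output
    top_p=1.0,              # leave at default when tuning temperature
    max_tokens=150,         # hard cap on generated tokens (length / cost control)
    stop=["\n\n"],          # stop sequence: cut generation at a blank line
    frequency_penalty=0.0,  # penalize tokens proportionally to how often they appeared
    presence_penalty=0.0,   # flat penalty on any token that already appeared
)
print(response.choices[0].message.content)
```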

Basics of prompt

Simple prompts can achieve a lot, but the quality of the results is what matters, and that depends on how much information is provided in the prompt and how well it is crafted.

General Tips for Designing Prompts

- Start simple: Prompting is an iterative process that requires a lot of experimentation to get optimal results. Start with a plain prompt and keep adding context based on the answers obtained.
- Start from zero-shot, then few-shot, then fine-tune.
- The instruction: Effective prompts can be written just by stating the command to perform: "Write", "Classify", etc. Placing the instruction at the head of the prompt can be helpful. Example:

```bash
Summarize the text below as a bullet point list of the most important points.

Text: """ text input here """
```

- Specificity: The more descriptive and detailed the prompt is with regard to the task to be performed, the better the results. This is especially important when there is a required or desired output format. Take into account that including too much context is not necessarily good.
- Avoid imprecision: It is often better to be specific and direct rather than pack vague context into a prompt. The more direct the prompt, the more effectively the message gets across.
- To do or not to do?: Another tip is to focus on saying what TO do rather than only what not to do. This encourages specificity and keeps the model focused on the details that lead to a good response.
- Articulate the desired output format with examples: Show and tell how data should be formatted
- Reduce fluffy and imprecise descriptions.
- On code generation: Use leading words (e.g. starting the completion with `import` for Python or `SELECT` for SQL) to nudge the model toward a particular pattern.
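
As a concrete illustration of these tips (instruction at the head, delimiters around the input, explicit output constraints), here is a small sketch; the article text and the limits chosen are placeholders.

```python
# Illustrative prompt construction following the tips above.
article = "Photosynthesis is the process by which plants convert light into chemical energy..."

prompt = f"""Summarize the text below as a bullet point list of the most important points.
Return at most 5 bullets, each shorter than 15 words.

Text: \"\"\"
{article}
\"\"\"
"""
print(prompt)  # send this string to whichever completion API you use
```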

## Techniques
### Zero-shot Prompting
The prompt used to interact with the model does not contain any examples or demonstrations; it just instructs the model to perform the task directly.

### Few-Shot prompting
It enhances zero-shot prompting by providing the LLM with a few "shots" (examples) of the expected behaviour to obtain better performance.
However, while few-shot prompting works well for many tasks, more complex ones that require some level of reasoning are often not completed correctly.
When zero-shot and few-shot prompting are not enough, that may indicate that whatever the model learned is simply not enough to perform the task.
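
A minimal few-shot prompt might look like the sketch below; the sentiment labels, the example reviews, and the model name are invented for illustration.

```python
# Few-shot prompting sketch: three labelled examples, then the new input.
from openai import OpenAI

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day." -> Positive
Review: "It stopped working after a week." -> Negative
Review: "Setup was quick and painless." -> Positive
Review: "The screen scratches far too easily." ->"""

client = OpenAI()
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,  # classification: we want the most likely label, not variety
)
print(answer.choices[0].message.content)  # expected: "Negative"
```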

### Chain of Thought Prompting (CoT)
Enables complex reasoning capabilities through intermediate reasoning steps. Additionally, it can be combined with few-shot prompting for better results.
 - Zero-shot CoT
    Instead of providing the reasoning steps, walk the model through them by saying something like "Let's think step by step". This can be particularly useful when there are not many examples to use in the prompt.
 - Automatic CoT
    Applying hand-crafted CoT demonstrations has proven useful, but crafting the examples by hand can lead to suboptimal solutions. To solve this, the LLM itself can be leveraged with "Let's think step by step" to generate the reasoning chains for the demonstrations. This practice mitigates the effect of mistakes while increasing the variety of demonstrations.

    Automatic CoT consists of:
      - Question clustering: Partition the questions of a given dataset into clusters.
      - Demonstration sampling: Select a representative question from each cluster and generate a zero-shot CoT reasoning chain for it.
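
A rough sketch of these two steps, assuming scikit-learn for clustering and OpenAI embeddings; the toy questions, model names and cluster count are illustrative.

```python
# Automatic CoT sketch: cluster questions, then generate one zero-shot CoT
# demonstration per cluster with the "Let's think step by step" trigger.
import numpy as np
from sklearn.cluster import KMeans
from openai import OpenAI

client = OpenAI()
questions = [
    "A pack holds 12 pens. How many pens are in 7 packs?",
    "A train travels 60 km/h for 3 hours. How far does it go?",
    "Sam had 5 apples and ate 2. How many are left?",
    "A car uses 8 litres per 100 km. How much fuel for 250 km?",
]

# 1) Question clustering on embedding vectors
emb = client.embeddings.create(model="text-embedding-3-small", input=questions)
vecs = np.array([d.embedding for d in emb.data])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(vecs)

# 2) Demonstration sampling: one representative question per cluster, answered
#    with the zero-shot CoT trigger to produce its reasoning chain.
demos = []
for cluster in range(2):
    q = questions[int(np.argmax(labels == cluster))]   # first question in the cluster
    chain = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Q: {q}\nA: Let's think step by step."}],
    ).choices[0].message.content
    demos.append(f"Q: {q}\nA: Let's think step by step. {chain}")

few_shot_cot_prompt = "\n\n".join(demos)   # prepend this to new questions
print(few_shot_cot_prompt)
```
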
### Self-Consistency
This technique samples multiple diverse reasoning paths via few-shot CoT and uses those generations to select the most consistent answer.
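
A minimal sketch: sample several CoT answers at a non-zero temperature and keep the most frequent final answer. The question, model name and the last-line answer extraction are just a naive illustration.

```python
# Self-consistency sketch: sample N reasoning paths, majority-vote the final answer.
from collections import Counter
from openai import OpenAI

client = OpenAI()
question = "When I was 6 my sister was half my age. Now I am 70. How old is my sister?"

finals = []
for _ in range(5):
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Q: {question}\nA: Let's think step by step."}],
        temperature=0.7,  # diversity across reasoning paths
    ).choices[0].message.content
    finals.append((out.strip().splitlines() or [""])[-1])  # naive: last line as the answer

answer, votes = Counter(finals).most_common(1)[0]
print(f"Most consistent answer ({votes}/5 paths): {answer}")
```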

### Generate Knowledge Prompting
Incorporates knowledge or information into the prompt to help the model make more accurate predictions. The technique first generates such knowledge and then includes it in the prompt before the LLM answers.

Basically, it means generating knowledge from a language model and then providing that knowledge back to the model as additional input when answering a question.
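
A two-call sketch of the idea: first ask the model to generate relevant facts, then include those facts as context when answering. The prompts, the helper name and the model are illustrative.

```python
# Generated knowledge prompting sketch: generate knowledge first, then answer with it.
from openai import OpenAI

client = OpenAI()
question = "Part of golf is trying to get a higher point total than others. Yes or No?"

def complete(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Step 1: generate knowledge about the question
knowledge = complete(f"Generate 2 short factual statements relevant to this question:\n{question}")

# Step 2: answer the question with the generated knowledge as additional input
answer = complete(f"Knowledge:\n{knowledge}\n\nUsing the knowledge above, answer: {question}")
print(answer)
```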

### Prompt Chaining
This technique involves using a series of prompts to guide the model towards a desired output. The prompts are designed to be specific and focused on the task at hand, while also providing enough context to allow the model to understand the problem and generate a relevant response.

One use case for this technique is answering questions about a large document: the first prompt extracts relevant information from the document, and the second prompt then uses that information to generate a more detailed response.
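
A sketch of that document-QA chain: prompt 1 extracts relevant quotes, prompt 2 answers using only those quotes. The document placeholder, prompts and helper are illustrative.

```python
# Prompt chaining sketch: extract relevant quotes, then answer from the quotes.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

document = "..."   # the large document you want to query
question = "What were the main findings?"

# Prompt 1: narrow the document down to the relevant passages
quotes = complete(
    f"Extract the quotes from the document below that are relevant to the question.\n"
    f"Question: {question}\n####\n{document}\n####"
)

# Prompt 2: answer using only the extracted quotes as context
answer = complete(
    f"Using only these quotes, answer the question.\nQuotes:\n{quotes}\n\nQuestion: {question}"
)
print(answer)
```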

### Tree of thoughts
This technique is used for tasks that require some exploration or strategic lookahead. The model is guided by encouraging exploration over thoughts that serve as intermediate steps, a kind of chain-of-thought approach mixed with prompt chaining.

The model's ability to generate and evaluate such thoughts is then combined with search algorithms to enable systematic exploration of thoughts.
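
A heavily simplified sketch of the idea: at each step, propose a few candidate "thoughts", ask the model to score them, and keep only the best before expanding further (a greedy/beam flavour of the tree search). All prompts, the problem and the parameters are illustrative.

```python
# Tree-of-thoughts sketch (greedy beam search over intermediate thoughts).
from openai import OpenAI

client = OpenAI()

def complete(prompt: str, temperature: float = 0.7) -> str:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    ).choices[0].message.content

problem = "Plan a 3-course dinner where every course uses apples."
beam = [""]                      # partial "thought" sequences kept so far
BEAM_WIDTH, STEPS, BRANCH = 2, 3, 3

for _ in range(STEPS):
    candidates = []
    for partial in beam:
        for _ in range(BRANCH):  # propose a few possible next thoughts per partial plan
            thought = complete(f"Problem: {problem}\nSteps so far:\n{partial}\nPropose the next step only.")
            candidates.append(partial + "\n" + thought)

    def score(c: str) -> float:  # let the model evaluate each candidate thought sequence
        s = complete(f"Rate 1-10 how promising this partial plan is for: {problem}\n{c}\nAnswer with a number only.", 0)
        try:
            return float(s.strip().split()[0])
        except ValueError:
            return 0.0

    beam = sorted(candidates, key=score, reverse=True)[:BEAM_WIDTH]

print(beam[0])
```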

### Retrieval Augmented Generation
This technique involves using a retrieval system to retrieve relevant information from a large corpus of text or external knowledge sources. The model then uses this information to generate a response that is relevant to the question.

Retrieval Augmented Generation is particularly useful for tasks that require a lot of contextual information, such as answering questions about a large document or a specific topic.

RAG combines an information retrieval component with a text generator model. This allows the model's knowledge to be adapted and extended efficiently without needing to retrain it from scratch.

It takes an input and retrieves a set of supporting documents, which are concatenated as context to the input prompt.

The knowledge is passed via vector embeddings, so it must be processed (embedded) beforehand.
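
A naive end-to-end sketch of this pipeline: embed a small document store up front, retrieve the most similar chunks for a query by cosine similarity, and concatenate them into the prompt. It assumes OpenAI embeddings and numpy; in practice a vector database would handle storage and search, and the toy documents are made up.

```python
# Naive RAG sketch: dense retrieval over pre-embedded chunks + generation.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "The Eiffel Tower is 330 metres tall.",
    "The Great Wall of China is over 21,000 km long.",
    "Mount Everest stands at 8,849 metres.",
]

def embed(texts):
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in out.data])

doc_vecs = embed(docs)                      # knowledge processed beforehand
query = "How tall is the Eiffel Tower?"
q_vec = embed([query])[0]

# cosine similarity -> pick the top-2 supporting documents
sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = "\n".join(docs[i] for i in np.argsort(sims)[-2:])

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
).choices[0].message.content
print(answer)
```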

We could have different types of RAG:
- Naive RAG: The classic index-retrieve-generate pipeline. Documents are indexed, the most relevant ones are retrieved for the input query, and they are concatenated as context to the input prompt for generation.
- Advanced RAG: The data passed as context is not used straight away but goes through pre- and post-retrieval optimization. The pre-retrieval process involves data indexing that aims to enhance data quality. Post-retrieval optimization focuses on avoiding context window limits and dealing with distracting information; a common approach is re-ranking, which can involve moving the most relevant context to the edges of the prompt.
- Modular RAG: The Modular RAG framework introduces additional specialized components to enhance retrieval and processing capabilities, such as a search module, adapted to specific scenarios, that enables direct searches across various data sources, and a knowledge module that allows the model to process and understand the knowledge passed as context. Both naive and advanced RAG are special cases of modular RAG.

In the context of RAG, retrieval is crucial to efficiently retrieve relevant documents, but there are several key issues involved in the process:
- Retrieval source: RAG relies on external knowledge to enhance the LLM, so the type of the retrieval source, as well as its granularity, is crucial. Data can be unstructured (just a stream of text), semi-structured (PDF-like, containing tables and images), structured (like a database or knowledge graph) or LLM-generated (better aligned with pre-training objectives).
- Retrieval granularity: Coarse-grained retrieval units should in theory provide more relevant information, but they can also return redundant content, distracting the model. On the other hand, finer-grained granularity increases the burden on retrieval while not fully guaranteeing the relevance of the information.
- Retrieval algorithm: Determines how documents are matched to the query. Common options are:
  - Sparse / keyword-based: matches on exact terms (e.g. TF-IDF or BM25 style scoring). Simple and efficient, but misses semantically related passages that use different wording.
  - Dense / vector-based: represents queries and documents as embeddings and retrieves by vector similarity, capturing semantic relatedness beyond exact keyword overlap.
  - Hybrid: combines sparse and dense scores to get the precision of keyword matching together with the recall of semantic search.

### Automatic Reasoning and Tool-use (ART)
Combines CoT prompting and tool use. It works as follows:
- Given a new task, the model retrieves demonstrations of multi-step reasoning and tool use for similar tasks from a task library.
- At test time, generation pauses whenever external tools are called, and their output is integrated before resuming generation.

ART encourages the model to decompose the task and use tools in the appropriate places in a zero-shot fashion.

### Automatic Prompt Engineer (APE)
A framework for automatic instruction generation and selection.
First, an LLM is given output demonstrations for a task and generates instruction candidates; the candidates are then scored on the task and the best one is selected.
Notably, APE discovered a zero-shot CoT prompt ("Let's work this out in a step by step way to be sure we have the right answer.") that performs better than the human-designed "Let's think step by step" prompt.
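
A condensed sketch of the loop: ask the model to propose candidate instructions from a few input/output demonstrations, score each candidate on held-out examples, and keep the best one. The toy data, exact-match scoring and helper are illustrative.

```python
# APE sketch: generate instruction candidates from demonstrations, score, select.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str, temperature: float = 1.0) -> str:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    ).choices[0].message.content

demos = [("cat", "cats"), ("box", "boxes"), ("city", "cities")]       # used to propose instructions
held_out = [("dog", "dogs"), ("bus", "buses"), ("baby", "babies")]    # used to score them

demo_text = "\n".join(f"Input: {x}  Output: {y}" for x, y in demos)
candidates = [
    complete(f"Here are input/output pairs:\n{demo_text}\nThe instruction for this task was:")
    for _ in range(5)
]

def accuracy(instruction: str) -> float:
    hits = 0
    for x, y in held_out:
        pred = complete(f"{instruction}\nInput: {x}\nOutput:", temperature=0)
        hits += int(y in pred)
    return hits / len(held_out)

best = max(candidates, key=accuracy)
print("Selected instruction:", best)
```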

### Active Prompting
CoT approaches rely on a fixed set of human-annotated exemplars, which may not be the most effective examples for every task. Active prompting adapts the exemplars to the task at hand.

The idea: query the LLM several times for each training question to obtain multiple candidate answers, measure uncertainty as the disagreement among those answers, select the most uncertain questions for human annotation of reasoning chains, and use the newly annotated exemplars when prompting for the task.
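
A sketch of the uncertainty-estimation step: sample several answers per question and rank questions by how much the answers disagree (the most uncertain ones are the ones worth annotating by hand). Questions, model and the naive answer extraction are illustrative.

```python
# Active prompting sketch: rank questions by answer disagreement (uncertainty).
from openai import OpenAI

client = OpenAI()
questions = [
    "A shop sells pens in packs of 12. How many pens are in 7 packs?",
    "If 3 workers build a wall in 9 days, how long do 9 workers take?",
]

def sample_answer(q: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Q: {q}\nA: Let's think step by step."}],
        temperature=0.7,
    ).choices[0].message.content
    return (out.strip().splitlines() or [""])[-1]  # naive final-answer extraction

K = 5
uncertainty = {}
for q in questions:
    answers = [sample_answer(q) for _ in range(K)]
    uncertainty[q] = len(set(answers)) / K   # disagreement: distinct answers / samples

# Annotate the most uncertain questions with human CoT and use them as exemplars.
for q in sorted(uncertainty, key=uncertainty.get, reverse=True):
    print(f"{uncertainty[q]:.2f}  {q}")
```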

### Directional Stimulus Prompting
Proposes a prompting technique for better guiding the LLM towards generating the desired output (e.g. a summary).
Using an auxiliary tunable policy model, a directional stimulus prompt is generated for each input instance. This produces instance-specific hints and clues that guide the LLM towards the desired outcome (e.g. including specific keywords in the generated summary).

### Program-Aided Language Models (PAL)
A method that uses LLMs to read natural-language problems and generate programs as intermediate reasoning steps. It differs from CoT in that, instead of using free-form text, it offloads the solution step to a programmatic runtime, such as a Python interpreter.
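
A sketch of that offloading step: ask the model to emit Python that computes the answer, then run the code instead of trusting free-form arithmetic. The prompt, model and variable convention are illustrative, and executing model-generated code like this is only safe in a sandbox.

```python
# PAL sketch: the model writes Python as its reasoning; the interpreter gives the answer.
from openai import OpenAI

client = OpenAI()
question = "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?"

code = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Write Python code that computes the answer to this question and stores it "
                   f"in a variable named `answer`. Output only code.\n\n{question}",
    }],
    temperature=0,
).choices[0].message.content
code = code.strip().removeprefix("```python").removesuffix("```")  # strip optional fencing

scope: dict = {}
exec(code, scope)          # WARNING: only do this in a sandboxed environment
print(scope.get("answer"))
```
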
### ReAct Prompting
LLMs are used to generate both reasoning traces and task-specific actions (e.g. tool calls).
It is a paradigm that generates verbal reasoning traces and actions for a task, allowing the system to perform dynamic reasoning to create, maintain and adjust plans for acting, while also enabling interaction with external environments (e.g. Wikipedia).
Additionally, reasoning in the style of CoT can be added to provide extra context.
Example:
```bash
Question What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought 1 I need to search Colorado orogeny, find the area that the eastern sector
of the Colorado orogeny extends into, then find the elevation range of the
area.
Action 1 Search[Colorado orogeny]
Observation 1 The Colorado orogeny was an episode of mountain building (an orogeny) in
Colorado and surrounding areas.
Thought 2 It does not mention the eastern sector. So I need to look up eastern
sector.
Action 2 Lookup[eastern sector]
Observation 2 (Result 1 / 1) The eastern sector extends into the High Plains and is called
the Central Plains orogeny.
Thought 3 The eastern sector of Colorado orogeny extends into the High Plains. So I
need to search High Plains and find its elevation range.
Action 3 Search[High Plains]
Observation 3 High Plains refers to one of two distinct land regions
Thought 4 I need to instead search High Plains (United States).
Action 4 Search[High Plains (United States)]
Observation 4 The High Plains are a subregion of the Great Plains. From east to west, the
High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130
m).[3]
Thought 5 High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer
is 1,800 to 7,000 ft.
Action 5 Finish[1,800 to 7,000 ft]
...
```
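
A stripped-down sketch of the loop behind a trace like this: generate until the next action, run the named tool, append the observation, and continue until the model finishes. The `Search` tool here is a stub and the prompt format is simplified; a real agent would call Wikipedia or another API.

```python
# ReAct loop sketch: interleave model "Thought/Action" steps with real tool calls.
from openai import OpenAI

client = OpenAI()

def search(query: str) -> str:        # stub tool; replace with a real Wikipedia/API call
    return f"(stub) top search result for '{query}'"

TOOLS = {"Search": search}

prompt = ("Answer the question by interleaving Thought, Action and Observation steps.\n"
          "Actions: Search[query] or Finish[answer].\n"
          "Question: What is the elevation range of the High Plains?\n")

for _ in range(5):                    # cap the number of reasoning/acting steps
    step = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stop=["Observation"],         # pause generation right before the tool result
    ).choices[0].message.content
    prompt += step
    if "Finish[" in step:
        print(step[step.index("Finish[") + 7 : step.rindex("]")])
        break
    if "Search[" in step:             # run the requested tool and feed its output back
        query = step[step.index("Search[") + 7 : step.rindex("]")]
        prompt += f"\nObservation: {TOOLS['Search'](query)}\n"
```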

### Reflexion

Reinforces language-based agents through linguistic feedback. Reflexion converts feedback from the environment into linguistic feedback (self-reflection), which is provided as context for the LLM and guides the agent towards the desired goal. This helps the agent learn from previous mistakes, leading to performance improvements. There are 3 main components of Reflexion:

  • Actor: Generates text and actions based on state observations. Each actor takes an action in an environment and receives an observation that results in a trajectory. CoT and ReAct are used as Actor models.
  • Evaluator: Scores the outputs produced by the Actor and produces a reward score.
  • Self-Reflection: Generates verbal reinforcement to assist the Actor in self-improvement. It is generated by an LLM and provides feedback for future trials.

It is effective for sequential decision making, reasoning, and programming. However, it has limitations around long-term memory, code generation and self-evaluation capabilities.

### Multimodal CoT Prompting

Incorporates text and vision into a two-stage framework: rationale generation based on multimodal information, followed by answer inference that uses the generated rationale.

### Graph Prompting

Graph neural networks (GNNs) are a type of neural network designed to process and analyze graphs. They are particularly useful for tasks that involve understanding complex relationships between data points. Each node receives and aggregates messages from its neighbors, and the final output is a summary of the information contained in the graph.

## Links

- Prompt Engineering Guide
- OpenAI - Best Practices for prompt engineering