Modern work increasingly relies on large language models (LLMs) such as OpenAI's GPT-4 or Meta's Llama 3. We incorporate these assistants into our daily routines, not just for plain text generation but also for structured output in coding tasks.

When developing systems of any kind based on data stored in a database, it is hard to stay away from Structured Query Language (SQL). SQL is the go-to language for data manipulation and comes in various dialects such as MySQL, PostgreSQL or Oracle. Every dialect has its nuances and differences from the others. This makes validating SQL queries less straightforward than it may seem.
When we, as developers, write SQL queries, we usually know the underlying database and the task we want to accomplish. We have a rough plan in mind of what we need to do to get the output we want. When using LLMs for SQL query generation, we want the same results we would come up with ourselves, but much faster and out of the box. This leads to some challenges we need to overcome. Let's look at some common issues that arise when generating SQL statements.
One of the most crucial but challenging tasks in generating SQL queries is understanding the context and schema of the database. Large language models are trained on vast amounts of text but have no specific information about an arbitrary database's schema, tables and relationships. Just as we developers need to understand the database we want to query, an LLM needs the same information about which tables and columns hold the data. The quality of the output depends on how the schema is presented to the LLM. It is also possible to fine-tune an LLM on specific schemas, but that risks the model becoming too static and still generating invalid output.
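One common way to supply that context is to serialize the schema into the prompt, for example as CREATE TABLE statements. A minimal sketch of this idea, using invented table and column names purely for illustration:

```python
# Hypothetical schema description; table and column names are invented
# for illustration only.
schema = {
    "users": ["id INT PRIMARY KEY", "name TEXT", "created_at TIMESTAMP"],
    "orders": ["id INT PRIMARY KEY", "user_id INT", "total NUMERIC"],
}

def schema_to_prompt(schema: dict[str, list[str]], question: str) -> str:
    """Render the schema as CREATE TABLE statements and append the user's question."""
    ddl = "\n".join(
        f"CREATE TABLE {table} (\n  " + ",\n  ".join(columns) + "\n);"
        for table, columns in schema.items()
    )
    return (
        "You are given the following PostgreSQL schema:\n\n"
        f"{ddl}\n\n"
        f"Write a single SQL query that answers: {question}"
    )

print(schema_to_prompt(schema, "How many orders did each user place last month?"))
```

Keeping the schema rendering in one place like this also makes it easy to regenerate prompts whenever the database changes, which sidesteps some of the staleness risk of fine-tuning.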
Natural languages are inherently ambiguous and context-sensitive. Different users may articulate the same use case in various ways, leading to multiple interpretations. This makes it harder to get a deterministic, correct outcome for each request given to the LLM. Requests may also be under-specified: a task might not be explained properly, or the relationship to the database being queried cannot be expressed in text. Both problems undermine the context needed to generate the query output.
LLMs are trained on a vast variety of text and examples, so some models generate better SQL than others. For most LLMs we don't know what they were trained on, and even when we do, the amount of data is far too large to keep track of. Virtually every current LLM can generate SQL query output. But with this power come risks: the generated query may reference columns that do not exist in our specific use case. This is a classic example of an LLM hallucinating, and it prevents us from querying our database for the desired data.
Although LLMs know how to produce SQL queries, syntactical errors are still possible. This problem grows as soon as different SQL dialects come into play. Depending on the LLM used, some dialects perform better than others; some may not be considered at all, even though they are requested in the input prompt. If an LLM is biased towards a specific SQL dialect, it will tend to generate output in that dialect's syntax.
Even if the query is syntactically correct, it might not yield the desired results due to logical inconsistencies, for example incorrect join conditions or WHERE clauses. In the end, the user doesn't get the expected output for the use case, even though the query executes without any errors.
LLMs may produce syntactically correct SQL queries that are nevertheless not optimized for performance. Poorly optimized queries can lead to long execution times and heavy loads on database servers. Ensuring that generated queries are optimized requires additional layers of validation and optimization.
If a user follows the path of fine-tuning for specific use cases on a given data structure, maintaining the LLM to adapt to new patterns, languages and changes can be labor- and cost-intensive. Ensuring the model evolves with changing user requirements and stays in line with the current database schema is an ongoing challenge. Continuous fine-tuning also calls for a proper development pipeline.
Given these challenges, automating the validation of generated queries is both practical and necessary. This is where the Python libraries SQLFluff and SQLGlot come into play: they let you validate the generated SQL output in a few easy steps.
First, let's take a look at the capabilities of the two libraries:
SQLFluff is a dialect-flexible, configurable SQL linter. It is like a spell checker for SQL queries. It examines SQL code to ensure it follows predefined best practices and standards. It is even capable of auto-fixing some linting errors.
SQLGlot is more like a multilingual translator for SQL. It reads SQL queries and can convert them between different SQL dialects. It parses SQL queries to an abstract level that allows it to easily transpile queries into different dialects. This becomes especially handy for managing database systems with multiple dialects, each with its own syntax nuances.
Large language models like GPT-4 or Llama 3 offer a powerful way to generate SQL queries quickly. However, to ensure these queries are efficient, accurate and maintainable, incorporating tools like SQLFluff and SQLGlot is essential. By validating and optimizing LLM-generated SQL, these tools provide a crucial safety net, enhancing the reliability and performance of our database interactions. By integrating SQLFluff and SQLGlot into our workflow, we can bring out the best in our LLM-generated SQL queries.

At APICHAP, we are dedicated to not only helping you create APIs quickly but also ensuring that your system remains safe and efficient over time. Our platform offers a complete API lifecycle management service that keeps your system up-to-date and free of technical debt. With features such as auto-monitoring and self-repair, we guarantee that your system remains secure, saving you both time and money.
Instead of incurring significant expenses at irregular intervals for system maintenance, APICHAP offers a convenient monthly license fee. This fee covers all application updates and ongoing monitoring, ensuring your system is always running optimally. Additionally, updating your integrations or building new features is included in your monthly subscription, providing you with comprehensive support and continuous improvements without unexpected costs.
By choosing APICHAP, you can focus on your core business activities while we handle the complexities of API maintenance and updates, ensuring your applications are always performing at their best.

Michael Weißenböck
CTO & Co-Founder @ APICHAP
Michael is a passionate AI enthusiast who has dedicated his career to exploring and advancing the field of artificial intelligence. With a strong research background, he has successfully completed numerous AI-related projects, consistently driven by a desire to create real-world applications that improve people's lives.