SQL Optimization Guide

This document discusses SQL optimization techniques.

Several best practice tips have been created as a response to customer experience feedback. Use this guide to decide where to start optimizing.

Optimization issues

SQL performance can be optimized by following good practice in the design of databases, formulation of SQL queries, use of indexes and other techniques.

Transient tables

Transient tables are created by the database engine to compute intermediate results.

Transient tables are stored on disk by default, which may result in unexpected I/O calls that will probably have a negative impact on database performance.

Prevent datafile corruption

Datafile corruption is corruption of the file containing a database with invalid data.

Datafile corruption can occur after a system crash or loss of power. SQLite maintains files called rollback journals from which the last valid state of a database can be reconstructed. However, if a rollback journal is removed or renamed during recovery from a crash, for instance in the course of application recovery the database will remain corrupted.

SQL index tips

Indexes are added either manually or automatically to columns of a data table to speed up SELECT operations.

Indexes greatly speed up the performance of SELECT statements but can impact on insertions, updates and deletions and memory usage. It is not efficient to index all columns regardless of their purpose and under certain circumstances indexes decrease performance. They are best added to a database at the last stage of development.

SQL insertion tips

INSERT, like UPDATE and DELETE, is an inherently fast operation but it can execute slowly if each individual operation is wrapped in a transaction, which is the default behavior.

Multiple insertion statements should be grouped into transactions by explicit use of the COMMIT statement or else by use of TEMP tables which can be inserted into the database as a batch in a single operation, and by reuse of prepared statements. Insert statements should use bound parameters for reasons of efficiency and to prevent SQL injection attacks.

SQL schema tips

A database schema is a definition of its structure including the tables, data types and keys.

The design of a database can impact on the efficiency of the SQL queries run on it. Correct ordering of columns in a table and use of keys makes a database more efficient, while it is inefficient to store numerous large data structures such as BLOBs directly in a database. Schemas are best kept small.

SQL expressions

An expression is the part of an SQL statement which uses operators to combine values to form new values.

Some ways of writing an expression are more efficient than others because of the way the SQLite optimizer rewrites operators. It is also important to use constant subexpressions within subqueries.

SQL statement tips

Certain SQL statements have special case optimizations.

There are special rules for using the MIN and MAX functions, the UNION and UNION ALL operators and OFFSET clauses efficiently within SELECT statements.

SQL joins

SQL joins are used to combine multiple tables into virtual tables.

A join on multiple tables involves reordering their columns to construct the combined virtual table. The optimizer attempts to perform this ordering in the most efficient way but is not always successful. You can override the optimizer by using the CROSS JOIN operator.

ANALYZE command

The ANALYZE command is used to provide statistics about the distribution of values in a populated database.

The ANALYZE command creates a table SQLITE_STAT1 containing the statistical profile of the database at the time when it was run. The optimizer uses that data to choose between implementation options. Developers can also use the data to make their own optimizations and they can modify the table to influence the behavior of the optimizer.

SQL WHERE clause tips

The SQL WHERE clause restricts the effect of an SQL statement to rows satisfying the expression which it contains.

The performance of a WHERE clause can be improved by following certain rules. It is inefficient to use expressions within WHERE clauses instead of indexed column names. Also, the AND operator should always be preferred to the OR operator.

Related concepts