Detecting duplicate SQL queries in Django

2014-06-12 by Senko Rašić

Django’s ORM is a very powerful tool, and possibly the most complex part of the entire framework. The ORM makes it easy to perform quite complex operations on the data without worrying about SQL syntax details, but it can be dangerous performance-wise unless you’re familiar with the actual database queries that the various ORM features end up generating.

Here’s a trivial example:

  class Author(models.Model):
    name = models.CharField(max_length=100)

  class Book(models.Model):
    author = models.ForeignKey(Author, related_name='books')
    title = models.CharField(max_length=100)

  def myview(request):
    return render(request, 'template.txt', {
      'all_authors': Author.objects.all(),
    })


  {% for author in all_authors %}
    Name: {{ author.name }}
    {% for book in author.books.all %}
      Title: {{ book.title }}
    {% endfor %}
  {% endfor %}

Experienced Django users will immediately notice the problem: each time we want to display a book’s title, Django will make another SQL query to the database to get the book’s details, so the total number of queries will be one for the list of authors, and then one for each author’s books. If there are 10,000 authors in the database, rendering the page will take 10,001 SQL queries. Clearly not an optimal solution.

Django does make it easy to do the right thing here: instead of Author.objects.all(), we can use Author.objects.all().prefetch_related('books'), which loads all the related books’ details in a single (additional) SQL query, so the total number of SQL queries will be only 2, no matter how many books or authors there are. This isn’t turned on by default, though (for good reasons), so you have to remember when, and how, to use it.
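To see why the query count explodes, here is a toy simulation in plain Python (no Django involved; the FakeDB class and the data are made up for illustration) contrasting the per-author lookup with a prefetch-style batch fetch:

```python
from collections import defaultdict

class FakeDB:
    """Toy stand-in for a database that counts the queries it serves."""
    def __init__(self, books_by_author):
        self.books_by_author = books_by_author
        self.query_count = 0

    def books_for(self, author):
        self.query_count += 1  # one query per author: the N+1 pattern
        return self.books_by_author[author]

    def all_books(self):
        self.query_count += 1  # one query for everything: the prefetch pattern
        return [(author, book)
                for author, books in self.books_by_author.items()
                for book in books]

data = {'author-%d' % i: ['a book'] for i in range(100)}

# Naive version: one query per author (not counting the authors query itself).
naive = FakeDB(data)
for author in data:
    naive.books_for(author)

# Prefetch-style version: fetch everything once, group in Python.
batched = FakeDB(data)
books = defaultdict(list)
for author, book in batched.all_books():
    books[author].append(book)

print(naive.query_count, batched.query_count)  # 100 1
```

The batched version does the same grouping work that prefetch_related() does for you behind the scenes, just with one query instead of one per author.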

In this trivial example, the problem was easy to spot. In real-world applications, where you may have complicated QuerySets that get filtered and passed around, it can be much harder to track down exactly what needs to be fetched with select_related() or prefetch_related(). The only clue you have is the total number of queries: if it’s small, you’re probably OK; if it’s large, you’re probably forgetting something. But unless you’re testing with database sizes comparable to production, it’s easy to miss these problems – in our example, if you only had one author and one book, you’d never notice it.

To help detect such cases, here at Good Code we built a piece of Django middleware that logs and inspects the SQL queries for each request, and can detect non-optimal query patterns. Although it’s impossible to do 100% accurate detection, the heuristics handle the most common cases well.
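The gist of the duplicate-detection heuristic can be sketched in a few lines of plain Python (this is an illustration of the idea, not the middleware’s actual code): strip literal values out of each query so that queries differing only in their parameters compare as equal, then count the repetitions:

```python
import re
from collections import Counter

def normalize(sql):
    """Replace literal values with a placeholder so that queries
    differing only in parameters compare as equal."""
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return sql

def find_duplicates(queries):
    """Return normalized queries that occur more than once, with counts."""
    counts = Counter(normalize(q) for q in queries)
    return {q: n for q, n in counts.items() if n > 1}

queries = [
    'SELECT * FROM customer_role WHERE contact_id = 1',
    'SELECT * FROM customer_role WHERE contact_id = 2',
    'SELECT * FROM customer_role WHERE contact_id = 3',
    'SELECT * FROM auth_user WHERE id = 7',
]
print(find_duplicates(queries))
# {'SELECT * FROM customer_role WHERE contact_id = ?': 3}
```

A repeated normalized query is exactly the fingerprint of an N+1 problem: the same statement executed once per object with only the parameter changing.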

Here’s the example output:

[SQL] 17 queries (4 duplicates), 34 ms SQL time, 243 ms total request time
[SQL] repeated query (4x): SELECT "customer_role"."id",
 "customer_role"."contact_id", "customer_role"."name"
 FROM "customer_role" WHERE "customer_role"."contact_id" = ?
 File "/vagrant/api/", line 178, in get
   return self.serialize(self.object_qs)
 File "/vagrant/customer/", line 131, in serialize
   return serialize(objs, include=includes)
 File "/vagrant/customer/", line 258, in serialize_contact
   lambda obj: [ for r in obj.roles.all()]),
 File "/vagrant/customer/", line 258, in <lambda>
   lambda obj: [ for r in obj.roles.all()]),

The middleware can also add statistics to response headers, as in this example:

X-QueryInspect-Num-SQL-Queries: 17
X-QueryInspect-Duplicate-SQL-Queries: 4
X-QueryInspect-Total-SQL-Time: 34 ms
X-QueryInspect-Total-Request-Time: 243 ms
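These headers are handy in automated tests. Here’s a hypothetical helper (the function name and error message are my own; only the header names come from the middleware output above) that fails whenever a response reports duplicate queries:

```python
def assert_no_duplicate_queries(headers):
    """Raise AssertionError if the X-QueryInspect headers on a response
    report duplicate SQL queries. `headers` is any dict-like mapping of
    header names to values, e.g. from Django's test client response."""
    dupes = int(headers.get('X-QueryInspect-Duplicate-SQL-Queries', 0))
    if dupes > 0:
        total = headers.get('X-QueryInspect-Num-SQL-Queries', '?')
        raise AssertionError(
            '%d duplicate SQL queries (out of %s total)' % (dupes, total))

# Example with the header values shown above:
headers = {
    'X-QueryInspect-Num-SQL-Queries': '17',
    'X-QueryInspect-Duplicate-SQL-Queries': '4',
}
try:
    assert_no_duplicate_queries(headers)
except AssertionError as e:
    print(e)  # 4 duplicate SQL queries (out of 17 total)
```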

The middleware works on all recent versions of Django (1.4+) and Python (2.x and 3.x). It’s easy to use (install via pip and adjust a few Django settings), and since it’s only active in debug mode, there’s no fear of it accidentally leaking data in production. For more information and documentation, head over to the django-queryinspect repo on GitHub.
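Setup looks roughly like the following settings fragment. The dotted middleware path and the setting names here are assumptions based on my recollection of the project’s README, so double-check them against the repo before copying:

```python
# settings.py -- names below are assumed; verify against the
# django-queryinspect README on GitHub.
DEBUG = True  # the middleware is only active in debug mode

MIDDLEWARE_CLASSES += (
    'qinspect.middleware.QueryInspectMiddleware',  # assumed dotted path
)

QUERY_INSPECT_ENABLED = True
QUERY_INSPECT_LOG_QUERIES = True   # log repeated queries with tracebacks
QUERY_INSPECT_HEADER_STATS = True  # add the X-QueryInspect-* headers
```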

Senko Rašić
We’re a small, experienced and passionate team of web developers, doing custom app development and web consulting.