Martin Pinner
Martin Pinner DBmarlin Co-founder & Chief architect.

The Power of Grouped Statements

The Power of Grouped Statements

Sometimes it is illuminating to be able to group database statements together, so that you can see their combined performance. Viewed separately, their performance may not appear that significant, but when added together, they can appear at the top of charts of resource-consuming statements. One of the most useful ways to group statements is to treat all statements that only differ in their variable values as the same.

In an ideal world, applications would write their statements in such a way that they do not contain literal strings and numbers. Instead, they would use bind variables or placeholders. The benefit of doing this is well known: the database server only has to parse the statement once, and the optimiser can create a single execution plan. After that the server can re-use the plan, saving time and memory. As an aside, there are also some security benefits as well, by limiting the scope for SQL injection.

Consider the following simple statements.

1
2
3
select sum(value) from data where name = 'A'

select sum(value) from data where name = 'B'

The database server will have to parse each statement and produce separate execution plans. These statements could be replaced with the following.

1
select sum(value) from data where name = ?

Here '?' is used as a placeholder or bind variable. This is a common syntax in Java applications. Other placeholder symbols may be used in other environments, such as '$1' or ':1'. When the application calls this statement, it separately passes the value of the bind variable, either 'A' or 'B'.

However, you will find that many applications do not always take this approach and literal values proliferate. In some cases, it is because the developers decided it was not worth the effort, or because the framework they were using did not allow it. But it also may have been a conscious decision. In the example above, what if there were many more rows that matched name = 'A' compared with name = 'B'? This might lead to the optimiser choosing a different execution plan for each case (perhaps a full table scan versus an index scan). But if it only has the placeholder available at parse time then it can only generate a single, generic plan. In most cases the generic plan is probably best. But, if this a key statement that takes a long time, or is run many times, then it may be worth including the literal value in order to get the best performance out of both scenarios.

So how do we decide whether to use bind variables or not? We need to turn to our performance monitoring tools to help guide us. The ability to group statements together can really help here.

The screenshot below is taken from DBmarlin and shows the top ungrouped statements ranked by wait time from our production MySQL installation of WebTuna (see www.webtuna.com).

DBmarlin Top Statements Ungrouped

There was a total of 396 statements recorded during a 2-hour timeframe from 10:00 to 12:00 on 26 January 2022. The pie charts show the relative weight of each statement by duration and what they were collectively waiting on. The table below shows the statements ranked by total wait time. Given this information you might decide that you should focus your attention on the worst-performing statement in the list (select sla, customer_id…), which took a total of 4 minutes and 8 seconds.

But now let us look with similar statements grouped together. DBmarlin can create a grouped version of a set of statements by removing string, numeric and date literals, as if bind variables had been used.

DBmarlin Top Statements Grouped

There is now a new top statement (call insert_hit…). It took a total of 5 minutes 25 seconds. The previous top statement has been relegated to number two, still at 4 minutes 8 seconds. So where did call insert_hit come from? It wasn’t even on the first page in the list of statements. As you can guess, those question marks have replaced literal values. In fact, the highest original entry was on page two. We can also see that the total number of statements reduced from 396 to 44 with grouping enabled, a ratio of 9:1.

You might be wondering why we used literal values in the procedure call. The answer is we don’t - we use bind variables but for some reason the MySQL driver substitutes them back in.

If we click on the statement, we can see its performance profile.

DBmarlin Statement Performance Over Time

If you look carefully, you can see that the one grouped statement mapped to 325 individual statements during the period. If you wish, you can see them all in in the Statements tab. Of more interest are the wait events. The top graph shows the response time over time broken down by wait events. The pattern is fairly consistent. The table in the Waits tab below shows that the top wait was checking permissions. Wait-event analysis often comes up with surprising findings; I wouldn’t have thought of that. Clicking the Read More button takes you to the DBmarlin knowledge base.

DBmarlin Knowledge-base MySQL Checking Permissions

The description is taken from the MySQL official documentation: “The thread is checking whether the server has the required privileges to execute the statement”. MySQL is not giving too much away here but the gist is clear. We will continue to add more information to each wait-event entry as we go along. Let us know if you have any suggestions.

To take this analysis further requires knowledge of the application. Currently, WebTuna is performing adequately so there are no plans to change it. But the fact that the procedure uses literal values and spends most of its time checking permissions suggests that it might be worth implementing the functionality another way.

Ready to try DBmarlin?

If you would like to find out more about DBmarlin and why we think it is special, try one of the links below.

dbmarlin-trial-offer