Aggregate Functions by varunraiko

Introduction

Most of the current databases like Oracle, PostgreSQL, HSQL support the notion of CREATING AGGREGATE FUNCTIONS . This notion is missing from MariaDB, so this project deals with creating the support for aggregate functions for MariaDB. To incorporate the support for the aggregate stored functions first we had to form the syntax for these functions and then a design plan on how we are going to implement these functions.

SYNTAX OF AGGREGATE FUNCTIONS

Well now lets see some examples to understand how these functions work.

CREATE AGGREGATE FUNCTION function_name (parameters) RETURNS return_type  
BEGIN  
      All types of declarations  
      DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN return_val;  
      LOOP  
           FETCH GROUP NEXT ROW;  
           Some instructions  
      END LOOP; 
END|

This would be the syntax and structure for the aggregate functions.
FETCH GROUP NEXT ROW is the essential instruction for the aggregate. It fetches the next row of the table. It is by this instruction the values to the parameters are set.

EXAMPLES OF AGGREGATE FUNCTIONS

So let us say we take an example where we have a table which contains details about a student and his/her marks in two subjects( Subject A and Subject B).
Our table MARKS would have the fields ( Student, Roll Number, Marks in Subject A, Marks in Subject B) The syntax would be:

CREATE TABLE MARKS
(
    Roll_No INT NOT NULL,
    Name varchar(255),
    Marks_Subject_A DOUBLE,
    Marks_Subject_B DOUBLE,
    PRIMARY KEY (Roll_No)
)

So we need to find the answer to few queries

Find the number of students in the class.
For this query , we can create an aggregate function to count the number of entries in the table.
Let us name the function , aggregate_count and it accepts one parameter that is Roll Number.

CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT  
BEGIN  
      DECLARE  count_students INT DEFAULT 0;  
      DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students;       
      LOOP  
          FETCH GROUP NEXT ROW;  
          IF x  THEN  
               SET count_students = count_students+1;  
          END IF;  
      END LOOP;  
END|

The select query would be: SELECT aggregate_count(Roll_No) from MARKS;

Find the total marks scored by all students in Subject A
For this query we can create an aggregate function to calculate the total sum of marks of all the student for the Subject A.
Let us name the function total_sum, with parameters marks of a subject.

CREATE AGGREGATE FUNCTION aggregate_total_sum(x DOUBLE) RETURNS INT  
BEGIN  
      DECLARE  total_marks INT DEFAULT 0;  
      DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN total_marks;       
      LOOP  
          FETCH GROUP NEXT ROW;  
          SET total_marks = total_marks+x;  
      END LOOP;  
END|

The select query would be: SELECT aggregate_total_sum(Marks_Subject_A) from MARKS;

Find the average marks of the class(including both subjects)
We can perform this query in various cases. We can create another aggregate function that would calculate the average of the marks in both the subject or we could use the aggregate functions defined above, calling them from normal functions.

CREATE FUNCTION average() RETURNS DOUBLE  
BEGIN   
       DECLARE  total_marks INT DEFAULT 0;  
       DECLARE  count_students INT DEFAULT 0;  
       SET count_students= (SELECT aggregate_count(roll_number) from MARKS);  
       SET total_marks= (SELECT aggregate_total_sum(marks_A) from MARKS)+  
                           (SELECT aggregate_total_sum(marks_B)from MARKS);  
      IF count_students = 0 then  
             RETURN 0;  
      ELSE  
            RETURN total_marks/count_students;  
      END IF;  
END|

The select query would be: SELECT average();

Find the final score of each student, given that the weightage for subject A is 40% in the final score and that of subject B is 60%.

CREATE AGGREGATE FUNCTION final_score(x DOUBLE, Y DOUBLE) RETURNS DOUBLE
BEGIN
      DECLARE  score_total INT DEFAULT 0;
      DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN score_total;     
      LOOP
       FETCH GROUP NEXT ROW;
       SET score_total= score_total + x*0.4 + y*0.6;
      END LOOP;
END|

The select query would be:  
    SELECT final_score(Marks_Subject_A, Marks_Subject_B) from MARKS GROUP BY Roll_no;

So these examples help us understand how the aggregate function work. The main points for that we need to keep in mind while writing aggregate functions and queries calling these functions are:
1. FETCH GROUP NEXT ROW instruction fetches the next row and then the values are set to the parameters of the function.
2. When all the rows are fetched then we exit from the loop body and then in the handler declaration, the value of the function is returned.
3. The GROUP BY , ORDER and SORT all clauses work with the aggregate functions , so they can be included while writing the SQL query depending on the needs.
4. The values to the parameters get set after the FETCH GROUP NEXT ROW instruction, so we should not use parameter values to initialize other values as these maybe garbage values before the FETCH GROUP NEXT ROW instruction.

Milestones

This part mainly comprises of the technical details involved in this project.

Added the syntax for the Aggregate Functions to the parser

Added the AGGREGATE symbol to the parser. A test has been written to check that the parser parses the AGGREGATE keyword correctly.

Link to the commit
Added the FETCH GROUP NEXT ROW instruction: This instruction is exclusive for aggregate functions.This instruction is added to make a temporary exit from the function execution. After the temporary exit is made the values are set to the parameters of the function and then the function resumes execution.

Link to the commit
Added an agg_type field to the st_sp_chistics. This has been added to help us store in the mysql.proc table whether a function is aggregate, window or normal function.

Link to the type aggregate_type.

Link to the field added to the st_sp_chistics.
Alter and Create Aggregate functions: The CREATE FUNCTION query should store if the function is an aggregate/window/normal function in the aggregate field of the mysql.proc table. Alter query should allow user to change the aggregate field of the mysql.proc table.

create function, alter function

test for the create and alter queries, result

Added the implementation of the Item_sum_sp class

This class is inherited from the Item_sum class.This class mainly deals with the concept of making an interface which allows user to execute aggregate functions.For executing the aggregate functions we have taken a special cursor approach.

Cursor Approach: A new instruction FETCH GROUP NEXT ROW instruction is introduced. This instruction handles the state the aggregate function is currently in. There is a field pause_state in the class sp_context.
- pause_state is TRUE: In this case, the function enters the pause state and we save the state of the function. The instruction pointer(instr_ptr) continues to point to the FETCH GROUP NEXT ROW instruction, and the function makes a temporary pause in its execution and that the state of the function is saved. The function is paused so that next row values can be set for the function's arguments. After the values are passed to the arguments the function resumes execution.
- pause_state is FALSE: In this case, the function exits from the pause state and resumes execution. The instruction pointer(instr_ptr) is then incremented and now it points to the next instruction to be executed.
Item_sum_sp::add() function added: This function handles everything that is done regarding the aggregation of values.
- Item_sum_sp::execute_impl(): Execute function and store the return value in the field. This is called from the add() function to aggregate the values and from execute() to return the value in the sp_result_field(field defined in the Item_sum_sp class).
- sp_head::execute_aggregate_function(): This function is called from the execute_impl function. In this function the run-time context(func_ctx) is created. I have followed a different approach when the function fetches the row so as to set the values to the parameters. When we call the function for the first time(that is when the first row is fetched), the run-time context is created but the values to the arguments are not set. These values are set when we encounter a FETCH GROUP NEXT ROW instruction and pause the aggregate function. Then after the pause, the values to the parameters are set and the function resumes execution. For the next fetch we don't need to create the context as we preserve the earlier run-time context(func_ctx). The values to the parameter are set straight away and then the rest of the instructions are executed.
Item_sum_sp::val_xxx() functions added: This function gets the return value for the current aggregate function and stores the value in the result field. A signal is send in the field server_status. The value for server status is set to SERVER_STATUS_LAST_ROW_SENT. This implies that all the rows have been fetched and now we need to throw an error that there is no more data (ER_SP_FETCH_NO_DATA), so this error is handled by the handler. The handler handles the error and the function execution is complete.

Aggregate Functions

MariaDB server is a community developed fork of MySQL server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry.