Introduction
Most of the current databases like Oracle, PostgreSQL, HSQL support the notion of CREATING AGGREGATE FUNCTIONS . This notion is missing from MariaDB, so this project deals with creating the support for aggregate functions for MariaDB. To incorporate the support for the aggregate stored functions first we had to form the syntax for these functions and then a design plan on how we are going to implement these functions.
SYNTAX OF AGGREGATE FUNCTIONS
Well now lets see some examples to understand how these functions work.
CREATE AGGREGATE FUNCTION function_name (parameters) RETURNS return_type
BEGIN
All types of declarations
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN return_val;
LOOP
FETCH GROUP NEXT ROW;
Some instructions
END LOOP;
END|
This would be the syntax and structure for the aggregate functions.
FETCH GROUP NEXT ROW is the essential instruction for the aggregate. It fetches the next row of the table. It is by this instruction the values to the parameters are set.
EXAMPLES OF AGGREGATE FUNCTIONS
So let us say we take an example where we have a table which contains details about a student and his/her marks in two subjects( Subject A and Subject B).
Our table MARKS would have the fields ( Student, Roll Number, Marks in Subject A, Marks in Subject B)
The syntax would be:
CREATE TABLE MARKS
(
Roll_No INT NOT NULL,
Name varchar(255),
Marks_Subject_A DOUBLE,
Marks_Subject_B DOUBLE,
PRIMARY KEY (Roll_No)
)
So we need to find the answer to few queries
Find the number of students in the class.
For this query , we can create an aggregate function to count the number of entries in the table.
Let us name the function , aggregate_count and it accepts one parameter that is Roll Number.
CREATE AGGREGATE FUNCTION aggregate_count(x INT) RETURNS INT
BEGIN
DECLARE count_students INT DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN count_students;
LOOP
FETCH GROUP NEXT ROW;
IF x THEN
SET count_students = count_students+1;
END IF;
END LOOP;
END|
The select query would be: SELECT aggregate_count(Roll_No) from MARKS;
Find the total marks scored by all students in Subject A
For this query we can create an aggregate function to calculate the total sum of marks of all the student for the Subject A.
Let us name the function total_sum, with parameters marks of a subject.
CREATE AGGREGATE FUNCTION aggregate_total_sum(x DOUBLE) RETURNS INT
BEGIN
DECLARE total_marks INT DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN total_marks;
LOOP
FETCH GROUP NEXT ROW;
SET total_marks = total_marks+x;
END LOOP;
END|
The select query would be: SELECT aggregate_total_sum(Marks_Subject_A) from MARKS;
Find the average marks of the class(including both subjects)
We can perform this query in various cases. We can create another aggregate function that would calculate the average of the marks in both the subject or we could use the aggregate functions defined above, calling them from normal functions.
CREATE FUNCTION average() RETURNS DOUBLE
BEGIN
DECLARE total_marks INT DEFAULT 0;
DECLARE count_students INT DEFAULT 0;
SET count_students= (SELECT aggregate_count(roll_number) from MARKS);
SET total_marks= (SELECT aggregate_total_sum(marks_A) from MARKS)+
(SELECT aggregate_total_sum(marks_B)from MARKS);
IF count_students = 0 then
RETURN 0;
ELSE
RETURN total_marks/count_students;
END IF;
END|
The select query would be: SELECT average();
Find the final score of each student, given that the weightage for subject A is 40% in the final score and that of subject B is 60%.
CREATE AGGREGATE FUNCTION final_score(x DOUBLE, Y DOUBLE) RETURNS DOUBLE
BEGIN
DECLARE score_total INT DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND RETURN score_total;
LOOP
FETCH GROUP NEXT ROW;
SET score_total= score_total + x*0.4 + y*0.6;
END LOOP;
END|
The select query would be:
SELECT final_score(Marks_Subject_A, Marks_Subject_B) from MARKS GROUP BY Roll_no;
So these examples help us understand how the aggregate function work. The main points for that we need to keep in mind while writing aggregate functions and queries calling these functions are:
1. FETCH GROUP NEXT ROW instruction fetches the next row and then the values are set to the parameters of the function.
2. When all the rows are fetched then we exit from the loop body and then in the handler declaration, the value of the function is returned.
3. The GROUP BY , ORDER and SORT all clauses work with the aggregate functions , so they can be included while writing the SQL query depending on the needs.
4. The values to the parameters get set after the FETCH GROUP NEXT ROW instruction, so we should not use parameter values to initialize other values as these maybe garbage values before the FETCH GROUP NEXT ROW instruction.
Milestones
This part mainly comprises of the technical details involved in this project.
Added the syntax for the Aggregate Functions to the parser
-
Added the AGGREGATE symbol to the parser. A test has been written to check that the parser parses the AGGREGATE keyword correctly.
Link to the commit
-
Added the FETCH GROUP NEXT ROW instruction: This instruction is exclusive for aggregate functions.This instruction is added to make a temporary exit from the function execution. After the temporary exit is made the values are set to the parameters of the function and then the function resumes execution.
Link to the commit
-
Added an agg_type field to the st_sp_chistics. This has been added to help us store in the mysql.proc table whether a function is aggregate, window or normal function.
Link to the type aggregate_type.
Link to the field added to the st_sp_chistics.
-
Alter and Create Aggregate functions: The CREATE FUNCTION query should store if the function is an aggregate/window/normal function in the aggregate field of the mysql.proc table. Alter query should allow user to change the aggregate field of the mysql.proc table.
Added the implementation of the Item_sum_sp class
This class is inherited from the Item_sum class.This class mainly deals with the concept of making an interface which allows user to execute aggregate functions.For executing the aggregate functions we have taken a special cursor approach.
-
Cursor Approach: A new instruction FETCH GROUP NEXT ROW instruction is introduced. This instruction handles the state the aggregate function is currently in. There is a field pause_state in the class sp_context.
pause_state is TRUE: In this case, the function enters the pause state and we save the state of the function. The instruction pointer(instr_ptr) continues to point to the FETCH GROUP NEXT ROW instruction, and the function makes a temporary pause in its execution and that the state of the function is saved. The function is paused so that next row values can be set for the function's arguments. After the values are passed to the arguments the function resumes execution.
pause_state is FALSE: In this case, the function exits from the pause state and resumes execution. The instruction pointer(instr_ptr) is then incremented and now it points to the next instruction to be executed.
-
Item_sum_sp::add() function added: This function handles everything that is done regarding the aggregation of values.
Item_sum_sp::execute_impl(): Execute function and store the return value in the field. This is called from the add() function to aggregate the values and from execute() to return the value in the sp_result_field(field defined in the Item_sum_sp class).
sp_head::execute_aggregate_function(): This function is called from the execute_impl function. In this function the run-time context(func_ctx) is created. I have followed a different approach when the function fetches the row so as to set the values to the parameters. When we call the function for the first time(that is when the first row is fetched), the run-time context is created but the values to the arguments are not set. These values are set when we encounter a FETCH GROUP NEXT ROW instruction and pause the aggregate function. Then after the pause, the values to the parameters are set and the function resumes execution. For the next fetch we don't need to create the context as we preserve the earlier run-time context(func_ctx). The values to the parameter are set straight away and then the rest of the instructions are executed.
Item_sum_sp::val_xxx() functions added: This function gets the return value for the current aggregate function and stores the value in the result field. A signal is send in the field server_status. The value for server status is set to SERVER_STATUS_LAST_ROW_SENT. This implies that all the rows have been fetched and now we need to throw an error that there is no more data (ER_SP_FETCH_NO_DATA), so this error is handled by the handler. The handler handles the error and the function execution is complete.