Wednesday, July 29, 2015

MySQL- Hierarchical Data

No developer ever came into the problem of storing hierarchical data. And we always come up with a solution that store the parentId into the same table. And when we just need to retrieve the data then it becomes a nightmare to optimize the nested-join. The advantage to these systems is that they don't take a lot of space, but the disadvantage is that nearly all the maintenance has to be done in the application. So, in essence, these approaches trade lots of queries at retrieval time, for lots of queries at update and insert/delete time, and often some additional programming complexity.

Just imagine a tree and what outcome a user may want:
root
- son
- son#2
- grandson#1
- super grandson
- son#3
-grandson#2

and I want to display it like:

root -> son
root -> son#2
root -> son#2 -> grandson#1
root -> son#2 -> grandson#1 -> super grandson
root -> son#3
root -> son#3 -> grandson#2

In order to get the following output we apply the approach of closure table. A closure table gives you the ability to do all the same sorts of "find me all children of X, to depth N" queries as any of the other methods, just by doing a join against the closure table. And, the killer feature of the closure table, is that to add or remove a parent-child relationship, you only need to run *one* SQL query -- a query so simple that even the least-powerful SQL databases can be configured to run it as a trigger on the main table!

The Closure Table is a design for representing trees in a relational database by storing all the paths between tree nodes.


create table family (
relation_id int auto_increment primary key,
relation varchar(20) not null
);

insert into family (relation_id, relation) values
(1, 'root'),
(2, 'son'),
(3, 'son#2'),
(4, 'grandson#1'),
(5, 'super grandson'),
(6, 'son#3'),
(7, 'grandson#2');

create table family_tree (
ancestor int not null,
descendant int not null,
primary key (ancestor, descendant),
foreign key (ancestor) references family(relation_id),
foreign key (descendant) references family(relation_id)
);

insert into family_tree (ancestor, descendant) values
(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (1,7),
(2,2),
(3,3), (3,4), (3,5),
(4,4), (4,5),
(5,5),
(6,6), (6,7),
(7,7);


What we need to do is find all the descendants of the root node 1, then for each of these descendant, list its ancestors in order, separated by an arrow. We can use MySQL's useful GROUP_CONCAT() function to build this list for us.

select group_concat(f.relation order by f.relation_id separator ' -> ') as path
from family_tree d
join family_tree a on (a.descendant = d.descendant)
join family f on (f.relation_id = a.ancestor)
where d.ancestor = 1 and d.descendant != d.ancestor
group by d.descendant;

Here's the output in the MySQL client. It looks like what the reader asked for:

+----------------------------------------------------------+
| path                                                                       |
+----------------------------------------------------------+
|root -> son                                                              |
|root -> son#2                                                          |
|root -> son#2 -> grandson#1                                  |
|root -> son#2 -> grandson#1 -> super grandson    |
|root -> son#3                                                          |
|root -> son#3 -> grandson#2                                  |
+----------------------------------------------------------+

I do assume for the purposes of ordering that all of a node's ancestors have a lower node number. You could alternatively use a pathlength column to the family_tree table and sort by that.

The Closure Table design is nice compared to the Nested Sets (or Preorder Traversal) design, because it supports the use of referential integrity. By using indexes, the EXPLAIN report shows that MySQL query optimizer does a pretty good job on it:
MySQL Query Stats


No comments:

Post a Comment