Author Image

By: Aivis Olsteins

In News

2018-06-11

GDPR and relational data in the database

GDPR now requires the entity which possesses personal data on its users to comply with right of erasure, i.e. to delete personal data associated with them completely from the systems.

In the fully normalized database the tables representing some kind of transactions (e.g. call logs, payments, log in sessions etc) usually have a foreign key linking to the other table containing users data. So, for example let's have a simplified setup where we care about customers and call they made, we have following two tables: Customers and Calls. The Calls table has a foreign key customer_id referencing to the Customers table.

Here is the simplified schema (written for MySQL version 5.7):

CREATE TABLE `Customers` (
`customer_id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(40) DEFAULT '',
`last_name` varchar(40) DEFAULT '',
PRIMARY KEY (`customer_id`)
);

CREATE TABLE `Calls` (
`call_id` int(11) NOT NULL AUTO_INCREMENT,
`customer_id` int(11) DEFAULT NULL,
`call_time` datetime DEFAULT NULL,
`called_number` varchar(40) NOT NULL DEFAULT '',
PRIMARY KEY (`call_id`),
KEY `customer_id` (`customer_id`),
CONSTRAINT `Calls_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `Customers` (`customer_id`)
);

 

Let's populate them with some data:

mysql> insert into Customers (customer_id, first_name, last_name) values (1,'John','Doe'),(2,'Mickey','Mouse');
mysql> insert into Calls (call_id, customer_id, call_time, called_number) values (1,1,now(), '1001'),(2,1,now(),'1002'),(3,2,now(),'1003');

 

Here are the contents:

 

mysql> select * from Customers;

+-------------+------------+-----------+
| customer_id | first_name | last_name |
+-------------+------------+-----------+
| 1 | John | Doe |
| 2 | Mickey | Mouse |
+-------------+------------+-----------+

mysql> select * from Calls;

+---------+-------------+---------------------+---------------+
| call_id | customer_id | call_time | called_number |
+---------+-------------+---------------------+---------------+
| 1 | 1 | 2018-06-07 12:01:05 | 1001 |
| 2 | 1 | 2018-06-07 12:01:05 | 1002 |
| 3 | 2 | 2018-06-07 12:01:05 | 1003 |
+---------+-------------+---------------------+---------------+

 

Note, that the above schema is not fully valid because it does not specify the action what happens to the child record in Calls table if the parent record is deleted from Customers table. In case of MySQL the default action is 'RESTRICT' so it is the same as we have written: CONSTRAINT `Calls_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `Customers` (`customer_id`) ON DELETE RESTRICT

That means that in the example case we are actually unable to delete Customer entry if they have made a call:

mysql> delete from Customers where customer_id=1;

ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`test`.`Calls`, CONSTRAINT `Calls_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `Customers` (`customer_id`))

Let's try to modify the key referential action to ON DELETE CASCADE:

mysql> alter table Calls add constraint Calls_ibfk_1 foreign key (`customer_id`) REFERENCES `Customers` (`customer_id`) on delete cascade;

Now we can delete to Customer:

mysql> delete from Customers where customer_id=1;

Query OK, 1 row affected (0.00 sec)

 

But, the bad news are that also all calls belonging to the customer are gone (since we cascaded the tables):


mysql> select * from Calls;

+---------+-------------+---------------------+---------------+
| call_id | customer_id | call_time | called_number |
+---------+-------------+---------------------+---------------+
| 3 | 2 | 2018-06-07 12:01:05 | 1003 |
+---------+-------------+---------------------+---------------+

 

Obviously cascading the tables are bad idea, and should never be used to avoid unintentional loss of transactions (calls). Let's change referential action to 'SET NULL':

 

mysql> alter table Calls drop foreign key Calls_ibfk_1;

mysql> alter table Calls add constraint Calls_ibfk_1 foreign key (`customer_id`) REFERENCES `Customers` (`customer_id`) on delete set null;

 

Now, when deleting customer with ID 1, we see that they identifier is set to NULL:

 

mysql> delete from Customers where customer_id=1;

Query OK, 1 row affected (0.00 sec)


mysql> select * from Calls;

+---------+-------------+---------------------+---------------+
| call_id | customer_id | call_time | called_number |
+---------+-------------+---------------------+---------------+
| 1 | NULL | 2018-06-07 12:18:21 | 1001 |
| 2 | NULL | 2018-06-07 12:18:21 | 1002 |
| 3 | 2 | 2018-06-07 12:18:21 | 1003 |
+---------+-------------+---------------------+---------------+

This solution works in the cases you don't care to know who has made the call unless the call info is present. The situation becomes more complicated if you delete more customers: the keys in call table with be set to NULL for all of them, and you will not know even if the calls were made by one or several customers.

 

And last, more complex, but GDPR compliant solution which saves from the problems above: change referential action back to RESTRICT:

 

mysql> alter table Calls drop foreign key Calls_ibfk_1;

mysql> alter table Calls add constraint Calls_ibfk_1 foreign key (`customer_id`) REFERENCES `Customers` (`customer_id`) on delete restrict;

 

And instead of delete from Customers table, use update and clear out all Personally Identifiable Information (PII) fields: see what is considered PII here: https://en.wikipedia.org/wiki/Personally_identifiable_information

mysql> update Customers set first_name='', last_name='' where customer_id=1;

Therefore we have successfully removed all PII from the system without affecting related tables:

 

mysql> select * from Customers;

+-------------+------------+-----------+
| customer_id | first_name | last_name |
+-------------+------------+-----------+
| 1 | | |
| 2 | Mickey | Mouse |
+-------------+------------+-----------+


mysql> select * from Calls;

+---------+-------------+---------------------+---------------+
| call_id | customer_id | call_time | called_number |
+---------+-------------+---------------------+---------------+
| 1 | 1 | 2018-06-07 12:22:04 | 1001 |
| 2 | 1 | 2018-06-07 12:22:04 | 1002 |
| 3 | 2 | 2018-06-07 12:22:04 | 1003 |
+---------+-------------+---------------------+---------------+

 

Leave a comment
About Author
Author Image
My name is Aivis Olsteins and I am owner of DataTechLabs. My experience in Telecoms started in early 1990's and I have worked in multiple technical positions in mobile, messaging and data networks. My expertise lies in telecom networks, database systems, distributed processing and large data analysis. These posts are my attempt to share my knowledge with everyone who might find it useful.

Get in Touch

If you have something to say, please use the contact form below to get in touch with us. We will get back to you as soon as possible.

Mail Us

DataTechLabs SIA, Muzikas str 12A
Jurmala, LV-2008, Latvia.

Call Us

+371 67 66 09 01
+ 1 202 499 1550

E-mail Us

info@datatechlabs.com
support@datatechlabs.com