A Rough Guide to Running a GDPR-Compliant SaaS Business

Written by Erik Grinaker

The European Union's upcoming law on personal data processing, the General Data Protection Regulation (GDPR), goes into effect on May 25th 2018. This will have a significant impact on how businesses, and SaaS businesses in particular, handle their customers' personal data - and not only in Europe, since it applies to any business offering goods or services to, or performing monitoring of, users in the EU.

For SaaS providers like us, the GDPR is important: not only do we need to be compliant, we also need to help our customers be compliant. Choosing Sanity.io should remove headaches with regard to implementing the GDPR, not add to them.

At Sanity.io, we're big fans of the GDPR. Personal data has historically been used and shared indiscriminately, and stored indefinitely "just in case". The GDPR encourages businesses to be more aware of the data they collect and what they do with it, and gives individuals much more control over what happens to their data. We're currently working hard towards compliance, and are happy to see that most other SaaS providers are doing the same.

However, while the GDPR is a remarkably clear and understandable piece of legislation, we quickly ran into grey areas and uncertainties during implementation. These are issues that we expect most other SaaS providers will face as well, so we wanted to share our thoughts on a few practical aspects. Of course, this should not be taken as legal advice at all, and we strongly suggest you discuss this with a lawyer if necessary.

The Gist

Personal data is any piece of data which can reasonably be traced back to a specific individual, including the obvious such as name, address, photo, phone number, and email address, but also the less obvious such as IP address, browser user agent, user ID, and so on. Sensitive data related to health, ethnicity, sexual orientation, political and religious affiliation, and criminal convictions is regulated much more strictly, and you really shouldn't be collecting this without a very good reason. If, however, the data is anonymous (cannot be traced back to an individual in any way), you can do pretty much whatever you want with it.

In general, the GDPR wants you to:

Collect as little as necessary, keep it for as short as necessary, and share it with as few as necessary.
Be transparent about what it will be used for, explain it in clear and simple terms, and only use it for the stated purpose.
Allow the individual to choose what to share, to retrieve all of their data, to correct it, and to permanently delete it.
Keep it secure, and notify the individual and the authorities as soon as possible in the case of a security breach or data leak.

This is all very reasonable, and should be uncontroversial for any business that respects its users. For anyone else, the EU has provided an added incentive as well: fail to do this, and you may be fined up to 20 million euros or 4% of your global annual turnover, whichever is larger.

Since personal data processing is a core activity for many SaaS businesses, you may need to appoint a Data Protection Officer (DPO) tasked with making sure all personal data is handled properly, and register the DPO with the local data protection authorities. Article 37 requires this when your core activities involve large-scale, regular and systematic monitoring of individuals or large-scale processing of sensitive data. You must also maintain an internal record of processing which describes all of the personal data processing that you do.

In order to collect and process personal data, you need a lawful basis for doing so. The complete list can be found in article 6 paragraph 1, but in most cases this will either be through the explicit consent of the person, to fulfill a contract with the person, due to a legal obligation (e.g. accounting laws), or because you have a legitimate interest for which the data is necessary. You will likely use several of these depending on the context, and we'll give some examples below.

The GDPR distinguishes between data controllers and data processors. A controller is the legal entity who is responsible for the data. A processor is a third party who processes the data on behalf of the controller. As a SaaS, you will most likely be both: you are a controller for data which you collect yourself (e.g. your user database and newsletter subscriptions), and a processor for data which your customers store in your SaaS product. If you employ third-party processors (e.g. a cloud provider or email service), you must make sure they process the data in a manner that is compatible with your terms and the GDPR - likewise, as a processor you should provide your customers (the controllers) with terms and tools which allow them to be GDPR compliant.

That's pretty much it. While this may seem pretty straightforward on its face, it can get a bit hairy once you start dealing with specifics. Let's have a look.

Your Terms of Service and Consent Forms

A good place to start is by updating your terms of service. Most of the requirements are listed in articles 12, 13, and 14. Basically, they must be written in plain and easily understandable language, inform the user of their rights, and clearly lay out what data you collect, why you collect it, what you do with it, and how long you keep it for. They must also state whether the data is shared with any third parties (which do not need to be named, but you get bonus points for doing so), and note any data transfers out of the EU.

If you collect any data on the basis of explicit consent, you must ask for consent separately from your terms of service. It is no longer enough to bury this on page 9 of your terms, or say that the user implicitly consents by using the service: the user must take an affirmative action (such as ticking a checkbox) where it is clearly explained exactly what they consent to, and you must keep a record of it (e.g. an audit log). Furthermore, the user must not be denied service even if they do not consent, and they must be able to revoke consent as easily as they gave it (e.g. on their settings page). They still have to accept your general terms of use to use your service, but you cannot e.g. demand that they also allow you to track their every move on your website or share their data with advertisers.
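To make the record-keeping and revocation requirements a little more concrete, here is a minimal sketch of an append-only consent log in Python. The class names, the fields, and the "usage_tracking" purpose are our own illustrative assumptions, not anything the GDPR prescribes, and a real system would persist the events rather than hold them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str           # hypothetical purpose name, e.g. "usage_tracking"
    granted: bool          # True = consent given, False = consent revoked
    wording: str           # the exact text the user was shown
    recorded_at: datetime


class ConsentLog:
    """Append-only log: decisions are added, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[ConsentEvent] = []

    def record(self, user_id: str, purpose: str, granted: bool,
               wording: str, now: Optional[datetime] = None) -> ConsentEvent:
        event = ConsentEvent(user_id, purpose, granted, wording,
                             now or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # The most recent decision wins; no decision at all means no consent.
        for event in reversed(self._events):
            if event.user_id == user_id and event.purpose == purpose:
                return event.granted
        return False

    def history(self, user_id: str) -> List[ConsentEvent]:
        # Full audit trail for one user, oldest first.
        return [e for e in self._events if e.user_id == user_id]
```

Revoking consent is just another call to record() with granted=False, which keeps withdrawal as simple as the original opt-in while preserving the audit trail.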

Your Website

The first issue most people encounter with their website is probably Google Analytics, which is used to collect visitor statistics. This is actually pretty simple: Google forbids you to submit any personal data to Google Analytics at all, and will delete your account if you do so. As long as you only use a random client ID (CID) cookie, this is considered anonymous data - it's only possible to say that this is a person, not which person - so you can use this data as you wish. However, if you connect this with a user ID from your service, which is still anonymous to Google, the data is no longer anonymous to you, and you must ask for consent. The tricky bit here is that if you use client-side JavaScript to submit the data, the user's IP address is sent to Google as part of the network request, and while Google claims it does not store this it can still be considered a transfer of personal data. We therefore suggest you collect the data on the server side, set the last octet of the IP address to 0 to anonymize it somewhat but still have rough location data, and submit it yourself.
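As a rough illustration of the anonymization step, the following Python sketch zeroes the last octet of an IPv4 address using only the standard library. The function name is our own, and the handling of IPv6 (keeping the first 48 bits) is an assumption we have added for completeness, since the text above only covers IPv4.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Truncate an IP address so it only gives a rough location."""
    ip = ipaddress.ip_address(address)
    # IPv4: keep the first 24 bits, i.e. set the last octet to 0.
    # IPv6: keep the first 48 bits (our assumption, not from the text above).
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.77"))  # 203.0.113.0
```

The truncated address would then be what you store or forward from the server side, rather than the address the request actually came from.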

A less obvious issue is the inclusion of external assets, loaded from third parties who are not acting as a processor on your behalf. If you e.g. load fonts from Google Fonts, the user's IP address is transferred to Google when the font is loaded, and this could be used to track the user's movements across sites. The same goes for e.g. adverts, Facebook buttons, Twitter feeds, and other assets loaded on the client side. We suggest you serve these assets from a site you control, either with a local copy or a proxy - if not, you may need to get the user's consent, and allow them to use your site even if they reject this.

Your Logs and Error Reporting

Another issue which quickly comes up is web server access logs. These are crucial in analyzing your site traffic, debugging problems, and defending against security attacks. The problem is that the user's personal data (IP address, user agent, HTTP referer etc) is collected and stored as soon as they access your site - before you are even able to ask for consent. In this case, we instead collect the data on the basis of a legitimate interest, since this data is necessary to operate the service. However, this must be properly secured, and the data deleted or anonymized as soon as it is no longer needed - maybe 1-3 months.

The same goes for your application logs. These logs will often contain personal information (either incidentally or routinely), and must be deleted or anonymized as soon as they are no longer necessary - perhaps a few weeks. This requires a proper log aggregation and rotation infrastructure, and can often be fiddly to get right. For example, we ran into issues with Google Kubernetes Engine and Docker, which use volume-based log rotation (deleted after 50 MB) rather than time-based rotation for the node-local container logs, so low-volume applications can have their logs left around for months. We have opened an issue with Docker for time-based rotation, but so far have not received a reply - in the worst case, we will have to shut down and purge any containers that haven't rotated their logs within an acceptable time frame.
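Where you do control the log files yourself, time-based retention can be as simple as a scheduled job that removes files past a cutoff. The Python sketch below shows the general idea only; the function name, the "*.log*" file pattern, and the 30-day figure in the usage note are assumptions for illustration, and it does not address the container log issue described above.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_logs(log_dir: str, max_age_days: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete log files whose last modification is older than max_age_days.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    # Assumed naming scheme: app.log, app.log.1, app.log.2.gz, ...
    for path in sorted(Path(log_dir).glob("*.log*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling purge_old_logs("/var/log/myapp", 30) from a daily job would keep roughly a month of logs; the path is hypothetical, and the period should match whatever your terms state.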

Error reporting services, such as sentry.io, are also indispensable for alerting and debugging of problems, but will usually collect personal data such as the IP address and user ID of the user. This is gathered on the same basis as your logs, and must similarly be removed in a timely fashion. It's also a good idea to configure this so that any content which may contain personal data or access credentials is not included in notifications, to prevent it from winding up in email inboxes or Slack channels.
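One way to keep credentials and personal data out of notifications is to scrub the error payload before it leaves your application. The sketch below is a generic Python helper that works on plain dictionaries; it is not the API of sentry.io or any other service, and the list of sensitive keys is a hypothetical starting point that you would adapt to your own payloads.

```python
from typing import Any

# Hypothetical list of keys to redact; extend it for your own payloads.
SENSITIVE_KEYS = {"password", "token", "authorization", "cookie", "email"}
REDACTED = "[redacted]"


def scrub(value: Any) -> Any:
    """Return a copy of value with sensitive dictionary entries redacted.

    Walks nested dicts and lists; the input itself is left unchanged.
    """
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value
```

Matching on key names is a blunt instrument, since it will miss personal data embedded in free-text messages, so it complements rather than replaces short retention periods.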

Your Users, Payments, and Mailing Lists

When it comes to your user database, this data is collected on the basis of legitimate interest - it is necessary to provide the service. The important thing here, apart from the obvious security requirements, is mostly to allow users to update or delete their data, and to actually delete or anonymize it within a reasonable time frame afterward. For our part, we mostly anonymize it by removing any personal identifiers and deleting any other unnecessary data, but otherwise leave the records and user IDs intact - this both allows us to prevent reuse of IDs (which can be an attack vector), and extract historical usage statistics. As for single sign-on, i.e. allowing users to log in using their Google or GitHub accounts, this is regulated by the terms between the user and third-party provider, and as long as you handle the data you receive from these services in the same way as the rest of your user data you shouldn't have to give it much thought.
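The anonymization approach described above could look roughly like the following Python sketch. The field names are a hypothetical schema, not our actual one, and we use an allowlist so that any field added later is dropped by default rather than kept by accident.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: only these non-identifying fields survive.
KEPT_FIELDS = ("id", "created_at", "plan")


def anonymize_user(user: dict, now: Optional[datetime] = None) -> dict:
    """Return a stripped copy of a user record, keeping the ID reserved."""
    anonymized = {field: user[field] for field in KEPT_FIELDS if field in user}
    # Mark the record so it is never treated as an active account again.
    anonymized["anonymized_at"] = (now or datetime.now(timezone.utc)).isoformat()
    return anonymized
```

Note that the record only counts as anonymous if the retained ID can no longer be linked back to the person elsewhere, for example through logs or backups that have not yet expired.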

Payment data, however, must usually be collected and stored by legal obligation - we can't speak for other jurisdictions, but here in Norway, the accounting laws require us to keep this for at least five years. However, you should still remember to delete what information you can when the user cancels their account, such as the credit card on file, and otherwise be particularly careful with this data for obvious reasons.

As for mailing lists, users sign up for these on their own initiative (whether they are registered users or simply website visitors), and can easily unsubscribe on their own. Such news about the service is provided on the basis of legitimate interest, and therefore does not require any further consent.

Your Database and Data Stores

The primary concern with your database and other data stores should be security, but this is a large topic which we can't hope to cover here. A close second, however, is proper data deletion, and while it's easy enough to issue a delete command to the database there may be other less-than-obvious aspects to consider. For example, many databases will simply mark a row as deleted or outdated, but not actually remove it from disk until it is overwritten by other data. Many databases also maintain a transaction log containing all changes that have been written to the database, so make sure this log has sufficiently low retention.

Even if the data is completely deleted from the system, however, fragments may still linger in the raw disk blocks or filesystem journal until overwritten, and can in some cases be recovered without too much effort. Unfortunately, this is a far harder problem to solve without destroying the entire disk, and we do not believe a practical solution exists for guaranteed removal of this data from a live system. If anyone has any suggestions, we would love to hear them! It is therefore crucial to properly dispose of retired disks, either by using full-disk encryption and deleting the key or by physically destroying the disk (preferably both).

And don't forget your backups - gone are the days of keeping them around for months on end "just in case", so make sure your retention is limited and in line with your service terms.

Your Customers' Data

Most of the issues we have discussed so far have applied to data for which you are the controller. However, you will almost certainly also be a processor of personal data which your customers store in your SaaS product (assuming they include companies and other organizations, as individuals acting in a purely personal or household capacity are exempt from the requirements of the GDPR). In this case it is the customer who is legally responsible for the data (as the controller), and so it is their responsibility to only use processors who have provided "sufficient guarantees" that the processing is secure and in compliance with the GDPR, as detailed in article 28. However, it is clearly in your interest to make this easy for them, so you should consider being open and transparent about your security practices, adhering to a recognized code of conduct, and obtaining a certification.

If you are based in the U.S., your European customers will also need you to register for and adhere to the EU-U.S. Privacy Shield framework and sign a Data Processing Addendum - other non-EU/U.S. businesses must do something similar, but the details vary and we have only been involved with processors in these jurisdictions.

You should also provide your customers with tools that make it easy for them to fulfil their obligations under the GDPR, such as search, modification, export, deletion, and anonymization of their stored data. You may also want to gather the contact details of their DPO, so you can contact them if any issues should arise.
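As a toy illustration of what such tools amount to, the Python sketch below implements search, export, and deletion for one data subject over an in-memory store. The class, the method names, and the assumption that every document carries a "subject_id" field are ours for the sake of the example; a real product would need to cover every place the data lives, including indexes and caches.

```python
import json
from typing import Dict, List


class DocumentStore:
    """Toy in-memory store; documents are dicts with a 'subject_id' field."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, doc_id: str, doc: dict) -> None:
        self._docs[doc_id] = dict(doc)

    def find_by_subject(self, subject_id: str) -> List[str]:
        # Search: IDs of all documents that belong to one person.
        return sorted(doc_id for doc_id, doc in self._docs.items()
                      if doc.get("subject_id") == subject_id)

    def export_subject(self, subject_id: str) -> str:
        # Export: everything held about the person, in a portable format.
        docs = {doc_id: self._docs[doc_id]
                for doc_id in self.find_by_subject(subject_id)}
        return json.dumps(docs, indent=2, sort_keys=True)

    def delete_subject(self, subject_id: str) -> int:
        # Deletion: remove every matching document, return how many.
        doc_ids = self.find_by_subject(subject_id)
        for doc_id in doc_ids:
            del self._docs[doc_id]
        return len(doc_ids)
```

Keying everything on a single subject identifier is the design choice that makes the other operations cheap; without it, answering a data request means searching free text.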

Your Third-Party Processors

Like your customers, if you employ third-party processors (e.g. a cloud provider or email service), you are ultimately responsible for what they do with the data - you are liable if they have a security or compliance issue. You therefore have to make sure they provide sufficient guarantees of being GDPR-compliant and having adequate security practices, and that their terms and settings are compatible with the terms you have with your users (e.g. data retention periods).

If the processor is located outside of the EU, you also need to ensure appropriate safeguards and contractual terms are in place for the transfer - this typically comes up with U.S.-based processors, who must adhere to the EU-U.S. Privacy Shield framework and sign a Data Processing Addendum with you.

Final Words

We can't hope to cover all aspects of running a GDPR-compliant SaaS business here - in particular, we haven't discussed security practices, processes for exporting and deleting data on user request, or handling of your employee data. Regardless, we hope that this rough guide to the GDPR will be informative and helpful in making your own SaaS business compliant, and encourage you to get in touch if you have any feedback or suggestions.

We at Sanity.io look forward to a future where SaaS providers and other companies treat their customers' data with care and respect. Sanity.io is working towards full GDPR compliance by the time the legislation enters into effect, and will provide the infrastructure and tools necessary to ensure compliance for our users as well.