Microsoft .NET SDK is violating the GDPR, object now

Skip to the end if you just want a template for a "Right to object" letter.

The .NET Core software development kit (SDK) is a set of command line tools that allow development against Microsoft's .NET. When the command line tool is run it sends some telemetry back to Microsoft.

This telemetry is documented; however, the way it is implemented appears to violate the GDPR in multiple ways.

Collection by default

Running dotnet help on a clean install prints a message about telemetry, but the telemetry for that very run is still sent.

Therefore the GDPR requirements under "Burden of proof and requirements for consent" cannot be met. Even if we give Microsoft the benefit of the doubt and consider a command line application somewhat special, it does not give us a chance to opt out, let alone opt in.

During the installation process, there is a link to Microsoft's Privacy statement; however, this does not mention the environment variable needed to opt out, so there is no way to opt out before the first piece of data is sent.
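For reference, the variable in question is DOTNET_CLI_TELEMETRY_OPTOUT (the same one named in the letter template at the end of this post). A minimal sketch, assuming a POSIX shell; the profile file in the comment is an assumption about your setup, not something the installer does for you:

```shell
# Opt out for the current shell session. This has to happen BEFORE the first
# dotnet command is run, otherwise that first run has already sent telemetry.
export DOTNET_CLI_TELEMETRY_OPTOUT=1

# To make it permanent, add the same line to your shell profile, e.g.:
#   echo 'export DOTNET_CLI_TELEMETRY_OPTOUT=1' >> ~/.profile
```

The point is that you need to know this before installing, which is exactly the information the installer does not give you.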

Collecting personal data

The linked information page and the text printed in the CLI both say "The data is anonymous." It clearly isn't. They collect MAC addresses and the current working directory, which are sent to their servers hashed.

A key thing to understand: Hashing without a salt does not make the data anonymous or even pseudonymous.

See for example GDPR pseudonymisation techniques. Given this, it is a clear violation as Article 6 requires specific purposes for processing, but the data was claimed to be anonymous, which it isn't.
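To make the salt point concrete, here is a small sketch. It assumes sha256sum and /dev/urandom are available; the function names and the example path are made up for illustration, and the salted variant is a hypothetical alternative, not something the SDK does. Whether a salted scheme would satisfy the GDPR is a separate question.

```shell
#!/bin/sh
# Plain SHA-256 of a string, i.e. the unsalted scheme described above.
unsalted_hash() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical alternative: prefix a secret random salt before hashing.
# $1 = salt, $2 = value
salted_hash() {
    printf '%s%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Generate a random 16-byte salt as a hex string.
new_salt() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

path=/home/example   # illustrative value, not real data

# Unsalted: anyone who can guess the input can reproduce the hash.
unsalted_hash "$path"
unsalted_hash "$path"               # prints the same hash again

# Salted: without the secret salt, a correct guess no longer reproduces it.
salted_hash "$(new_salt)" "$path"
salted_hash "$(new_salt)" "$path"   # a different hash each time
```

Because the unsalted hash is the same for everyone, anyone holding the hash can check a guess against it.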

A common place to run "dotnet help" might be a home directory, which often includes a username, e.g. /home/dgl. So although the path is hashed, it is easy to find the hash that relates to a particular user:

$ pwd
/home/dgl
$ echo -n $PWD | sha256sum
67acfe0ddd44867e1e5da5ddaf25a5b90e928f523cecf614e201c683b7533cf6  -

This matches the relevant part of the JSON sent:

  "Current Path Hash": "67acfe0ddd44867e1e5da5ddaf25a5b90e928f523cecf614e201c683b7533cf6",

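Going the other way, from a hash back to a user, is just a loop over guesses. A rough sketch, assuming home directories live under /home and that you have a list of candidate usernames; find_home_dir and sha256_hex are names I made up for this example:

```shell
#!/bin/sh
# SHA-256 hex digest of a string (no trailing newline is hashed).
sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# find_home_dir TARGET_HASH USER...
# Hash /home/USER for each candidate and print the first path that matches.
find_home_dir() {
    target=$1
    shift
    for user in "$@"; do
        candidate="/home/$user"
        if [ "$(sha256_hex "$candidate")" = "$target" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage, with the "Current Path Hash" value from the JSON above and a
# guessed list of usernames:
#   find_home_dir 67acfe0ddd44867e1e5da5ddaf25a5b90e928f523cecf614e201c683b7533cf6 alice bob dgl
```

A list of common usernames, or usernames scraped from commit logs, is all an attacker would need as input.
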
There are some discussions about the collection of MAC addresses in issue #6145 but no particular reply from Microsoft; people suspect it's a GDPR violation. Note that under the GDPR there are time limits for replying, so that's another potential GDPR issue.

The report I sent to Microsoft detailed how a MAC address is likely not even 48 bits of search space, as we know which ones are assigned, so brute-forcing is quite possible, particularly when combined with the fact that the search space can be reduced by filtering on the path hash.
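To put rough numbers on that: the first 24 bits of a MAC address are a vendor prefix (OUI), so each plausible vendor leaves 2^24 addresses to try. The sketch below is illustrative only. The string format that actually gets hashed (case, separators) is an assumption on my part, the function names are made up, and the brute-force loop is deliberately cut down to the last byte so it finishes instantly; a real attack would cover all 24 bits with something much faster than shell.

```shell
#!/bin/sh
# SHA-256 hex digest of a string (no trailing newline is hashed).
sha256_hex() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# search_space NUM_OUIS
# Number of candidate addresses if only NUM_OUIS vendor prefixes are plausible.
search_space() {
    echo $(( $1 * (1 << 24) ))
}

# crack_last_byte TARGET_HASH PREFIX
# Demo-sized search: PREFIX is the first five bytes ("aa:bb:cc:dd:ee"),
# so only 256 candidates are hashed. ASSUMPTION: lowercase, colon-separated.
crack_last_byte() {
    target=$1
    prefix=$2
    i=0
    while [ "$i" -lt 256 ]; do
        mac=$(printf '%s:%02x' "$prefix" "$i")
        if [ "$(sha256_hex "$mac")" = "$target" ]; then
            printf '%s\n' "$mac"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

search_space 1        # one vendor prefix: 16777216 candidates
echo $((1 << 48))     # the full 48-bit space: 281474976710656
```

Knowing the hardware vendor, or even a handful of likely vendors, is what takes this from impractical to trivial.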

One principle of the GDPR is that you need to explain why you're collecting information. Microsoft do so only in a roundabout way: the link printed by the command (https://aka.ms/dotnet-cli-telemetry) explains what they are collecting, but the "why" is hidden in a blog post.

It says:
Hashed MAC address — Determine a cryptographically (SHA256) anonymous and unique ID for a machine. Useful to determine the aggregate number of machines that use .NET Core. This data will not be shared in the public data releases.

Hashed current working directory — Determine build machines from dev machines using the heuristic of a large number of working directories. This distinction helps explain large #s of builds from a machine.

This shows a clear intention to join the data for some purposes. The "Hashed MAC address" is obviously understood to be somewhat sensitive, as the text says they won't share it. Interestingly, the same is not said for the current working directory, which is also sensitive.

Privacy policy

As mentioned, Microsoft has a general privacy policy.

This could allow some collection, however:

You have choices when it comes to the technology you use and the data you share. When we ask you to provide personal data, you can decline.

This behaviour would be GDPR compliant, but the key point is that, because data is collected on the very first run (e.g. when running dotnet help), there is no chance to decline. So the behaviour of the dotnet tool is inconsistent with their own privacy policy.

How to respond?

I sent a report to Microsoft's security team, because arguably incorrect use of a cryptographic hashing function (SHA256 of the items, without a salt) is a "Security Design Flaw" which qualifies under the dotnet bug bounty. This was obviously fishing a bit, and Microsoft denied me. More surprisingly, they don't seem to consider this a problem at all; their final reply was:

We have updates scheduled for the first run experience and related documentation to make it more accurate. Personal data is handled consistently with GDPR requirements.

You'll notice that they don't talk about any actual collection changes. Also interestingly if the data is anonymous as they claim, what "Personal data" are they referring to in this reply?

What's the route the GDPR gives us here?

This is an interesting one. Article 11, "Processing which does not require identification", could partly apply, in that they can exempt themselves from the right of access, etc., with the exception of the "Right to object".

There are two routes: either we give Microsoft enough information to be satisfied that we have identified ourselves (and use the right to erasure), or we use the right to object, which, while it doesn't require Microsoft to delete the data entirely, does require them to limit their use of it.

So I can object to any processing of my data, and as I've proved we can use the MAC address to find my machine in their data. I believe given this has gone on for several years that even if they make the data collection GDPR compliant, there is a huge historical collection of data that may need clearing as it has been collected unlawfully.

In my original report to Microsoft I made some recommendations:

  • Remove the telemetry, it avoids any potential GDPR issues;
  • Delete the historical data (I am not a lawyer, but I suspect this has GDPR implications).

It appears they do not intend to follow these, so I suggest that any user of the .NET Core SDK in the European Union take matters into their own hands and use their right to object:

[Your full address] [The date]

To Data Controller, Microsoft Corporation

I am exercising my right to object under the General Data Protection Regulation (Article 21).

It has come to my attention that Microsoft .NET Core SDK collects telemetry, and the opt-out process for this data is flawed.

In particular, setting the DOTNET_CLI_TELEMETRY_OPTOUT variable is only suggested the first time the tool is run, so some data may have been collected against my will.

In light of Microsoft not deleting this telemetry data for everyone (per the report of David Leadbeater on 3 August 2020), I request that my telemetry data be restricted from further processing. I have set the opt-out environment variable, but I cannot be sure that some data has not reached Microsoft already.

The MAC addresses of my machine(s) is/are:

XX:XX:XX:XX:XX:XX

I believe this is enough to identify my records, as they are stored as a SHA256 hash of this MAC address.

Please send a full response within one calendar month confirming if you will comply with my request. If you cannot respond within that timescale, please tell me when you will be able to respond.

If there is anything you would like to discuss, please contact me.

Thank you,

Obviously the somewhat strange thing about this is you reveal your MAC address to Microsoft in the process, but I'm fairly sure the data protection around GDPR requests is something that is well scrutinized.

If you do want to send this to Microsoft, go to https://www.microsoft.com/en-GB/concern/privacy and select "I want to contact Microsoft’s Data Protection Officer".

It's strange this is still an issue as this was previously discussed over two years ago (see this Hacker News thread). Let's use our right to object!

Update, April 2021: Microsoft have now released .NET SDK 5 and carefully removed the word anonymous from the description of the telemetry. A small victory.