Large-scale metadata collection and privacy implications
Intro
I was using Zcash when it struck me that, for all its sophistication, its privacy guarantees could be broken by exploiting human behaviour.
Metadata
data that provides information about other data
Collection
especially: an accumulation of objects gathered for study, comparison, or exhibition or as a hobby
Examples of modern-day metadata
- Who you talk to via WhatsApp, but not the content of the messages.
- The volume of internet traffic where you live, but not its content.
- The websites you visit, but not the requests you send to them (assuming a TLS connection is in use).
- The subject line of any e-mail landing in your Gmail account that was not sent from another Gmail account.
Some of the above may not seem like metadata, but the term gets stretched considerably, both in the context of collection programs and in the legal sense.
Details included in a Zcash transaction
Zcash uses zero-knowledge proofs (zk-SNARKs) to keep transactions confidential when shielded addresses are used.
In a shielded-to-shielded transaction, the sender, the recipient, and the amount are all encrypted; an observer can see little more than the transaction ID, when it was mined, and the fee.
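Zcash wallet UI design implications
The wallet offers a button to view the transaction you just sent on a third-party website, such as a block explorer. I don’t know about you, but I have a habit of clicking buttons that have hyperlinks.
The third-party website uses TLS (the successor to SSL), so we are definitely fine from a metadata perspective, right?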
The SNI and eSNI fields in a TLS connection
The Server Name Indication (SNI) field in a TLS connection is sent in plaintext. It tells a server that hosts many websites which virtual host we are requesting data for.
For example, say I am connecting to a content delivery network (CDN) or load balancer (LB).
I open a TCP connection to 10.1.1.1 on port 443.
The server at 10.1.1.1 hosts 1000 websites, each with its own certificate.
How do I tell the server I want to connect to reddit.com and not twitter.com when they both have separate certificates but are hosted on the same server?
I tell it.
You can inspect a whole TLS 1.3 handshake to get a better understanding, but the part that matters to us is the ClientHello message.
The ClientHello is not encrypted, so the SNI inside it is readable by anyone on the network path, unless you use a TLS extension called encrypted Server Name Indication (eSNI), which encrypts just that field.
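To see the plaintext ClientHello for yourself, here is a minimal Python sketch using only the standard `ssl` module. It runs the client side of a TLS handshake against in-memory buffers instead of a real server, so nothing is sent over the network, and then checks whether the hostname passed as `server_hostname` (which Python places in the SNI extension) appears verbatim in the first bytes the client would transmit. `capture_client_hello` is a hypothetical helper name and `reddit.com` is just the example hostname from above.

```python
import ssl


def capture_client_hello(hostname: str) -> bytes:
    """Start a TLS handshake entirely in memory and return the bytes the
    client would put on the wire first (the record carrying the ClientHello)."""
    context = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # bytes "received" from the server (none here)
    outgoing = ssl.MemoryBIO()  # bytes the client wants to send
    # server_hostname is what Python's ssl module puts in the SNI extension.
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello has been written and the client is now
        # waiting for a ServerHello that will never arrive.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = capture_client_hello("reddit.com")
    print(hello[0] == 0x16)        # True: TLS record type 22 is "handshake"
    print(b"reddit.com" in hello)  # True: the hostname travels in plaintext
```

Anyone on the path, including your ISP, sees those same bytes.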
eSNI is not widely supported and, on the client side, has to be enabled through settings that are not user-friendly. It has since been superseded by Encrypted Client Hello (ECH).
Even with eSNI, a passive network observer can still see your DNS queries, which are sent in plaintext by default.
DNSSEC only provides authenticity, not confidentiality.
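A passive observer does not need to break anything to read DNS. Below is a minimal Python sketch that builds the kind of query a stub resolver sends over UDP port 53 by default, following the RFC 1035 wire format; the hostname sits in the packet as length-prefixed labels with no encryption. `build_dns_query` is a hypothetical helper, and the packet is only constructed here, not sent. DNSSEC adds signatures to the answers, but the question itself still travels in the clear.

```python
import secrets
import struct


def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query message (RFC 1035) for `hostname`.
    qtype 1 asks for an A record; QCLASS 1 is IN (Internet)."""
    header = struct.pack(
        ">HHHHHH",
        secrets.randbits(16),  # random transaction ID
        0x0100,                # flags: standard query, recursion desired
        1, 0, 0, 0,            # one question, no answer/authority/additional records
    )
    # QNAME: each label is prefixed with its length and the name ends with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)


if __name__ == "__main__":
    query = build_dns_query("reddit.com")
    print(query[12:])  # b'\x06reddit\x03com\x00\x00\x01\x00\x01'
```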
A hypothetical Zcash transaction source exposure
For this hypothetical, assume we have the capabilities of a well-funded nation-state.
Recall the wallet offering to show the shielded transaction we just sent on a third-party website over TLS.
Assume we are not doing any cryptanalysis and have not broken any encryption.
- Collect all Zcash blockchain transactions.
- Collect all ISP traffic to and from known Zcash nodes and wallet sync services.
- Collect all ISP traffic for the target country, filtered by SNI (the hostname of the third-party website).
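The SNI filter in the last step needs nothing more than a parser for the first record of each TLS connection. Below is a minimal Python sketch of one, assuming the whole ClientHello arrives in a single TLS record and skipping most bounds checks; the field offsets follow the ClientHello layout in RFC 8446 and the server_name extension in RFC 6066. `extract_sni`, `filter_by_sni`, the `(subscriber_ip, timestamp, first_record)` flow tuple, and the `explorer.example` hostname are all hypothetical; `client_hello_for` only exists to produce a genuine ClientHello offline for the demonstration.

```python
import ssl
import struct
from typing import Iterable, List, Optional, Set, Tuple


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from a TLS record carrying a ClientHello,
    or None if it is not a ClientHello or has no host_name entry."""
    # Byte 0: record type 0x16 (handshake); byte 5: handshake type 0x01 (ClientHello).
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 5 + 4              # skip record header (5 bytes) and handshake header (4 bytes)
    pos += 2 + 32            # legacy_version and random
    pos += 1 + record[pos]   # legacy_session_id (1-byte length prefix)
    (suites_len,) = struct.unpack_from(">H", record, pos)
    pos += 2 + suites_len    # cipher_suites (2-byte length prefix)
    pos += 1 + record[pos]   # legacy_compression_methods (1-byte length prefix)
    (ext_total,) = struct.unpack_from(">H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from(">HH", record, pos)
        pos += 4
        if ext_type == 0x0000:  # server_name extension
            # Layout: list length (2), name_type (1), name length (2), name bytes.
            name_type = record[pos + 2]
            (name_len,) = struct.unpack_from(">H", record, pos + 3)
            if name_type == 0:  # host_name
                return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


def filter_by_sni(
    flows: Iterable[Tuple[str, float, bytes]], targets: Set[str]
) -> List[Tuple[str, float]]:
    """Keep (subscriber_ip, timestamp) for flows whose ClientHello names a target host."""
    return [(ip, ts) for ip, ts, first in flows if extract_sni(first) in targets]


def client_hello_for(hostname: str) -> bytes:
    """Produce a genuine ClientHello offline, using in-memory BIOs."""
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ssl.create_default_context().wrap_bio(
        incoming, outgoing, server_hostname=hostname
    )
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # waiting for a ServerHello that never comes
    return outgoing.read()


if __name__ == "__main__":
    flows = [
        ("198.51.100.7", 1000.0, client_hello_for("explorer.example")),
        ("198.51.100.8", 1001.0, client_hello_for("news.example")),
    ]
    print(filter_by_sni(flows, {"explorer.example"}))
    # [('198.51.100.7', 1000.0)]
```

Note that only the first packet of each connection is needed, which keeps this kind of filtering cheap even at ISP scale.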
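We can now correlate people who sent shielded transactions with an IP address, by matching when each transaction appeared on the network against wallet traffic to Zcash nodes and connections to the third-party website around the same time.
Anyone who did this without using an anonymity-providing service is exposed.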
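The correlation step can be sketched as a simple time-window join. This is a minimal illustration under stated assumptions, not a description of any real collection system: `ShieldedTx`, `SniEvent`, `correlate`, the 60-second window, and the `explorer.example` hostname are all hypothetical. It assumes we know when each shielded transaction was first seen on the network (from the node traffic collected above) and have a log of SNI values per subscriber IP.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ShieldedTx:
    txid: str
    first_seen: float  # Unix time the transaction was first seen on the network


@dataclass(frozen=True)
class SniEvent:
    subscriber_ip: str
    timestamp: float   # Unix time of the ClientHello
    sni: str


def correlate(
    txs: List[ShieldedTx],
    events: List[SniEvent],
    target_hosts: Set[str],
    window: float = 60.0,
) -> Dict[str, List[str]]:
    """Map each txid to the subscriber IPs that connected to a target host
    within `window` seconds after the transaction was first seen.
    Timing only: nothing is decrypted."""
    hits: Dict[str, List[str]] = {}
    for tx in txs:
        ips = sorted(
            {
                e.subscriber_ip
                for e in events
                if e.sni in target_hosts
                and 0.0 <= e.timestamp - tx.first_seen <= window
            }
        )
        if ips:
            hits[tx.txid] = ips
    return hits


if __name__ == "__main__":
    txs = [ShieldedTx("tx-a", 1000.0), ShieldedTx("tx-b", 5000.0)]
    events = [
        SniEvent("198.51.100.7", 1012.0, "explorer.example"),
        SniEvent("198.51.100.9", 1015.0, "news.example"),
        SniEvent("203.0.113.4", 2000.0, "explorer.example"),
    ]
    print(correlate(txs, events, {"explorer.example"}))
    # {'tx-a': ['198.51.100.7']}
```

The nested loop is quadratic; at real collection volumes you would index events by time. A single transaction also yields a set of candidate IPs rather than one person, which is why the result is a lead rather than proof.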
You could then apply the same metadata collection to VPN and Tor connections and enrich the cryptocurrency dataset with the results.
This does not prove that you sent the transaction, but it could be enough to obtain a warrant and escalate to targeted surveillance.
Outro
Large-scale traffic collection paired with metadata analysis can reduce the effectiveness of cryptography.
It can completely break the guarantees that anonymity services promise you.
I am sure there is plenty I have not thought about, or mistakes I have made, so please point them out if you see them.
I do read comments and feedback and apply them.