Should companies be able to dox website visitors?
New technology is de-anonymizing traffic without consent.
If you are a large social media network, you require users to create an account in order to use the platform. Everybody who signs up has an email address on file and many users also create a real profile – they add their name, upload a photo, select a few of their key interests.
You offer the platform for free since you make money from advertisers who pay you to deliver their ads to your users. But in order to keep those advertisers around, you want to make sure their ads are very effective for how much money they spend with you. That means getting to know more about your users.
Instead of only relying on the data that the user provides, you install a cookie that tracks their behavior across the internet. That way you build a much more accurate picture of the user’s interests, so the next time they log on to your network they get relevant ads, which many will click on, which leads to advertisers spending more with you.
But the company that pays for the ads never actually sees the individual profiles that click on the ad. They only get that personal information if the person opts in to their emails or makes a purchase. The social media network creates a wall between the personally identifiable information and the anonymized advertising IDs or interest group tags.
It is not hard to imagine that you, the social media network, could export a giant spreadsheet that ties every person’s profile to those advertising IDs and all of the information gathered from tracking their behavior – and sell it to companies to use for their targeting.
This is possible on Facebook because they already let individual users export their row of data! If Facebook wanted to, they could do this for their entire user base. One big ol’ spreadsheet with all of the personally identifiable information of every user. And let companies use it!
This, uh, would land them in trouble.
I mean, Cambridge Analytica harvested data from up to 87 million Facebook profiles, mostly of US users (around 5% of Facebook’s total user base in 2015), and that earned Facebook a $5 billion fine from the FTC, a $725 million class action settlement, regulatory oversight, an overhaul of their privacy policies, Congressional hearings, and a lot of public outcry. Also it probably sped up GDPR adoption in the EU.
And that was just because Cambridge could access a small fraction of actual personally identifiable information plus some survey responses from 200,000 users of a profiling app… and then combined their friend networks with third-party data brokers to build profiles of 87 million users (in this case for political campaigns and their ads).
The problem here wasn’t that the data existed. It was that the profiling app broke Facebook’s Terms of Service, which say “no no no you can’t have that information and share it with people, that stays behind our advertising wall, you have to guess which targeting criteria make the most sense.”
Social media profiles are public, but users’ behavior isn’t necessarily public. This is the benefit of running a really large social media network that tracks everyone’s behavior on the platform, in the apps you use, and across the Internet – you get to make money by protecting the individual’s behavior while forcing advertisers to pay you to use that data! [1]
I guess some of this power is now available to the masses in the US?
This is a homepage screenshot from a vendor who will show you the LinkedIn profiles of people who are visiting your website. These aren’t people who have “opted in” to your cookies, your CRM, or use your software product or anything. These are just (previously) anonymous website visitors, plain and simple. Another vendor describes it like so:
Unless the user has given their explicit permission to be identified, contact-level information will be the result of a probabilistic waterfall. Most website visitor identification software is based on running data points through this waterfall to reconcile them and spit out the algorithm's best guess of who the visitor may be.
If you wanted to figure out the name, email, phone number, mailing address, company domain, and company information of somebody, or really anybody… you can just buy that data.
What isn’t usually available is the behavior of that person on the Internet. There has been a walled garden between the personally identifiable information (PII) and behavior data. Google Analytics will give you the behavior data – page views or leaving an item in a shopping cart – without the PII, but you can send that Google Analytics data to Facebook to run ads, targeting the specific people by having Facebook match their PII up with the behavior data!
Now, I guess, you can just do this yourself? With a “probabilistic waterfall” that helps match the behavior data with the PII data? [2]
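The vendor quote only describes the waterfall in outline, so here is a rough guess at the general shape, not any vendor's actual method. It tries a list of matching steps in order, from most to least reliable, and returns the first guess that clears a confidence bar. Everything in it is hypothetical: the step names, the lookup tables, the confidence numbers, and the threshold are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Guess:
    identity: str      # placeholder label, not a real person
    confidence: float  # 0.0 to 1.0, invented for illustration
    source: str        # which step of the waterfall produced the guess


# Toy lookup tables standing in for purchased data sets (entirely made up).
HASHED_EMAIL_TABLE = {"abc123": "person-A"}
IP_TABLE = {"203.0.113.7": "person-B"}


def by_hashed_email(signals: dict) -> Optional[Guess]:
    """Strongest signal in this toy: an email hash seen elsewhere."""
    identity = HASHED_EMAIL_TABLE.get(signals.get("email_hash", ""))
    return Guess(identity, 0.9, "hashed_email") if identity else None


def by_ip_address(signals: dict) -> Optional[Guess]:
    """Weaker signal: an IP address, which may be shared by many people."""
    identity = IP_TABLE.get(signals.get("ip", ""))
    return Guess(identity, 0.4, "ip_address") if identity else None


# Steps run in order, most reliable first. That ordering is the "waterfall".
WATERFALL = [by_hashed_email, by_ip_address]


def identify(signals: dict, threshold: float = 0.3) -> Optional[Guess]:
    """Return the first guess that clears the threshold, or None."""
    for step in WATERFALL:
        guess = step(signals)
        if guess is not None and guess.confidence >= threshold:
            return guess
    return None  # the visitor stays anonymous
```

Calling `identify({"ip": "203.0.113.7"})` falls through the first step and returns a low-confidence guess from the second. Raising `threshold` to 0.5 makes the same call return `None`, which is the "probabilistic" part: the output is a best guess with a score attached, not a confirmed identity.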
What I’m really waiting for is a company to publish this feed of visitors on their website.
Nothing brings in business like a crowd, right? Why not replace your testimonial section with a live feed of all of the people who are visiting your website??
This would be plainly illegal in a GDPR regime, because the underlying technology is also plainly illegal under GDPR, so let’s just focus on the US. I mean:
You can already embed a Slack feed on a website
There are no federal protections against releasing people’s information online (called “doxing”), unless the target is an elected official or a member of their family and it’s done with the intent to harass or intimidate
The few state protections that do exist for doxing tend to protect more confidential information (like payroll information), not IP addresses
California’s privacy law, the CCPA, requires a “Do Not Sell My Information” notice to be available on a website, but you have to opt out for the company to stop selling your data
CCPA allows businesses to “sell” or “disclose” data to third parties for “business purposes,” which include “providing advertising and marketing services.” I think it’s a pretty easy argument to say the credibility of the feed is helping market your business, especially if you’re already using the data to make calls or conduct other advertising activities
Under CCPA, that third party does have to obey any opt-out notices that the original company receives (but in this case the PII is just… publicly available and not stored?)
This is very funny to me! [3] It seems maybe legal? (Not legal advice!)
Not a guarantee – there are already class action lawsuits in California over this kind of technology:
A complaint recently filed in California Superior Court—Gabriela Hernandez v. MRI Software LLC, 23-stcv-14389—opens with allegations that online anonymity promotes the free exchange of ideas and reinforces cybersecurity. It then contrasts these virtues with a process it calls “de-anonymization,” which it alleges “involves cross-referencing anonymized data with ‘commercially available information’ (“CAI”) obtained from grey data markets to reveal an individual's identity.” The crux of the complaint alleges that certain types of online targeted advertising technology is doxing.
But is it “doxing”?
Doxing broadly has a connotation (and some legal precedent) that the public information was released with an intent to threaten or intimidate. There’s enough history of things going wrong when a home address or a personal cell phone number has been leaked that even popular YouTubers have controversies over it.
It’s hard to believe a B2B company is threatening or intimidating buyers. So is it really doxing if a company just breaks down the wall between PII data and behavior data for their own marketing use? Or do they need to publish a live feed on their website to make it doxing? Or would it even be doxing if it was public?
I mean… everything else except their browsing behavior is public or purchasable anyway, right? Why should Facebook be the only one with this kind of data? It’s just some data about visiting your website!
Or at least, that’s what I imagine the defenders of this kind of technology saying.
This whole thing smacks of desperation.
I know a public feed of website visitors is absurd (whoever goes first is for sure getting sued) but I don’t think it’s actually all that far from what companies are currently doing with this technology.
Could you imagine an ice cream shop doing this?
Can’t wait to visit the menu on my local ice cream shop’s website and then have some poor guy behind the counter calling me 15 minutes later saying, “Noticed you checked out our double fudge ice cream, want some??”
I see most companies using this data for precisely this kind of interruption – a phone call or an automated email. It’s direct “hyper targeted” outreach.
There is no clear durable advantage for this kind of marketing. If every website you visited earned you a call, how many calls would it take before you cussed out the person on the other end of the phone? Three calls? Five?
How many calls would it take before you (the buyer) noticed that the calls came in from those websites you visited? How many would you pick up? How many emails would you respond to based on your browsing behavior? How many websites would you stop going to based on their calls?
I can already see the next wave of LinkedIn posts from the sales gurus once it gets rolled out widely: here’s the best time to call a website visitor and how many hours you should wait before calling. Ugh.
This is all a Pyrrhic victory. How badly do we need deals that we’re willing to sacrifice any sense of anonymity on the web only so we can sell them some software?
It may seem like an exaggeration, but there is a reason spam filters exist! We invented a very very cool technology (email) and then companies abused the heck out of it. We also invented a very very cool technology (the phone) and then companies abused the heck out of it. The phone is so abused that robocallers are still the top complaint with the FCC to this day!
I grew up in an era with a dial-tone for Internet and started freelancing for companies when Google Analytics “Real Time Visitors” was a big deal. All we got to see was fairly inaccurate geographic data and a total count of people.
Marketing was hard then. It still is.
It strikes me as cosmically ironic that more and more B2B buyers want a seller-free experience and want to spend their time doing research at the exact same time companies are trying to de-anonymize their website behavior so they can have salespeople try to contact them. These two things do not jibe!
A friend of mine who is a marketer, when I showed him this technology, said:
If your product is so undifferentiated that you have to resort to trying to use this technology, you need to rethink what you’re doing.
We should put our energy into better products, marketing campaigns, and companies. Not doxing (or whatever you want to call it) our website visitors.
Will this tech work? Sure, of course it will. [4] Sometimes. For a while.
And then it’ll stop working as well, because that was never the problem in the first place.
[1] Until Apple changes their privacy settings, I guess, and costs you $10 billion.
[2] Not entirely. Most vendors seem to only be able to match up to 30% of visitors, so it’s not as reliable as Facebook’s ad network, but it’s a start.
[3] Not funny “haha” but funny “wow this is really messed up.”
[4] Until it’s made illegal, of course.