What Does the LinkedIn / hiQ Ruling Mean For Web Scrapers?

hiQ has a business model that relies on scraping massive amounts of data from public LinkedIn profiles. It then sells that data to employers who might want to know, for example, whether any of their people are actively looking for other jobs.

But LinkedIn doesn’t like that idea. So it argued that hiQ was breaking the Computer Fraud and Abuse Act of… 1986.

Yes, 1986. Before the World Wide Web existed.

The law

The CFAA is pretty clear on what’s illegal. Basically, it’s a crime when someone accesses a protected computer without authorization, or “exceeds authorized access,” to obtain information from it. Congress enacted the CFAA in the days when hacking was the big new scary threat. But does it still work in the internet age?

Well, broadly, yes. The CFAA has been cited multiple times and enforced against people pulling information from web servers, including Facebook’s. However, Judge Edward Chen astutely notes that in those cases the defendants were accessing data protected by passwords.

Chen then agrees that if the data being accessed is literally made public by the website and user, then the CFAA doesn’t apply.

Chen further asserts that if the CFAA were allowed to apply to publicly viewable data it would potentially allow websites to “weaponize… criminal sanctions” against any user they wanted to.

Robots.txt and user agreements

Interestingly, though it wasn’t the focus of Chen’s order, the ruling seems to indicate that neither the robots.txt file, nor IP blocking, nor the User Agreement is sufficient to prevent data scraping of public information.

Of course, one need not have a profile on LinkedIn to access the public profiles on the site. This is pretty important because if a site hid profile information until you signed up as a user yourself (as many forums do) then the whole ruling might have been different.

What does it mean for scrapers?

If you do anything in SEO, you’ve probably done at least some data scraping. Whether it’s a simple crawl with Screaming Frog or a full-on extraction tool you wrote yourself, getting key data off of big sites is essential to doing our work.

But if the site’s robots.txt blocks your scraper… can you keep going anyway?

Look, I’m not a lawyer, so I will not give you legal advice on your situation. But here’s what the recent ruling seems to be indicating for us:

  1. Public-facing data is really public… for now. (Keep watching, this has the potential to go to the Supreme Court.)
  2. Any data protected by a password, or requiring you to agree to terms or conditions, is probably NOT public, and if you try to scrape it you might be subject to criminal charges.
  3. If a site blocks your IP, or blocks your robot in robots.txt, that does not make the data on their site less public.
  4. The CFAA was written before the web existed and, according to Judge Chen, doesn’t apply unless you either obtain access without authorization or use authorized access improperly.
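
If you want to know what a site’s robots.txt actually asks of your bot before you decide what to do, Python’s standard library can tell you. Here’s a minimal sketch. The robots.txt content and the bot name ExampleScraper are made up for illustration, and the result only tells you what the file requests, not what the law allows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/profile/1"
    for agent in ("ExampleScraper", "SomeOtherBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Against a live site you would load the real file with the parser’s set_url() and read() methods instead of passing in a string.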

You can still get sued

Here’s something that most people don’t realize about our legal system.  Even if you’re in the right, even if you are doing everything you should be doing, you can still get sued.

An open and accessible legal system that allows every citizen fair access to justice must also deal with madmen and corporate brutes who want to use the system as a hammer to punish others. It’s the reality of the world we live in.

That means no matter how careful you are, if somebody wants to try and sue you, they can. Protect yourself by being as reasonable and fair as possible.  Don’t break terms and conditions. Keep scrapes slow. Don’t try to access data improperly.
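
“Keep scrapes slow” is easy to say and easy to forget once a crawl is running. One way to enforce it, sketched below, is a small throttle that guarantees a minimum gap between requests. The Throttle class and the two-second interval are my own assumptions for illustration, not a standard or a number any site has blessed; pick a delay that suits the site you’re crawling.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the interval; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    throttle = Throttle(2.0)  # assumed interval: at most one request every 2 seconds
    for page in ("/page/1", "/page/2", "/page/3"):
        throttle.wait()
        print("would fetch", page)  # replace with your actual request
```

Call wait() right before every request and the crawl can’t outrun the interval, no matter how fast the rest of your code is.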

We SEOs have some really cool tools available to us, but with great power comes great… caution. We have a duty to be responsible in how we interact with data and websites. Be smart, be informed, and be responsible in how you interact with the data of others.