Querying the Pwned Passwords API to Identify Breached Passwords

Troy Hunt of haveibeenpwned.com has released an updated API for confidentially searching an enormous collection of passwords from breached login credentials: half a billion entries. Critically, the API is designed so that neither the actual password nor its full hash is ever transmitted.

I’ll explain how the API works and share a couple of bits of code for querying the service manually.

How the service was constructed

The raw content is gigantic: many tens of GB of unpacked plaintext, assembled from breaches released on the dark web at different points in time. To prepare the dataset that underlies the service, Troy regex’ed the raw breached content for password strings, one-way hashed them with the SHA1 algorithm, and reduced the data to (hash : frequency count) pairs. The current reduction to SHA1 hashes and frequency counts stands at 8.8 GB compressed.
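The reduction step can be sketched in a few lines of Python. This is not Troy’s actual pipeline, just an illustration that assumes the password strings have already been extracted into a list; reduce_to_counts is a hypothetical name.

```python
import hashlib
from collections import Counter


def reduce_to_counts(passwords):
    """Hypothetical helper: map each password to its uppercase SHA1 hex
    digest and count how many times each digest occurs."""
    counts = Counter()
    for pw in passwords:
        # One-way hash; only the digest is kept, not the plaintext
        digest = hashlib.sha1(pw.encode("utf-8")).hexdigest().upper()
        counts[digest] += 1
    return counts


if __name__ == "__main__":
    sample = ["password", "123456", "password", "letmein"]
    # Emit rows in a hash:count form, most frequent first
    for digest, hits in reduce_to_counts(sample).most_common():
        print(f"{digest}:{hits}")
```

The first row printed is 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:2, since "password" appears twice in the sample list.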

Previously, if you wanted to search this dataset, you either had to download the whole thing (which you can still do) and search it locally, or submit the full SHA1 hash of a password to the API. The second option was less than ideal because it required you to trust the API operator, who had, after all, built the SHA1 dataset and could easily map any hash in it back to its password. The new arrangement and updated API solve this problem.
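To see why handing over a full hash requires trust: an unsalted SHA1 of a known password can be reversed by a plain table lookup. The sketch below assumes the operator holds a list of candidate passwords; sha1_hex and build_reverse_table are hypothetical names for illustration.

```python
import hashlib


def sha1_hex(text):
    # Uppercase hex, matching how the dataset writes its hashes
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def build_reverse_table(candidates):
    """Hypothetical helper: precompute hash -> password for known strings."""
    return {sha1_hex(pw): pw for pw in candidates}


if __name__ == "__main__":
    table = build_reverse_table(["password", "123456", "letmein"])
    # A client that submits the full hash of "password"...
    submitted = sha1_hex("password")
    # ...has effectively revealed it to anyone holding such a table
    print(table.get(submitted))  # prints: password
```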

Troy generated 1,048,576 text files, one for each of the 16^5 possible prefixes (the first five characters) of the hashes written as hexadecimal strings. Each file lists, one row per hash, the suffix (the sixth character to the end) of every hash in the dataset that starts with that prefix, along with its hit count, in the format suffix:hits. The published statistics give 381 rows minimum, 584 maximum, and 478 on average per file. This is the data the new API serves.
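The arithmetic behind this layout is easy to check. A SHA1 digest is 40 hexadecimal characters: the first five select one of 16^5 files, and the remaining 35 are what appears in that file. The helper name split_hash is hypothetical.

```python
import hashlib

# Five hex characters, 16 possible values each
PREFIX_LEN = 5
NUM_FILES = 16 ** PREFIX_LEN


def split_hash(password):
    """Hypothetical helper: return (prefix, suffix) of the uppercase SHA1 hex."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:PREFIX_LEN], digest[PREFIX_LEN:]


if __name__ == "__main__":
    prefix, suffix = split_hash("password")
    print(NUM_FILES)            # prints: 1048576
    print(prefix, len(suffix))  # prints: 5BAA6 35
```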

This arrangement lets clients check whether a password is known to have leaked without showing their whole hand. Clients SHA1-hash a password locally, load a URL containing the distinguishing prefix (first five characters), and look for the suffix in the returned response. At no time is the password or even its full SHA1 hash transmitted to the service. This is being touted as “k-anonymity”, but that’s just an unnecessarily fancy way of saying that the client does the last step of the lookup itself, among several hundred candidate entries, and the server doesn’t know which entry in its response (if any) belongs to the requester.
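The client-side procedure can be written with the network call passed in as a parameter, which makes it easy to check what actually leaves the machine. In this sketch, pwned_count and the fetch callable are hypothetical names; fetch stands in for an HTTP GET of the range URL and receives only the five-character prefix.

```python
import hashlib


def pwned_count(password, fetch):
    """Return the hit count for password, or 0 if it is not listed.

    fetch is any callable that takes the five-character hash prefix and
    returns the response body (rows of SUFFIX:COUNT). Only the prefix is
    passed to it; the password and the full hash stay on the client.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    body = fetch(prefix)
    # The last step of the lookup happens locally
    for line in body.splitlines():
        candidate, _, hits = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(hits)
    return 0


if __name__ == "__main__":
    sent = []

    def fake_fetch(prefix):
        # Stand-in for an HTTP GET of .../range/{prefix}; records what would
        # be sent and returns made-up rows (the counts are not real data)
        sent.append(prefix)
        rows = ["0" * 35 + ":3", "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42"]
        return "\r\n".join(rows)

    print(pwned_count("password", fake_fetch))  # prints: 42
    print(sent)                                  # prints: ['5BAA6']
```

In real use, fetch would wrap an HTTP request such as the urllib.request call in the Python script in this post; keeping it injectable is only a design choice that makes the "only the prefix is sent" property easy to verify.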

Consequently, the remote server and its operator cannot associate your request with the hash you were actually looking for, and cannot work backwards to your password. That is of paramount importance for limiting disclosure risk. You should never, ever transmit your actual password to a third-party site, not even to test it.

A form at https://haveibeenpwned.com/Passwords implements the client-side procedure, and end-user applications have started integrating the API as well.

But suppose you’re a skeptic and don’t feel comfortable with those options. (We stressed literally five seconds ago that you should never enter your password into a third-party web form, right?) No problem. You can perform the query procedure yourself from the command shell or with a script. Others have cooked up various solutions; here are a couple of mine.

How to query the API using shell commands

On a Linux shell (Windows users, you don’t have this, sorry), hash the password by piping it into openssl sha1 on stdin:

echo -n {password} | openssl sha1

Note the -n option, which stops echo from appending a trailing newline; that newline would otherwise be hashed along with the password and throw off the result. Then copy the prefix (first five characters) of the hash and open this URL in a web browser:

https://api.pwnedpasswords.com/range/{prefix}

Search the returned page for the hash suffix (the sixth character onward) to find the hit count. openssl prints the hash in lowercase while the API lists suffixes in uppercase, so search case-insensitively. If the suffix is absent, the password is not in the breached credentials collection (and you should breathe a sigh of relief). If it is present, the number after the colon is the hit count. The higher the hit count, the more times the password string has been seen in breaches (and the more severely you should panic).
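The manual check can be mirrored in Python to confirm two details: hashing the password with a trailing newline (what echo does without -n) gives a different hash, and the hit count is the number after the colon, with an absent suffix meaning zero. The body string below is a made-up stand-in for a real response, and hit_count is a hypothetical name.

```python
import hashlib


def hit_count(body, suffix):
    """Return the number after the colon for suffix, or 0 if it is absent."""
    for line in body.splitlines():
        candidate, sep, hits = line.partition(":")
        # Compare case-insensitively: openssl prints lowercase hex,
        # while the API lists suffixes in uppercase
        if sep and candidate.strip().upper() == suffix.upper():
            return int(hits)
    return 0


if __name__ == "__main__":
    # echo -n hashes exactly the password; plain echo appends "\n" first
    exact = hashlib.sha1(b"password").hexdigest()
    with_newline = hashlib.sha1(b"password\n").hexdigest()
    print(exact == with_newline)  # prints: False

    # Made-up response body; the counts are illustrative, not real data
    body = "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\r\n" + "0" * 35 + ":3\r\n"
    print(hit_count(body, exact[5:]))  # prints: 42
```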

How to query the API using Python

Too much work, you say? Here’s a quick-and-dirty Python script that uses only the standard library to let you query the API interactively.

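#!/usr/bin/env python3

import getpass
import hashlib
import urllib.request

while True:

    # getpass reads input without echoing it, so the password stays off the screen
    p = getpass.getpass('Password to test (or blank to exit): ')
    if not p:
        break

    # Hash locally; only the five-character prefix goes into the request
    h = hashlib.sha1(p.encode('utf-8')).hexdigest().upper()
    prefix, suffix = h[:5], h[5:]

    url = 'https://api.pwnedpasswords.com/range/' + prefix
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Python-Pwnedpasswords-Check')
    with urllib.request.urlopen(req) as res:
        content = res.read().decode()

    # Each row is SUFFIX:COUNT; keep rows whose suffix matches exactly
    print([line for line in content.split() if line.split(':')[0] == suffix])
    print()  # separator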

The script prints a list containing the matching suffix:hits row if the password is in the collection, or an empty list if it is not.

It will run anywhere you can get Python 3 running: Windows, Mac, or Unix. The password p never leaves the client; inspect the code and see for yourself.

How to improve your password security

Tried these queries against your passwords? Scared yet? Here’s what you can do to avoid getting pwned:

Do not reuse the same passwords everywhere, or on many different sites.
Use a password manager application to store your passwords in an encrypted file, and back it up.
Use randomly generated passwords for almost all accounts, especially low-trust ones. (KeePassXC packs a spectacular password generator.)
Reserve your high-value, memorized passwords for things you actually have to log into 40 times a day, and use them extremely sparingly.
Go back and change passwords to generated values on sites where you’ve used reused or valuable passwords in the past. (Yes, this is a giant pain compared with practicing good password security as you go.)
Enable 2FA (two-factor authentication) wherever available to add defense in depth. Definitely enable it on email, social media, financial, and major merchant sites; most of them offer it.
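If you need a generated password without a password manager at hand, Python’s standard secrets module draws from a cryptographically secure source. This is a minimal sketch, not a substitute for KeePassXC’s generator; the alphabet and the default length of 20 are arbitrary choices for illustration, and generate_password is a hypothetical name.

```python
import secrets
import string

# Letters, digits, and punctuation; trim this if a site rejects some symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password drawn from the OS's secure random source."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```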

Resources

Pwned Passwords

Troy Hunt: I’ve Just Launched “Pwned Passwords” V2 With Half a Billion Passwords for Download

Ars Technica: Find out if your password has been pwned—without sending it to a server