A "breach" is an incident where a hacker illegally obtains data from a vulnerable system, usually by exploiting weaknesses in the software. All the data in the site comes from website breaches which have been made publicly available.
No. The intention of the site is to map email addresses and usernames to data breaches and storing the passwords here would do nothing to achieve that end.
The public search facility cannot return anything other than the results for a single user-provided email address or username at a time. Multiple breached accounts can be retrieved by the domain search feature but only after successfully verifying that the person performing the search is authorised to access assets on the domain.
Occasionally, a breach will be added to the system which doesn't include credentials for an online service. This may occur when data about individuals is leaked and it may not include a username and password. However this data still has a privacy impact; it is data that those impacted would not reasonably expect to be publicly released and as such they have a vested interest in having the ability to be notified of this.
There are often "breaches" announced by attackers which in turn are exposed as hoaxes. There is a balance between making data searchable early and performing sufficient due diligence to establish the legitimacy of the breach. The following activities are usually performed in order to validate breach legitimacy:
A "paste" is information that has been "pasted" to a publicly facing website designed to share content such as Pastebin. These services are favoured by hackers due to the ease of anonymously sharing information and they're frequently the first place a breach appears.
HIBP searches through pastes that are broadcast by the @dumpmon Twitter account and reported as having emails that are a potential indicator of a breach. Finding an email address in a paste does not immediately mean it has been disclosed as the result of a breach. Review the paste and determine if your account has been compromised then take appropriate action such as changing passwords.
Pastes are often transient; they appear briefly and are then removed. HIBP usually indexes a new paste within 40 seconds of it appearing and stores the email addresses that appeared in the paste along with some meta data such as the date, title and author (if they exist). The paste itself is not stored and cannot be displayed if it no longer exists at the source.
Whilst HIBP is kept up to date with as much data as possible, it contains but a small subset of all the records that have been breached over the years. Many breaches never result in the public release of data and indeed many breaches even go entirely undetected. "Absence of evidence is not evidence of absence" or in other words, just because your email address wasn't found here doesn't mean that is hasn't been compromised in another breach.
The breached accounts sit in Windows Azure table storage which contains nothing more than the email address or username and a list of sites it appeared in breaches on. If you're interested in the details, it's all described in Working with 154 million records on Azure Table Storage – the story of "Have I been pwned?"
Nothing is explicitly logged by the website. The only logging of any kind is via Google Analytics and NewRelic performance monitoring and any diagnostic data implicitly collected if an exception occurs in the system.
When you search for a username that is not an email address, you may see that name appear against breaches of sites you never signed up to. Usually this is simply due to someone else electing to use the same username as you usually do. Even when your username appears very unique, the simple fact that there are several billion internet users worldwide means there's a strong probability that most usernames have been used by other individuals at one time or another.
Yes, it has to in order to track who to contact should they be caught up in a subsequent data breach. Only the email address, the date they subscribed on and a random token for verification is stored.
You don't, but it's not. The site is simply intended to be a free service for people to assess risk in relation to their account being caught up in a breach. As with any website, if you're concerned about the intent or security, don't use it.
Sure, you can construct a link so that the search for a particular account happens automatically when it's loaded, just pass the name after the "account" path. Here's an example:
HIBP enables you to discover if your account was exposed in most of the data breaches by directly searching the system. However, certain breaches are particularly sensitive in that someone's presence in the breach may adversely impact them if others are able to find that they were a member of the site. These breaches are classed as "sensitive" and may not be publicly searched.
A sensitive data breach can only be searched by the verified owner of the email address being searched for. This is done via the notification system which involves sending a verification email to the address with a unique link. When that link is followed, the owner of the address will see all data breaches and pastes they appear in, including the sensitive ones.
There are presently 13 sensitive breaches in the system including Adult Friend Finder, Ashley Madison, Beautiful People, Brazzers, Fling, Fridae, Fur Affinity, Mate1.com, Muslim Match, Naughty America, Rosebutt Board, The Fappening and YouPorn.
After a security incident which results in the disclosure of account data, the breach may be loaded into HIBP where it then sends notifications to impacted subscribers and becomes searchable. In very rare circumstances, that breach may later be permanently remove from HIBP where it is then classed as a "retired breach".
A retired breach is typically one where the data does not appear in other locations on the web, that is it's not being traded or redistributed. Deleting it from HIBP provides those impacted with assurance that their data can no longer be found in any remaining locations. For more background, read Have I been pwned, opting out, VTech and general privacy things.
There is presently 1 retired breach in the system which is VTech.
Some breaches may be flagged as "unverified". In these cases, it may not have been possible to establish the legitimacy of the breach beyond reasonable doubt. Unverified breaches are still included in the system because regardless of their legitimacy, they still contain personal information about individuals who want to understand their exposure on the web. Further background on unverified breaches can be found in the blog post titled Introducing unverified breaches to Have I been pwned.
Occasionally, large volumes of personal data are found being utilised for the purposes of sending targeted spam. This often includes many of the same attributes frequently found in data breaches such as names, addresses, phones numbers and dates of birth. The lists are often aggregated from multiple sources, frequently by eliciting personal information from people with the promise of a monetary reward. Whilst the data may not have been sourced from a breached system, the personal nature of the information and the fact that it's redistributed in this fashion unbeknownst to the owners warrants inclusion here. Read more about spam lists in HIBP.
The design and build of this project has been extensively documented on troyhunt.com under the Have I been pwned? tag. These blog posts explain much of the reasoning behind the various features and how they've been implemented on Microsoft's Windows Azure cloud platform.