Often when online services are compromised, the first signs of it appear on "paste" sites like Pastebin. Attackers frequently publish either samples or complete dumps of compromised data on these services. Monitoring and reporting on the presence of email addresses on the likes of Pastebin can give impacted users a head start on mitigating the potential fallout from a breach.
When you search for an email address on this site, both known data breaches and pastes are searched simultaneously. The results then appear side by side, with an indication of whether the address was found in a breach or in a paste.
Pastebin (among other paste services) stores tens of millions of pastes and adds thousands of new ones every day. Rather than attempt to analyse every paste in the system, Have I Been Pwned monitors the appearance of new pastes as announced by the Twitter accounts in the Paste Sources list.
One of the attractions of paste services is that there are no constraints on the structure of the content that can be published there. Consequently, pastes containing email addresses may be very self-explanatory or appear completely obscure. However, there are some common patterns which appear.
Database dumps: These will often take the form of scripts that can be run to recreate the database structure. They typically contain comma-delimited fields representing different columns in the database, often with passwords which may be secured with a cryptographic hash. Example:
(`id`, `team_id`, `email`, `name`, `password`, `league`, `active`, `regdate`, `lan`, `lastlogin`, `birthdate`, `favclub`, `favmanager`, `description`, `pers_email`, `mess_id`, `iso`)
(14, 568, 'vcpd_@hotmail.com', 'Flavio00', '059b4db7cdb1cbddc3f0e5d95c881597', 1, 1, 1224313200, 0, 0, 0, '', '', '', '', '', ''),
(4, 1, 'levi@medeeaweb.com', 'Slash', 'c57aeddaffce62fead6be61022eb1340', 1, 1, 1224313200, 0, 1235380637, 483260400, 'FC Juventus Torino', 'Carlo Ancelotti', 'I''m the admin of this site :D', 'slash@manager-arena.com', 'slashwebdesign', ''),
Email and password pairs: Compromised systems are often dumped into lists of credentials consisting of username (often the email address) and password, occasionally with other data accompanying it. Example:
majikcityqban82@gmail.com:tinpe***
pekanays@yahoo.com:warri***
g_vanmeter@yahoo.com:torb***
rrothn@yahoo.com:rebsopj***
Logs and code blocks: These can take on a range of different forms and may be anything from compromised system logs to internal system code. Example:
array("/upload/iblock/ed0/--.jpg","vitaly.cherkasov@autohansa.ru"),
array("/upload/iblock/562/--.jpg","andrey.mastakov@autohansa.ru"),
array("/upload/iblock/ed2/---.jpg","sergey.smirnov@autohansa.ru"),
Random collections of email addresses: There is often no context given as to where an email address was sourced from; it simply appears along with others. Example:
dilipsinghrana4@gmail.com,
mansinghrana22@gmail.com,
dilipsinghrana2@gmail.com,
khalisinghrana3@gmail.com,
Each of the above examples is representative of the sort of data structures often seen in pastes. The appearance of the email address may be completely innocuous but it also often indicates a serious breach. Only human review and assessment can determine if the paste represents a risk that requires a response such as changing passwords.
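To make the four patterns above concrete, here is a minimal sketch of how addresses could be pulled out of a paste regardless of its structure. It is not Have I Been Pwned's actual scanner, which is not described here; the function name `extract_emails` and the regular expression are assumptions made for illustration, and the pattern is deliberately loose rather than a complete address validator.

```python
import re

# Deliberately simple pattern (an assumption for this sketch): a local part,
# an "@", then at least two dot-separated domain labels. Not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(paste_text):
    """Return unique, lower-cased addresses in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.finditer(paste_text):
        # dicts preserve insertion order, so this de-duplicates without reordering
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


# One line from each pattern above (the database row is abbreviated).
sample = "\n".join([
    "(14, 568, 'vcpd_@hotmail.com', 'Flavio00', '059b4db7cdb1cbddc3f0e5d95c881597', 1),",
    "pekanays@yahoo.com:warri***",
    'array("/upload/iblock/ed0/--.jpg","vitaly.cherkasov@autohansa.ru"),',
    "dilipsinghrana4@gmail.com,",
])

found = extract_emails(sample)
for address in found:
    print(address)
```

A pattern like this only says that an address is present. It says nothing about why it is there, which is why the assessment still has to be made by a person.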
The presence of an email address on a paste site doesn't always mean it's been compromised in a breach, and the process that scans for addresses is entirely automated; there's no human review. Do take a look at the paste and assess the impact for yourself if your address appears there.
Often a paste will appear on a service such as Pastebin multiple times. It may be identical or contain slight variations but for all intents and purposes, it's the same content. This may be because the same individual has published it multiple times or because a breach has been socialised and then re-published by multiple people.
Have I Been Pwned does not store the original paste, only metadata such as the title and author if they exist. As such, there is no facility to identify duplicate pastes and instead human discretion should be exercised if multiple pastes are found that appear to be the same.
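If you want something to support that judgement, one rough approach is to compare the sets of addresses in two pastes you have open. The sketch below is a reader-side aid, not a feature of Have I Been Pwned; the function name `address_overlap` and the use of Jaccard similarity are assumptions chosen for illustration.

```python
def address_overlap(emails_a, emails_b):
    """Jaccard similarity of two address collections: 0.0 (disjoint) to 1.0 (identical)."""
    a = {e.lower() for e in emails_a}
    b = {e.lower() for e in emails_b}
    if not a and not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)


# Hypothetical inputs: two pastes sharing three of their four addresses.
paste_one = ["majikcityqban82@gmail.com", "pekanays@yahoo.com",
             "g_vanmeter@yahoo.com", "rrothn@yahoo.com"]
paste_two = ["pekanays@yahoo.com", "g_vanmeter@yahoo.com",
             "rrothn@yahoo.com", "dilipsinghrana4@gmail.com"]

score = address_overlap(paste_one, paste_two)
print(f"overlap: {score:.2f}")
```

A high score suggests the two pastes are probably the same content republished, but it remains a hint rather than proof, so the final call is still a matter of discretion.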
Services like Pastebin are pretty explicit about what is deemed to be "acceptable use" of the service: no email lists, no login details, no password lists and no personal information. Despite this, all these data classes frequently appear on Pastebin many, many times per day. However, they're often transient, appearing briefly before being removed.
Have I Been Pwned usually consumes the paste data within 40 seconds of it being published. However, only metadata about the paste (title, author, date) and the email addresses appearing in the paste are stored. No further data such as credentials or personal information is stored. The entire premise of the service rests on it being searchable by email address, so additional data (such as the original paste in its entirety) is not required.
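As an illustration of that storage policy, the sketch below reduces a paste to the fields named above and discards everything else. The class `PasteRecord`, the function `summarise_paste` and the regular expression are hypothetical names invented for this example; the real storage schema is not described here.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

# Deliberately simple address pattern (an assumption for this sketch).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


@dataclass(frozen=True)
class PasteRecord:
    """Only the fields named in the text: title, author, date and addresses."""
    title: Optional[str]      # pastes may have no title
    author: Optional[str]     # pastes may be anonymous
    date: Optional[datetime]
    emails: Tuple[str, ...]


def summarise_paste(body, title=None, author=None, date=None):
    """Keep the addresses found in `body`; the body itself is never stored."""
    emails = tuple(sorted({m.group(0).lower() for m in EMAIL_RE.finditer(body)}))
    return PasteRecord(title=title, author=author, date=date, emails=emails)


body = "pekanays@yahoo.com:warri***\nrrothn@yahoo.com:rebsopj***"
record = summarise_paste(body, title="example")
print(record.emails)
```

Because the record has no field for the paste body, the partial passwords in the input cannot end up in storage even by accident.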