This week saw “one of the biggest data breaches ever” with more than 100 million people impacted. Major news media is urging people to lock their credit as we all wonder how to safeguard our identities. They paint a picture of a clever social outcast hacker who beat Capital One’s systems.
The reality is a little less exciting. It's not so much a story of brilliant math minds gone amok as one of human nature, entropy, and questionable systems design. Let's start by looking at the facts as we know them:
“Likely tens of millions” of credit card applications stored from 2005 to today.
140,000 social security numbers.
80,000 bank account numbers.
Names, addresses, and dates of birth linked to those numbers.
According to Capital One, someone exploited a "configuration vulnerability."
The FBI alleged Thompson used a single command and "obtained security credential for an account known as ******-WAF-Role that, in turn, enabled access to certain of Capital One's folders at the Cloud Computing Company."
The suspect was a former AWS employee who didn't cover her tracks well at all and boasted about the breach on social media.
In all fairness, this stuff is hard. Technologies are constantly evolving at an ever-increasing rate. To stay competitive, large firms are required to embrace big data, marketing automation, machine learning, and who knows what buzzword tomorrow. That stuff reads great on an annual report, but the more complicated you make any system, the more likely it is to contain security holes from architectural design mistakes or from human error in building and maintaining it.
There’s a great book by General McChrystal (Team of Teams) about managing the difference between complicated and complex. In a nutshell, once a system reaches a certain level of complexity, it no longer makes sense to design controls from the top down; you have to architect controls that monitor the live system, because there’s no way to accurately predict its behavior by design. Think about the difference between hitting 2 balls on a pool table vs. breaking all 15. It doesn’t matter how steady your hand is; not even the best professionals can predict exactly what will happen on a break.
I’m also not sure how we’re getting to 100 million people affected here. Obviously tens of millions isn’t much better, but at least the CEO is being sincerely apologetic and not trying to round down the scale of the mistake. No one is perfect, and this is clearly bad, but let’s try to keep one foot in reality.
Now to the lessons to learn.
Why do you have 14 years of personal data stored ANYWHERE???
We host parts of the U.S. Army’s web presence. As such, we are required by law to store logs of everything that happened to that system: who accessed any part of it, who changed what, etc. We put all of this in an encrypted, searchable archive called an ELK (Elasticsearch, Logstash, Kibana) stack (we also use AWS, just like Capital One does). While we’re not dealing with financial data, we have plenty of Personally Identifiable Information (PII) that we securely manage. One of the key things we ask our clients is “how long must we keep this?” Seven years of archived logs is what we retain (by law) for our Federal Government clients. If we could get away with five or even two years, we would. Anytime we can sanitize data and just keep a record that something happened without also keeping the PII, we do.
Sure, there’s value in storing this stuff for a while, but I’m struggling to understand why Capital One would need to keep millions of credit card applications from 14 years ago on file. It’s certainly not for customer convenience; do you think anyone in customer service could easily get to that data if you were to call back up wanting to complete a half-finished application from over a decade ago? Credit ratings from years ago are almost certainly no longer valid today! It seems unlikely to be a banking regulation; keeping 7 years of back taxes on file is considered pretty good practice, but 14 years of complete consumer credit application data just sitting in an archive somewhere? We can’t help but think that just seems sloppy. Perhaps it was always “someone else’s job to delete that.” Understandable; we’ve all grown up learning to save backups of backups of our work, but when you’re managing someone else’s data, you need to think differently.
GDPR and now the California Consumer Privacy Act both challenge us to think differently: data should be removed unless it is required. Why store something potentially dangerous unless you really expect to need it? If there was a requirement to keep a history of applications, they should have been sanitized. Surely the last 4 digits of a social and their base info would be enough to satisfy any regulatory requirement of “prove this person DID apply 14 years ago.” I struggle to imagine the meeting where the CTO/CIO and team sat around and said, “ya know, I think we should just zip up all these bank account and social security numbers and leave them in an S3 bucket just in case we need them one day.”
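To make that sanitization idea concrete, here's a minimal sketch of what a retention job could do to each archived application. The field names and record layout are hypothetical, not Capital One's actual schema; the point is simply that proving "this person applied" doesn't require keeping the full SSN or bank account number.

```python
def sanitize_application(app: dict) -> dict:
    """Reduce an archived credit application to the minimum needed to
    prove the person applied: name, application date, and the last 4
    digits of the SSN. Field names here are illustrative only."""
    return {
        "name": app["name"],
        "applied_on": app["applied_on"],
        "ssn_last4": app["ssn"][-4:],  # keep only the last four digits
        # note: the bank account number is dropped entirely
    }

record = {
    "name": "Jane Doe",
    "applied_on": "2005-03-14",
    "ssn": "123-45-6789",
    "bank_account": "000111222333",
}
print(sanitize_application(record))
```

A nightly job applying something like this to the archive would have left an attacker with nothing worth stealing.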
Was there an expiration policy on this data? What happened to the data from 2004: was it removed by design, or was that simply the year they rolled this new system out?
WAF rules are the beginning, not the end, of security.
Think of a Web Application Firewall (WAF) as a chain link fence around a building with guards at every gate. Not all the gates even open, and only people on a list are supposed to get through them. The “hack” that this former AWS employee used was simply having access to that gate. Obviously that shouldn’t happen. We could talk about treating employees with respect and fostering a culture where you don’t have disgruntled former employees trying to teach you a lesson, but you can certainly never depend on everyone liking you. Regardless, the media will jump on this path. “Why would this woman do this to 100 million of us?!” is a pretty easy article to write. “Nerd hates society and betrays us all” is a narrative anyone can understand, while encryption is confusing.
The huge unasked question here should be: why was this data not stored so that it was hard to decrypt? Even if you buy the argument that Capital One had to keep 14 years of consumers’ personal information on a server that could be accessed (at all) from outside their networks, why was that raw data not archived in such a way that, when a grumpy former employee found a way to get it out of the internal network, it was nothing more than an unusable file of 0’s and 1’s? There are plenty of encryption methods that would have kept this data inaccessible without decades of computing power to crack a key.
99% of the SSNs and account numbers in the huge backlog of Capital One data that was taken were “uncompromised” because they were “tokenized” or “scrambled.” That raises the question: what about the other one percent? The one percent that equated to 140,000 social security numbers and 80,000 bank account numbers. What seems most likely is that this data was left in an S3 bucket in whatever format was easiest at the time.
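Tokenization, in a nutshell, swaps the real value for a random stand-in and keeps the mapping in a separate, locked-down vault. Here's a toy sketch of the idea; the in-memory dict stands in for what would, in reality, be a separate hardened token service, and the `tok_` prefix is just an illustrative convention.

```python
import secrets

class TokenVault:
    """Toy tokenizer: the real values never leave the vault, and callers
    only ever see random tokens. A production vault would be a separate,
    access-controlled service, not an in-memory dict."""

    def __init__(self):
        self._store = {}  # token -> real value

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)  # random, unguessable stand-in
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault itself can reverse a token back to the real value.
        return self._store[token]

vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(token)  # an attacker who steals only the archive sees this, not the SSN
```

The 99% of records stored this way were worthless to the attacker; the question is why the last 1% never went through the same pipeline.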
Not enough ongoing attention went into making sure these security tools were actively being used. It’s as if they bought an amazing safe and then just left a bunch of important papers sitting on top of it.
Monitoring, checks and balances, operational audits.
It is fun to stand at a whiteboard and architect a new system. It is a lot less fun to go through an Excel list of server names asking “what the hell do we even use this for anymore?” Is this bucket private or public? If it is public, is that necessary for the function of the bucket? If yes, what is in the bucket that we shouldn’t be storing? Audits like these are painfully detailed, boring, and time-consuming. No one gets a pat on the back for the work; when you do it well, it’s never noticed. This case feels like it should have been caught by any number of thankless, boring internal operational audits.
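Even a crude script makes those audit questions concrete. Here's a sketch that walks a bucket inventory and flags anything public or ownerless for review. The inventory is simulated as a list of dicts with made-up bucket names; a real version would pull this data from the AWS APIs instead.

```python
# Simulated bucket inventory; a real audit would pull this from AWS.
buckets = [
    {"name": "public-web-assets", "public": True,  "owner": "marketing"},
    {"name": "app-archive-2005",  "public": True,  "owner": None},
    {"name": "internal-logs",     "public": False, "owner": "ops"},
]

def flag_for_review(inventory):
    """Return bucket names that are public or have no owning department --
    exactly the 'what do we even use this for?' questions from an audit."""
    return [b["name"] for b in inventory
            if b["public"] or b["owner"] is None]

print(flag_for_review(buckets))
```

Running something this simple on a schedule, and making a human answer for each flagged bucket, is the boring work that catches a 14-year-old archive before an attacker does.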
That was a lot of data. At some point, someone should have looked and said, “wow, we’re storing gigabytes of data here. What is it? Why do we need it? Whose department is it for?” Clearly that didn’t happen.
The data she took had to go over some cables. Even if there were good reasons to keep 14 years of data around, and even if you accept that HR policies are never perfect and a former employee might find a way through a gate, why was there no ability to see this huge amount of data being transferred over the network? That is a little like the former employee emerging from the building with a semi truck and driving out the front gate. No one thought that was odd?
There’s security information and event management (SIEM) software that can be configured to watch network activity and report on exactly this kind of thing. If she hadn’t been posting about her great feat on social media, would she ever have been caught?
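The core of such an alert is not exotic. Here's a sketch of the kind of threshold check a SIEM rule might apply to per-host outbound transfer totals; the hostnames, flow records, and the 1 GB threshold are all made up for illustration.

```python
from collections import defaultdict

GIGABYTE = 1024 ** 3
EGRESS_THRESHOLD = 1 * GIGABYTE  # arbitrary illustrative alert threshold

# Simulated network flow records: (source host, bytes sent outbound).
flows = [
    ("app-server-01", 40_000_000),
    ("archive-host",  2_500_000_000),  # someone pulling the whole archive
    ("app-server-01", 15_000_000),
]

def egress_alerts(flow_records, threshold=EGRESS_THRESHOLD):
    """Sum outbound bytes per host and return the hosts whose totals
    exceed the threshold -- the 'semi truck at the front gate' check."""
    totals = defaultdict(int)
    for host, sent in flow_records:
        totals[host] += sent
    return [host for host, total in totals.items() if total > threshold]

print(egress_alerts(flows))
```

A real SIEM adds baselining, time windows, and alert routing on top of this, but the underlying question is just as simple as it looks: who is moving far more data than they should be?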
Look, none of us are perfect and this type of work is not easy. We’re not trying to beat up on Capital One here; they seem to be doing much better in handling this disaster than some others in past similar situations. They are certainly not alone.
The big takeaway here for everyone should be a willingness to treat maintenance, operations, and security processes with the same excitement, budgets, and respect that building new things gets. This is the real challenge. We work in an industry where “move fast and break things” is the motto for one of the biggest players. We fawn over startup culture and the idea of two people building something amazing in their garage. Here at PortlandLabs, we’re super proud of how fast we can bang out a minimum viable product for just about any idea. It feels great launching something in six weeks that everyone told you was a six-month challenge. That type of ability to sprint is great, and it’s obviously a culture that everyone wants to create.
What technology leaders need to focus on is fostering a culture where some junior systems person at Capital One who noticed 14 years of PII sitting on a server would raise a stink about it until it was deleted. We should be creating a process and culture where that person instinctively knows their career will advance for being the one who identified the mess and offered to clean it up. Instead, we’re urging our brightest minds to “fail bigger” on agile development projects with tight deadlines that naturally leave sloppy files behind on servers.