Summary: The text discusses how incidents attributed to human error often reveal underlying system design flaws rather than individual mistakes. It emphasizes the importance of conducting blameless postmortems, implementing fail-safe controls, and empowering employees to flag error-prone situations. Overall, the focus is on moving away from blaming individuals and toward improving system design so that errors are prevented in the first place.
You should never be satisfied with anyone closing out a post-mortem or issue as having a root cause of human error.
Where there was an assignment of root cause to human error, it actually turned out, when you looked at it more deeply, to be a situation where the humans were in fact performing heroics.
The humans weren’t the problem; in fact, they were the saviors, already working to mitigate broader root causes.
The real root cause here, among others, is the failure to absolutely prioritize significant events and to effectively mask away non-critical events using automation.
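One way to picture the automation point in the highlight above: a triage step that masks events matching known-benign suppression rules or falling below a severity floor, then surfaces the rest in strict severity order so the significant ones are never buried. This is a minimal sketch, assuming a hypothetical `Event` record with a numeric `severity` (higher is worse) and suppression rules written as predicates; none of these names come from the original post.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: where it came from, which rule fired, and a severity (higher = worse).
    source: str
    rule: str
    severity: int


# A suppression rule returns True for events known to be non-critical noise.
Suppression = Callable[[Event], bool]


def triage(events: Iterable[Event], suppressions: List[Suppression], min_severity: int) -> List[Event]:
    """Mask suppressed or low-severity events, then order the rest most severe first."""
    kept = [
        e for e in events
        if e.severity >= min_severity and not any(rule(e) for rule in suppressions)
    ]
    # sorted() is stable, so events of equal severity keep their arrival order.
    return sorted(kept, key=lambda e: e.severity, reverse=True)


if __name__ == "__main__":
    events = [
        Event("vuln-scanner", "port-scan", 3),
        Event("edr", "credential-dump", 9),
        Event("proxy", "uncategorized-site", 2),
        Event("iam", "root-login", 8),
    ]
    # Hypothetical rule: the internal vulnerability scanner's port scans are expected noise.
    suppressions = [lambda e: e.source == "vuln-scanner" and e.rule == "port-scan"]
    for e in triage(events, suppressions, min_severity=3):
        print(e.severity, e.source, e.rule)
```

Run as a script, this prints the credential dump (severity 9) followed by the root login (severity 8); the scanner noise and the low-severity proxy event are masked. In practice the suppression list itself needs review, since an over-broad rule is exactly how a true positive gets hidden.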
One of the most important security operations metrics is the false positive rate, simply because that is what creates the noise that can crowd out the true positive signal. Another big part of this is assuring the integrity of the monitoring system, so that there is confidence not only in its tuning but also in its overall reliability, fidelity and coverage.
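As a rough illustration of the metric in the highlight above, the sketch below computes, per detection rule, the share of triaged alerts that analysts closed as false positives, so the noisiest rules can be tuned first. Note that this quantity, FP / (FP + TP), is what security teams usually mean by "false positive rate"; statistically it is the false discovery rate, since the textbook false positive rate is FP / (FP + TN). The `(rule, was_true_positive)` input format is an assumption for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def false_positive_share(alerts: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per rule, the fraction of its alerts that were false positives.

    Each alert is a hypothetical (rule_name, was_true_positive) pair from analyst triage.
    This is FP / (FP + TP) over alerts that actually fired, i.e. the false discovery rate.
    """
    fired: Counter = Counter()
    false_pos: Counter = Counter()
    for rule, was_true_positive in alerts:
        fired[rule] += 1
        if not was_true_positive:
            false_pos[rule] += 1
    return {rule: false_pos[rule] / fired[rule] for rule in fired}


if __name__ == "__main__":
    triaged = [
        ("impossible-travel", False),
        ("impossible-travel", False),
        ("impossible-travel", True),
        ("malware-hash", True),
    ]
    # Rank rules noisiest first so tuning effort goes where the noise is.
    ranked = sorted(false_positive_share(triaged).items(), key=lambda kv: kv[1], reverse=True)
    for rule, share in ranked:
        print(f"{rule}: {share:.0%} false positives")
```

With the sample data, this reports `impossible-travel` at 67% and `malware-hash` at 0%. The metric only means something if the monitoring system is itself trustworthy: a rule that silently stops firing also scores 0% here, which is why coverage has to be checked separately.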
There’s a special place in hell for post-mortems that blame users for big security incidents because they clicked a link.
You shouldn’t need to be routinely tinkering in production in the first place, and any necessary support access should be mediated through layers of access review and subject to menu-driven interfaces to reduce the scope for command-line error.
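A menu-driven interface, as described in the highlight above, narrows what a support engineer can type to a fixed set of reviewed operations. The sketch below is one hypothetical shape for that: each menu entry maps to a pre-approved argument list, the operator only picks an entry and supplies a value that must pass validation, and nothing is planned without an approved access ticket. The action names, the ticket flag, and the validation pattern are all illustrative assumptions, and the commands are printed rather than executed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    # A reviewed operation: a label for the menu and a builder producing a fixed argv.
    label: str
    build_argv: Callable[[str], List[str]]


# Only simple service names are accepted, so free-form shell input cannot reach the command.
SERVICE_NAME = re.compile(r"[a-z][a-z0-9-]{0,62}")

# Hypothetical menu of pre-approved actions; real entries would come out of access review.
MENU: Dict[str, Action] = {
    "1": Action("Restart service", lambda svc: ["systemctl", "restart", svc]),
    "2": Action("Show service status", lambda svc: ["systemctl", "status", svc]),
}


def plan_action(choice: str, service: str, ticket_approved: bool) -> List[str]:
    """Return the argv for a menu choice, refusing anything outside the menu."""
    if not ticket_approved:
        raise PermissionError("support access requires an approved ticket")
    if choice not in MENU:
        raise ValueError(f"unknown menu option: {choice!r}")
    if not SERVICE_NAME.fullmatch(service):
        raise ValueError(f"invalid service name: {service!r}")
    return MENU[choice].build_argv(service)


if __name__ == "__main__":
    for key, action in MENU.items():
        print(key, action.label)
    # A real tool would pass this list to subprocess.run(argv) without shell=True.
    print(plan_action("1", "billing-api", ticket_approved=True))
```

The design choice is that the operator never composes a command line: a typo in the service name is rejected by validation rather than executed, and anything not on the menu has to go back through access review to be added.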