Continuous Remediation: Infrastructure Security Fixes Tips and Tricks

It's a good feeling to look at the security reports for a large infrastructure and see good results, or at least steady progress toward them. Being able to detect and remediate security problems in near real time is a huge help in getting there.

Here are some tips I learned while dealing with very large infrastructures as a security technical lead. By large I mean between 100 million and 500 million dollars in annual cloud spend in these cases, but these tips could also help you become a standout contributor to many types of infrastructure at many scales.

I personally worked on code that performed hundreds of thousands of remediations in production systems. Together with the team, including efforts from many security staff and engineers, we were able to build a significant, measured improvement in security posture. This work is not without resistance: you shouldn't expect continuous remediation to be easy or to go unopposed. I don't want to lead you to believe there are no political, personal or technical hurdles to overcome, but I think it's both feasible and extremely valuable for security. The rewards are large in my opinion.

fig: You can be as at home hunting and remediating complex security issues as a cat in a jungle. These principles, tips and tricks will make you a formidable force in hunting these down and protecting your systems with your colleagues.

The organization in these cases had many cloud products and development efforts occurring across hundreds of teams. Doing security remediation work at this scale takes sensitivity both to the objectives of those teams and to the vital protective need for security engineering as a specialization. Fortunately, we found that there were big wins to be had in helping teams remediate. The automation we provided often relieved a burden on teams and allowed them to focus on the unique security needs of their product, while our central team was able to target the common issues seen in larger volumes.

Some teams were very keen to have a hand in every change made, though these were a smaller set of teams. Getting to know your teams and building a friendly connection to security is one win that comes with being able to offer broad, quick, safe and real-time fixes to security misconfiguration problems. To give you an idea of the scale at which this was done, one of our fixes reconfigured about 15 thousand S3 storage buckets to meet a new standard. Done by hand, that volume of work would have been a significant load on individual engineers, so there are good wins to be had from engineering automation that helps your teams. As you might imagine, this set of buckets contained large volumes of data (many petabytes) and backed live production systems, so it had to be handled with care.

fig: There are big opportunities in security engineering teams helping fix problems alongside application security teams, and we have seen big value in building the security engineering speciality in our staffing. Security engineering teams can complement app development teams in tackling security remediations that are faced by many teams.

Here are some of the tips and tricks gained in the process of exploring and engineering many campaigns of security fixes:

Fig: Automation is vital. Real-time automated remediations allow you to respond quickly to problems.
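
To make this tip a little more concrete, here is a minimal sketch of a remediation dispatcher that routes each finding to a fix function. Everything in it is hypothetical: the `Finding` shape, the `public_bucket` rule name and the handler are illustrative names, and a real system would receive findings from a cloud event stream and call the cloud API inside the handler. The dry-run default is one possible design choice, not a requirement.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Finding:
    """A single detected misconfiguration (hypothetical shape)."""
    rule: str         # e.g. "public_bucket"
    resource_id: str  # e.g. a bucket name
    environment: str  # e.g. "dev" or "prod"


# Registry mapping a rule name to the function that fixes it.
HANDLERS: Dict[str, Callable[[Finding], str]] = {}


def remediates(rule: str):
    """Decorator that registers a fix function for one rule."""
    def register(func):
        HANDLERS[rule] = func
        return func
    return register


@remediates("public_bucket")
def block_public_access(finding: Finding) -> str:
    # Placeholder: a real handler would call the cloud API here.
    return f"blocked public access on {finding.resource_id}"


def handle(finding: Finding, dry_run: bool = True) -> str:
    """Route one finding to its handler, or report that none exists."""
    handler = HANDLERS.get(finding.rule)
    if handler is None:
        return f"no handler for {finding.rule}; leaving {finding.resource_id} for manual review"
    if dry_run:
        return f"[dry run] would run {handler.__name__} on {finding.resource_id}"
    return handler(finding)
```

Calling `handle(finding)` only reports what would happen, while `handle(finding, dry_run=False)` runs the registered fix. Findings with no registered handler fall through to manual review rather than being silently dropped.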

Fig: Tackle larger infrastructures by dividing issues into campaigns. This allows you to make reportable progress without getting bogged down or overwhelming your team or the application teams.
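
One simple way to picture a campaign is as all findings that share a rule, with progress reported per campaign. The sketch below assumes that grouping; the `Finding` fields and rule names are made up for illustration, and you may prefer to slice campaigns by team, environment or severity instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Finding:
    """One open or resolved issue (hypothetical shape)."""
    rule: str
    team: str
    resolved: bool = False


def group_into_campaigns(findings: Iterable[Finding]) -> Dict[str, List[Finding]]:
    """Group findings by rule so each rule becomes one campaign."""
    campaigns: Dict[str, List[Finding]] = defaultdict(list)
    for finding in findings:
        campaigns[finding.rule].append(finding)
    return dict(campaigns)


def progress_report(campaigns: Dict[str, List[Finding]]) -> Dict[str, str]:
    """Summarize each campaign as 'resolved/total (percent)'."""
    report = {}
    for rule, items in campaigns.items():
        done = sum(1 for f in items if f.resolved)
        report[rule] = f"{done}/{len(items)} ({100 * done // len(items)}%)"
    return report
```

A report like this gives you one line per campaign to share with leadership and with the application teams, which is most of what "reportable progress" needs.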

Fig: Make sure you are using multiple channels to communicate with your teams. Understand that people are busy and may not see your email. Use your ticket system, instant messaging, message boards and so on to make sure you have given your teams an opportunity to understand what needs to be fixed.
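
In code, multi-channel communication can be as simple as fanning one message out to every channel and not letting one broken channel block the others. This is a sketch under that assumption; the channel names and the `(team, message)` callable signature are hypothetical, and real senders would wrap your email, ticketing and chat integrations.

```python
from typing import Callable, Dict, List

# A channel is any callable that takes (team, message) and sends it somewhere.
Channel = Callable[[str, str], None]


def notify_all(team: str, message: str, channels: Dict[str, Channel]) -> List[str]:
    """Send the message on every channel; return names of channels that succeeded."""
    delivered = []
    for name, send in channels.items():
        try:
            send(team, message)
            delivered.append(name)
        except Exception as error:  # one broken channel should not block the rest
            print(f"{name} failed for {team}: {error}")
    return delivered
```

Keeping the list of channels that succeeded gives you a record that the team had a fair chance to see the notice before a fix is applied.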

Fig: If you take on too many representations of infrastructure as code to remediate, you may end up creating an insurmountable volume of work and locations to change. It's enough to remediate the most embodied form of the infrastructure, which in my opinion is the direct cloud API, provided you have built automation that will do that in real time.
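
Here is a rough sketch of what fixing at the cloud API level can look like. The setting enforced, S3 Block Public Access, is chosen only as a familiar example and is an assumption, as are the function names. The client passed in is expected to behave like a boto3 S3 client, whose `put_public_access_block` call is used here; nothing in the block imports boto3, so it can be exercised with a stand-in object.

```python
from typing import Any, Dict, Iterable, List

# Example target state; the specific settings are an assumption for illustration.
TARGET_CONFIG: Dict[str, bool] = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def enforce_public_access_block(s3_client: Any, bucket: str) -> None:
    """Apply the target state straight through the cloud API.

    s3_client is expected to behave like a boto3 S3 client. The call
    overwrites the bucket's configuration, so repeating it is harmless.
    """
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=TARGET_CONFIG,
    )


def remediate_buckets(s3_client: Any, buckets: Iterable[str]) -> List[str]:
    """Fix each bucket in turn and return the names that failed."""
    failed = []
    for bucket in buckets:
        try:
            enforce_public_access_block(s3_client, bucket)
        except Exception as error:  # keep going; report failures at the end
            print(f"could not remediate {bucket}: {error}")
            failed.append(bucket)
    return failed
```

With boto3 installed and credentials configured, `remediate_buckets(boto3.client("s3"), bucket_names)` would be the real invocation. Because the fix targets the live resource, it works regardless of which infrastructure-as-code tool originally created the bucket.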

fig: Minimal surprise is best. Workflow automation for collecting approvals from teams, when you need them, is well worth the investment in a large infrastructure. Similarly, testing and working in lower environments first, along with careful monitoring, keeps surprises to a minimum once you reach the more impactful higher environments. On the other hand, you should not become unable to take action on issues even if doing so causes some surprises. This requires balance.
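
The lower-environments-first idea, combined with an approval gate, can be sketched as a staged rollout that halts at the first problem. The environment names, the choice to gate only `prod`, and the `apply_fix` callback are all assumptions for illustration; your approval workflow would supply the `approved` set.

```python
from typing import AbstractSet, Callable, Dict, List

# Lowest-impact environment first; the names are illustrative.
ROLLOUT_ORDER: List[str] = ["dev", "test", "staging", "prod"]


def staged_rollout(
    apply_fix: Callable[[str], bool],
    approved: AbstractSet[str],
    needs_approval: AbstractSet[str] = frozenset({"prod"}),
) -> Dict[str, str]:
    """Apply a fix one environment at a time, stopping at the first problem.

    apply_fix(env) should return True when the fix succeeded and monitoring
    looks healthy. Environments in needs_approval are not touched until the
    owning team has approved them.
    """
    results: Dict[str, str] = {}
    for env in ROLLOUT_ORDER:
        if env in needs_approval and env not in approved:
            results[env] = "waiting for approval"
            break
        if not apply_fix(env):
            results[env] = "failed, rollout halted"
            break
        results[env] = "applied"
    return results
```

To keep the ability to act on urgent issues, an emergency path could pass every environment in `approved`; where to draw that line is the balance described above.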

Fig: You'll need to be able to document exceptions and compensating controls with teams. We feel teams should own and retain this documentation, which is collaboratively edited. The security team should maintain access to it, archive copies of it, and review and sign off on it as an expert advisory service to the risk owners on the system (usually the system CISO). Most importantly, you want your teams to be able to leverage the time already invested in documenting when a control would not achieve security objectives, so that they don't have to repeat that effort on each audit. Don't underestimate the challenge of responding to requests for exceptions. Analyzing a large infrastructure in this way can be a lot for a small team.
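
If exceptions are kept as structured records, your automation can check them before touching a resource and teams can reuse them at each audit. The sketch below is one hypothetical shape for such a record. The field names are illustrative, and the expiry date is my own assumption about how you might force periodic review; it is not something every program requires.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ExceptionRecord:
    """A documented exception to one rule for one resource (hypothetical shape)."""
    rule: str
    resource_id: str
    owning_team: str
    compensating_control: str
    approved_by: str  # the risk owner who signed off
    expires: date     # assumption: exceptions are reviewed rather than permanent


def find_exception(
    rule: str, resource_id: str, records: List[ExceptionRecord], today: date
) -> Optional[ExceptionRecord]:
    """Return the unexpired exception covering this finding, if any."""
    for record in records:
        if (record.rule, record.resource_id) == (rule, resource_id) and record.expires >= today:
            return record
    return None
```

A remediation job would call `find_exception` first and skip the fix when a record comes back, while still logging that the finding was seen.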

Hopefully this helps with some of the general overview and architecture questions you may run into. My intent here is not to sell anything, but to lower the barrier to entry for other teams facing this sort of challenge and, in so doing, to leverage the hard work done on our various projects for further benefit to citizens if nothing prevents it.

I would like to thank the many dedicated security colleagues, engineers and teams that have helped resolve hundreds of thousands of findings over the years. The dedication and determination of these individuals may be why you or a loved one enjoys protection from identity theft today or retains privacy choices. In my experience, seeing a team of engineers, CISOs and others come together to solve a seemingly very challenging security task is very inspiring. Much good has been done by these teams to protect a great many citizens.