Thursday, January 21, 2010

How bad is it?

Thank you to John Menerick and Ben Nagy for entertaining my questions on the Daily Dave list.

Q: Is the recent IE6 0-day anything special?

John: Not really. Not as special as the NT <-> Win 7 issue recently highlighted.

Q: How many similar 0-days are for sale on the black market?

John: Quite a few.

Ben: I'd love to see your basis for this assertion. I'm not saying that in the "I don't believe you" sense, only in the "everyone always says that but nobody ever puts up any facts" sense.

Q: What is the rate/difficulty of discovering new Windows-based 0-days in the common Microsoft and Adobe products installed on almost every corporate client? (I heard Dave mention that discovery is getting more difficult.)

John: Not terribly difficult for someone who is dedicated. Then again, my idea of difficult is much different from the average person's.

Ben: I think that while finding 0-days might be 'not terribly difficult', selecting and properly weaponising useful 0-days from the masses of dreck your fuzzer spits out IS difficult - at least in my experience. There was some discussion of the 'too many bugs' problem on this list previously and I know several of the other fuzzing guys are currently researching the same area. Of course you'd explain this to your 'avg. person', as well as explaining that the skillset for finding bugs is not necessarily the same as the skillset for writing reliable exploits for them, and that 'dedication' may not sufficiently substitute for either.

Lurene Grenier: I really feel that the "selecting good crashes" problem is not that hard to overcome if you have a proper bucketing system, and the ability to do just a bit of auto-triage at crash time. For example, the fuzzer I use now both separates crashes by what it perceives to be the base issue at hand, and provides a brief notes file with some information about the crash and what is controlled. This requires just a bit of sense in providing fuzzed input, and very little smarts on the part of the debugger. I really think the next step is automating that brain-jutsu; much of it is hard to keep in your head, but not hard to do in code.
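
To make the bucketing idea concrete, here is a minimal sketch of what such a crash handler might record. This is not Lurene's fuzzer; the hashing scheme, field names, and notes format are my own assumptions:

    import hashlib
    import json
    import os

    def bucket_id(faulting_pc, call_stack, access_type, depth=3):
        """Group crashes that share a faulting address, access type and the
        top few stack frames; crashes in the same bucket are assumed to be
        the same underlying bug."""
        key = "|".join([hex(faulting_pc), access_type] +
                       [hex(f) for f in call_stack[:depth]])
        return hashlib.sha1(key.encode()).hexdigest()[:12]

    def record_crash(outdir, testcase, faulting_pc, call_stack,
                     access_type, controlled_regs):
        """File the testcase under its bucket and write a brief notes file
        describing what appears to be controlled."""
        bucket_dir = os.path.join(outdir, bucket_id(faulting_pc, call_stack,
                                                    access_type))
        os.makedirs(bucket_dir, exist_ok=True)

        n = len([p for p in os.listdir(bucket_dir) if p.endswith(".bin")])
        with open(os.path.join(bucket_dir, "crash_%04d.bin" % n), "wb") as f:
            f.write(testcase)

        notes = {
            "faulting_pc": hex(faulting_pc),
            "access_type": access_type,      # e.g. "read", "write", "exec"
            "controlled": controlled_regs,   # registers holding fuzzed bytes
            "stack_top": [hex(f) for f in call_stack[:5]],
        }
        with open(os.path.join(bucket_dir, "notes.json"), "w") as f:
            json.dump(notes, f, indent=2)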

Using this output, it's pretty easy to spend a lazy morning with your coffee grepping the notes files for the sorts of things you usually find to be reliably exploitable. From there you can call in your 30 ninjas and have at.
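
The lazy-morning step then reduces to a filter over those notes files; a rough sketch, assuming the notes format from the previous snippet:

    import glob
    import json

    # Markers that usually hint at exploitability (an assumption, not a rule):
    # write or execute access violations, or fuzzed bytes reaching the
    # instruction pointer.
    INTERESTING = ("write", "exec")

    def promising_buckets(outdir):
        for path in glob.glob(outdir + "/*/notes.json"):
            with open(path) as f:
                notes = json.load(f)
            if notes["access_type"] in INTERESTING or "pc" in notes["controlled"]:
                yield path, notes

    for path, notes in promising_buckets("crashes"):
        print(path, notes["faulting_pc"], notes["access_type"])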

Creating reliable exploits is for sure the hardest part, but once you've done the initial work on a program, the next few exploits in it of course come more quickly and easily.

As for the thought experiment, I think that the benefit of the top four researchers is that they've trained themselves over a long period of time (and with passion) to have a very good set of pattern-recognition tools which they call instincts. They know how to get crashes, and they know having seen one crash what's likely to find more. They know how to think about a process to get proper execution, and they're rewarded by success emotionally which makes the lesson learned this time around stick for when they need it again.

I honestly think that there is more pattern recognition "muscle-memory" type skill involved in RE, bug hunting, and exploit dev than pure mechanical process, which is why the numbers are so skewed. It's like taking 4 native speakers of a language (who love to read!) and 100 students of general linguistics with a zillion dollars. Who will read a book in the language faster?

Q: How easy is discovery for someone with resources like the Chinese government?

John: Much simpler.

Ben: Setting aside the previous point that discovery is only the start, I think it's instructive to consider which elements of the process scale well with money.

Finding the bugs: You need a fuzzing infrastructure that scales - running Peach on one laptop with 30 ninjas standing around it with IDA Pro open is not going to work. Also consider tracking what you've already tested, tracking the results, storing all the crashes, blah blah blah. This does scale well with money, but it's an area that not as many people have looked at as I would like.
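
As a hint at what the tracking part can look like, here is a small sketch of a shared store that deduplicates test cases and records results across fuzzing nodes. It is my own illustration, not anyone's production setup, and the schema and field names are assumptions:

    import hashlib
    import sqlite3

    class FuzzTracker:
        """Shared bookkeeping for a fuzzing farm: which inputs have been
        tried, what the outcome was, and which crash bucket (if any) the
        result landed in."""

        def __init__(self, path="fuzz_tracker.db"):
            self.db = sqlite3.connect(path)
            self.db.execute("""CREATE TABLE IF NOT EXISTS testcases (
                                 sha1   TEXT PRIMARY KEY,
                                 target TEXT,
                                 result TEXT,   -- 'clean', 'hang' or 'crash'
                                 bucket TEXT,   -- crash bucket id, if any
                                 node   TEXT)""")

        def already_tested(self, data):
            h = hashlib.sha1(data).hexdigest()
            return self.db.execute("SELECT 1 FROM testcases WHERE sha1 = ?",
                                   (h,)).fetchone() is not None

        def record(self, data, target, result, bucket=None, node="local"):
            h = hashlib.sha1(data).hexdigest()
            self.db.execute("INSERT OR IGNORE INTO testcases VALUES (?, ?, ?, ?, ?)",
                            (h, target, result, bucket, node))
            self.db.commit()

Nothing clever, but this is exactly the kind of bookkeeping that money buys easily.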

Seeing which bugs are exploitable: Using a naive approach, this scales horribly with money - non-linearly, to put it mildly. There are only so many analysts you will be able to hire who have enough smarts to look at a non-trivial bug and correctly determine its exploitability. You only have to look at the record some of the Immunity guys (hi Kostya) have of turning bugs that other people had discarded as DoS or "Just Too Hard" into tight exploits. Even for ninjas, it's slow. There is research being done into doing 'some' of this process automatically (well, I'm doing some, and I know a couple of other guys are too, so that counts), but I don't know of anyone who has a great result in the area yet - I'd love to be corrected.
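
For flavour, here is a toy version of the sort of first-pass triage being researched, with heuristics loosely in the spirit of write-AV/read-AV classification. The categories and thresholds are my own assumptions, and a real analyst still does the actual work:

    def classify(notes):
        """Very rough first-pass exploitability guess from the crash notes;
        anything beyond this still needs a human analyst."""
        access = notes["access_type"]
        controlled = set(notes["controlled"])

        if "pc" in controlled:
            return "probably exploitable"      # fuzzed bytes reached the instruction pointer
        if access == "write" and controlled:
            return "probably exploitable"      # write-AV with attacker-influenced operands
        if access == "write":
            return "needs a closer look"
        if access == "read":
            return "probably not exploitable"  # plain read-AV, often just a NULL deref
        return "unknown"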

Creating nice, reliable exploits: I'd assert that this is like the previous point, but even harder. To be honest, it's not really my thing, so one of the people who write exploits for a living would probably be better placed to comment, but from talking to that kind of guy, it's often a very long road from 'woo we control ebx' to reliable exploitation, especially against modern OSes and modern software that has lots of stuff built in to make your life harder. I don't know how much of the process can really be automated - I mean there are some nice things like the (now old) EEREAP and newer windbg extensions from the Metasploit guys that will find you jump targets according to parameters and so forth, but up until now I was labouring under the impression that a lot of it remains brain-jitsu, which is hard to scale linearly with money.
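
The jump-target hunting Ben mentions is the mechanical end of the process and a fair example of what does automate well. A bare-bones sketch: the opcode patterns are standard x86, but the file handling is purely illustrative (it ignores section mapping, so the addresses are approximate):

    import re

    # x86 opcode sequences commonly used as return-address targets:
    # jmp esp (ff e4), call esp (ff d4), push esp; ret (54 c3).
    PATTERNS = {
        "jmp esp":       b"\xff\xe4",
        "call esp":      b"\xff\xd4",
        "push esp; ret": b"\x54\xc3",
    }

    def find_jump_targets(image, image_base):
        """Scan a module image read from disk for usable jump targets and
        report rough virtual addresses (section mapping is ignored, so
        treat the offsets as approximate)."""
        hits = []
        for name, pattern in PATTERNS.items():
            for m in re.finditer(re.escape(pattern), image):
                hits.append((name, hex(image_base + m.start())))
        return hits

    with open("target.dll", "rb") as f:          # hypothetical module
        for name, addr in find_jump_targets(f.read(), 0x10000000):
            print(addr, name)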

So, while I think that 'simpler' is certainly unassailable, I would need more than a two-word assertion to be convinced that it is 'much' simpler. If you give one team a million dollars and 100 people selected at random from the top 10% of graduating computer science students, and you give the other team their pick of any 4 researchers in the world and 3 iMacs, who does the smart money think will produce more weapons-grade 0day after 6 months?

(No it's not a fair comparison. It's a thought experiment.)

Food for thought, perhaps, since sound bites need little care and feeding.

Q: How bad is it really?

John: Look at the CVSSv2 score and adjust it to the environments where you determine "how bad it is." It could be much worse.
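
For reference, the adjustment John is pointing at is the CVSSv2 environmental arithmetic. Below is a condensed sketch, assuming the equations from the public v2 spec, with the numeric metric weights taken as already looked up from its tables:

    def cvss2_environmental(av, ac, au, c, i, a,
                            e=1.0, rl=1.0, rc=1.0,
                            cdp=0.0, td=1.0, cr=1.0, ir=1.0, ar=1.0):
        """Condensed CVSSv2 arithmetic for the environmental score.
        All arguments are the numeric metric values already looked up from
        the v2 tables (e.g. AccessVector 'Network' = 1.0)."""
        exploitability = 20 * av * ac * au

        def scored(impact):
            f = 0.0 if impact == 0 else 1.176
            return round((0.6 * impact + 0.4 * exploitability - 1.5) * f, 1)

        # Environmental scoring re-weights the impact term by the security
        # requirements, then folds in collateral damage and target distribution.
        adj_impact = min(10, 10.41 * (1 - (1 - c * cr) * (1 - i * ir) * (1 - a * ar)))
        adj_temporal = round(scored(adj_impact) * e * rl * rc, 1)
        return round((adj_temporal + (10 - adj_temporal) * cdp) * td, 1)

"Adjust it to the environments" then amounts to picking the collateral damage, target distribution, and requirement values for each environment and watching the number move.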

Q: I suspect we are just looking at one grain of sand in a beach of 0-days....

John: Correct. No one wants to let everyone else know what cards they hold in their hand, the tools in their toolbox, etc....

1 comment:

dre said...

This exploit is not just run-of-the-mill. It's likely weaponized, and of high quality.

Any weaponized, high-quality exploit targeting a high-profile product such as Internet Explorer is going to be valuable and useful for the long term. Many professional, quality penetration-testers use older exploits because they are often more likely to pop shells not just on a legacy computer, but across a legacy network.

The 0-day market is ugly and scary. However, it is my personal belief that we already reached a tipping point, perhaps as early as 1998 or 2001 (and definitely by now). Weaponized exploits are, in fact, less useful than penetration-testing toolkits that work specifically towards custom/bespoke application-layer attacks, e.g. SQLi. Unfortunately, Metasploit, CANVAS, and Core Impact only began to offer these types of attack engines in the past 2 years or so. When did Metasploit begin to add "web application" attack capability?

It is very easy to find an Adobe flaw that is likely a serious buffer overflow allowing remote shell access. However, this is not as easy with current Microsoft products. Also, taking a finding from flaw to weaponized exploit (and how well weaponized that exploit can be) takes a considerable amount of time and effort, as Dave Aitel alludes to. If you know enough to run WinDbg with the MSEC extension, then you have at least a partial answer to this question... and yes, it's very easy: manually or automated.
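
The "automated" half of that is easy to sketch: drive cdb (WinDbg's console sibling) over a directory of crash dumps and pull out the classification line that the MSEC !exploitable extension prints. The install paths and the output parsing below are assumptions; the commands themselves are the usual ones:

    import glob
    import re
    import subprocess

    CDB = r"C:\Program Files\Debugging Tools for Windows\cdb.exe"   # assumed install path
    MSEC = r"C:\msec\msec.dll"                                      # assumed extension path

    def exploitable(dump_path):
        """Open a crash dump in cdb, run the MSEC !exploitable analysis and
        pull out the classification line it prints."""
        cmd = [CDB, "-z", dump_path, "-c", ".load %s; !exploitable; q" % MSEC]
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        m = re.search(r"Exploitability Classification:\s*(\S+)", out)
        return m.group(1) if m else "UNKNOWN"

    for dump in glob.glob(r"crashes\*.dmp"):
        print(dump, exploitable(dump))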

John Menerick suggested looking at the CVSSv2 scores. However, I don't think these are accurate at all, having put CVSSv2 to the test myself several times and watched it fail under a variety of conditions. Even if you were a master at adjusting CVSSv2 scores, I think it's still completely useless and will end up biting you eventually.

I would rather see use of the OSSTMMv3 than CVSSv2. It's more accurate. I also believe that bringing vulnerability and penetration-testing data up to operational and strategic activities is NOT as good as bringing more accurate risk management data down to the tactical layers. In other words, I'd rather see use of PRA, FAIR, or FMEA to feed information to vulnerability management, technical compliance, and penetration-testing toolkits instead of the opposite.