Thursday, October 4, 2007

Reflective XSS protection, output encoding

UPDATE: The best XSS defense strategy is described here:


Thanks to Eric Sheridan over at OWASP for fielding our "battle of the output encoding method for reflective XSS Protection" competition today! All commentary below is from Eric via email on 10/4/07.

>>1) Output encoding try 1 (Jim)


Although it is not frequently mentioned, URL encoding will prevent reflected XSS attacks. The browser will not interpret URL encoded values. It looks as though this approach is sufficient for this particular instance. However, I'd recommend you use HTML entity encoding instead. Aside from addressing XSS, entity encoding will fix that 'ugliness' problem that you mentioned.
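To make the difference between the two encodings concrete, here is a minimal Python sketch. It is not the code from either attempt; the sample input is an assumption, and `quote` and `escape` are the Python standard library calls for percent encoding and HTML entity encoding.

```python
from html import escape
from urllib.parse import quote

# Hypothetical user input reflected back into the page.
user_input = "<script>alert(1)</script>"

# URL (percent) encoding: every reserved character becomes %XX.
url_encoded = quote(user_input, safe="")

# HTML entity encoding: only HTML-significant characters are replaced.
entity_encoded = escape(user_input, quote=True)

print(url_encoded)     # %3Cscript%3Ealert%281%29%3C%2Fscript%3E
print(entity_encoded)  # &lt;script&gt;alert(1)&lt;/script&gt;
```

Neither output contains a literal angle bracket, so neither starts a tag. The practical difference is display: the percent-encoded string shows up on the page as %3Cscript%3E..., while the entity-encoded string renders as the text the user typed.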

>>2) Output encoding try 2 (Brendon)

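badChars = [ "<", ">", "#", "&", "'", "\"", "%", "\\" ];
entities = [ "&lt;", "&gt;", "&#35;", "&amp;", "&#39;", "&quot;", "&#37;", "&#92;" ];  // one entity per bad char, same order

word = "some bad xss phrase goes here";
out = "";
i = 0;
while (i < length(word)) {
    ordinal = toAscii(word{i});
    killbadchar = false;
    j = 0;
    while (j < length(badChars)) {
        if (ordinal == toAscii(badChars[j])) killbadchar = entities[j];
        j++;
    }
    if (killbadchar) {
        out .= killbadchar;      // known bad character: emit its entity
    } else if (ordinal > 126) {
        out .= " ";              // outside printable ASCII: replace with a space
    } else {
        out .= word{i};          // everything else passes through unchanged
    }
    i++;
}

print( out );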

Eck, rough looking pseudo-code :)

If I were doing a security review and I saw some code like this used to prevent XSS, I would mark it as a finding (albeit low, for the moment). This is a 'negative' or 'blacklist' approach - the developer is rejecting known 'bad' characters rather than accepting known 'good' characters. Guys like RSnake have spent their entire careers bypassing such blacklist filters. Don't get me wrong, this method will prove effective in a lot of scenarios. Unfortunately, there are going to be special cases where this particular method fails. Consider the case when user-supplied data lands within a JavaScript block. Example:

<script language="JavaScript">
var a = [user-supplied data];
</script>

In this particular example, the proof-of-concept would look like "a; alert(document.cookie); var b=" (without the quotes). A real attack vector would have to do quite a bit of obfuscation, but a determined individual will find a way (see 'Myspace Worm').
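The reason the blacklist does not help here can be checked mechanically: the proof-of-concept string contains none of the characters on the bad list. The Python sketch below re-implements the blacklist idea from attempt 2 under stated assumptions. The entity table and the function name `blacklist_encode` are illustrative, not Brendon's actual code.

```python
# Assumed entity table, keyed by the characters on the bad list in attempt 2.
BAD_CHARS = {
    "<": "&lt;", ">": "&gt;", "#": "&#35;", "&": "&amp;",
    "'": "&#39;", '"': "&quot;", "%": "&#37;", "\\": "&#92;",
}

def blacklist_encode(text: str) -> str:
    """Replace listed characters with entities and non-ASCII with a space."""
    out = []
    for ch in text:
        if ch in BAD_CHARS:
            out.append(BAD_CHARS[ch])
        elif ord(ch) > 126:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)

payload = "a; alert(document.cookie); var b="
filtered = blacklist_encode(payload)

# The payload passes through the filter byte for byte.
print(filtered == payload)  # True
```

The same filter does neutralize `<b>` (it becomes `&lt;b&gt;`), which is why it looks adequate when data lands in the HTML body and only fails once the data lands inside a script block.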

If you are looking for a good output encoding example, check out

This method follows a 'positive security model'. It only accepts the known good values and entity-encodes all of the rest. I think the method is so simple that it can be easily ported to any language. I'd recommend you use this method in place of both output encoding attempts. Also, if your validation routines detect someone trying to enter malicious JavaScript, I'd strongly consider logging the event as a "security event". Hope this helps!
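A positive-model encoder of the kind described can be sketched in a few lines of Python. This is not the referenced method itself. The allowed set (ASCII letters, digits, and space) and the name `whitelist_encode` are assumptions chosen for illustration; a real application would pick the allowed set for its own output context.

```python
import string

# Assumed known-good set: ASCII letters, digits, and the space character.
ALLOWED = set(string.ascii_letters + string.digits + " ")

def whitelist_encode(text: str) -> str:
    """Pass allowed characters through; emit everything else as &#NN;."""
    return "".join(
        ch if ch in ALLOWED else "&#%d;" % ord(ch)
        for ch in text
    )

print(whitelist_encode("<script>"))     # &#60;script&#62;
print(whitelist_encode("a; alert(1)"))  # a&#59; alert&#40;1&#41;
```

Because anything outside the allowed set is encoded, there is no list of dangerous characters to keep up to date: the semicolon and parentheses that slipped past the blacklist are encoded here without being named anywhere.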



Jim Manico said...

By far, the best post on this topic I've seen is

Anyone know of a Java-centric explanation of this issue?

The key point seems to be that you need to use UTF-8 for international PHP apps, and UTF-8 has some odd vulnerabilities around IE6. Does the same apply to Java?

ascetik said...

I have added a way to do output encoding in Java on my blog, using the Apache Commons Lang library. It works even if you are using international character sets. I also have an example WAR file that you can download and test with on Tomcat 6.

Basically use StringEscapeUtils.escapeHtml(StringEscapeUtils.unescapeHtml(input));
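The point of unescaping before escaping is that input which already contains entities does not get encoded twice. The Python sketch below shows the same idea with the standard library's `html.unescape` and `html.escape`. It is an analogue for illustration, not the Commons Lang implementation, and the function name `normalize_and_encode` is hypothetical.

```python
from html import escape, unescape

def normalize_and_encode(value: str) -> str:
    """Decode any existing entities first, then entity-encode the result."""
    return escape(unescape(value), quote=True)

print(normalize_and_encode("<b>"))               # &lt;b&gt;
print(normalize_and_encode("&lt;b&gt;"))         # &lt;b&gt;
print(normalize_and_encode("caf\u00e9 & tea"))   # café &amp; tea
```

Raw and pre-encoded input end up identical, and running the function on its own output changes nothing. The trade-off is that a user who really wants the literal text `&lt;` displayed will see `<` instead.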

James said...

I think the best way to go about output encoding is with ESAPI.