若何进击,在此不作申明(也不要问我),次要谈谈若何提防。起首,跨站剧本进击都是因为对用户的输出没有停止严厉的过滤酿成的,所以咱们必需在一切数据进入咱们的网站和数据库之前把能够的风险拦阻。针对不法的HTML代码包含单双引号等,可使用htmlentities() 。
<?php
$str = "A 'quote' is <b>bold</b>";
// Outputs: A 'quote' is <b>bold</b>
echo htmlentities($str);
// Outputs: A 'quote' is <b>bold</b>
echo htmlentities($str, ENT_QUOTES);
?>
如许可使不法的剧本生效。
然而要注重一点,htmlentities()默许编码为 ISO-8859-1,假如你的不法剧本编码为其它,那末能够没法过滤失落,同时阅读器却可以辨认和履行。这个成绩我先找几个站点测试后再说。
这里供应一个过滤不法剧本的函数:
function RemoveXSS($val) {
// remove all non-printable characters. CR(0a) and LF(0b) and TAB(9) are allowed
// this prevents some character re-spacing such as <java\0script>
// note that you have to handle splits with \n, \r, and \t later since they *are* allowed in some inputs
$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x20])/', '', $val);
// straight replacements, the user should never need these since they're normal characters
// this prevents like <IMG SRC=@avascript:alert('XSS')>
$search = 'abcdefghijklmnopqrstuvwxyz';
$search .= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$search .= '1234567890!@#$%^&*()';
$search .= '~`";:?+/={}[]-_|\'\\';
for ($i = 0; $i < strlen($search); $i++) {
// ;? matches the ;, which is optional
// 0{0,7} matches any padded zeros, which are optional and go up to 8 chars
// @ @ search for the hex values
$val = preg_replace('/(&#[x|X]0{0,8}'.dechex(ord($search[$i])).';?)/i', $search[$i], $val); // with a ;
// @ @ 0{0,7} matches '0' zero to seven times
$val = preg_replace('/(�{0,8}'.ord($search[$i]).';?)/', $search[$i], $val); // with a ;
}