PHP 5.2+ Data Filtering Extension = BAD?
Yesterday, while browsing some security-tagged discussions on stackoverflow.com, I noticed someone mention some filter_-prefixed PHP functions. At first I thought they were custom-written ones, but a quick check showed that these functions really exist. I was shocked. Anyway, let’s dig into it…
Filters
The filter extension has 3 types of filters: validate, sanitize and other filters. Let’s take each of them separately and see what the extension has to offer.
- Validate Filters
- Data Type Validation
FILTER_VALIDATE_BOOLEAN: Validates whether the specified variable/value is a boolean value.
FILTER_VALIDATE_FLOAT: Validates whether the specified variable/value is of type float.
FILTER_VALIDATE_INT: Besides validating integers, it lets you specify a range in which the variable or value should fall. It can also validate octal and hexadecimal numbers.
- String Validation
FILTER_VALIDATE_EMAIL: Validates whether the variable/value is a well-formed email address.
FILTER_VALIDATE_IP: Validates IPv4/IPv6 addresses, with flags to disallow reserved/private IP ranges.
FILTER_VALIDATE_REGEXP: Validates the variable/value against a regular expression.
FILTER_VALIDATE_URL: Validates URLs. I wouldn’t recommend it though, since it validates 'http://...'. You're better off with regular expressions here.
- Sanitize Filters
FILTER_SANITIZE_EMAIL: I highly recommend not using this filter, because it won’t sanitize the email address. The characters !#$%&'*+-/=?^_`{|}~@.[] will remain intact.
FILTER_SANITIZE_ENCODED: URL-encodes the string, optionally stripping or encoding special characters.
FILTER_SANITIZE_MAGIC_QUOTES: Applies addslashes() to the specified variable/value. Seriously, shouldn’t this be extinct already? I thought we had left them behind once we moved away from PHP 4.x. Try not to use this one, because in some SQL systems backslashes are not escape characters.
FILTER_SANITIZE_NUMBER_FLOAT: Removes all characters except digits, +- and optionally .,eE.
FILTER_SANITIZE_NUMBER_INT: Removes all characters except digits and the plus and minus signs.
FILTER_SANITIZE_SPECIAL_CHARS: HTML-escapes '"<>& and characters with an ASCII value less than 32, optionally stripping or encoding other special characters.
FILTER_SANITIZE_STRING: Strips tags, optionally stripping or encoding special characters.
FILTER_SANITIZE_STRIPPED: Alias for the above filter.
FILTER_SANITIZE_URL: Removes all characters except letters, digits and $-_.+!*'(),{}|^~[]`<>#%";/?:@&=.
FILTER_UNSAFE_RAW: Does nothing, optionally stripping or encoding special characters.
- Other Filters
FILTER_CALLBACK: Calls a user-defined function to filter data.
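To make the list above a bit more concrete, here is a minimal sketch of how a few of the validate filters and FILTER_CALLBACK can be called through filter_var(). The sample values and variable names are made up for illustration. The array() syntax is used so the snippet also fits older PHP versions.

```php
<?php
// Integer validation with a range: returns the int on success, false on failure.
$age = filter_var('42', FILTER_VALIDATE_INT, array(
    'options' => array('min_range' => 18, 'max_range' => 99),
));

// Hexadecimal input is only accepted when the flag is given.
$hex = filter_var('0xff', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);

// FILTER_NULL_ON_FAILURE lets a real "false" be told apart from "not a boolean".
$flag  = filter_var('yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
$bogus = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

// Reject private and reserved ranges when validating an IP address.
$private = filter_var(
    '192.168.0.1',
    FILTER_VALIDATE_IP,
    FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
);

// FILTER_CALLBACK hands the value to any callable, here the built-in trim().
$trimmed = filter_var('  bob  ', FILTER_CALLBACK, array('options' => 'trim'));

var_dump($age, $hex, $flag, $bogus, $private, $trimmed);
```

Note that a failed validation returns false, so results should be compared with === rather than tested loosely, otherwise a valid 0 would look like a failure.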
As you must have noticed, the sanitizing filters are pretty bad, and some are even repetitive. Although I haven’t marked all of them in red, I surely won’t use them. For sanitizing (against XSS) I’ll use good old strip_tags() combined with htmlspecialchars(), because this way I can define the quote style and the charset to encode in. As for safe SQL queries, I use database-specific functions.
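The strip_tags() plus htmlspecialchars() combination mentioned above could look roughly like this. It is only a sketch: escape_for_html() is a hypothetical helper name, and the UTF-8 default and the sample string are assumptions for the example.

```php
<?php
// Hypothetical helper, not part of PHP or the filter extension.
function escape_for_html($value, $charset = 'UTF-8')
{
    // Drop tags first, then encode what is left. ENT_QUOTES encodes
    // both double and single quotes, and the charset is set explicitly.
    return htmlspecialchars(strip_tags($value), ENT_QUOTES, $charset);
}

$raw  = '<b>Tom</b> & "Jerry\'s" <script>alert(1)</script>';
$safe = escape_for_html($raw);

echo $safe, "\n";
```

Keep in mind that strip_tags() removes the tags but keeps the text between them, so the script body survives as harmless plain text.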
Functions
OK, so we had some complaints about the filters. But let’s look beyond that for a moment and see what the filtering functions have to offer us.
- filter_has_var
Through this function we can check whether a POST, GET, ENV, SERVER or COOKIE value has been set.
if (filter_has_var(INPUT_POST, 'submit')) {
    echo 'yes the submit value has been set';
}
Although this might look like an isset() call, don’t be fooled. It checks the input data as PHP received it with the request, not the current contents of the superglobals (POST, GET, ENV, SERVER, COOKIE), so…
/*
submit isn't set
*/
$_POST['submit'] = 1;
if (filter_has_var(INPUT_POST, 'submit')) {
    echo 'this will not be echoed';
}
Is this a good thing? Well, it depends. So far I haven’t seen anybody control the flow of an application (in a script) by setting/unsetting a value in the superglobals, but I have seen a hand-coded register_globals implementation (for backwards compatibility) in PHPList which allowed SERVER['file'] (named something like that) to be overwritten, making it vulnerable to remote file inclusion. So in that particular scenario it would have helped.
- filter_id and filter_list
filter_id returns the numeric ID of a filter given its name, while filter_list returns the names of all supported filters… moving on.
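For completeness, a short sketch of the two together. The exact list printed depends on the PHP version and build, so no output is shown here.

```php
<?php
// Print every filter name known to this PHP build next to its numeric ID.
foreach (filter_list() as $name) {
    printf("%-20s %d\n", $name, filter_id($name));
}

// The ID returned for a name is the value of the matching constant.
$int_id = filter_id('int');
var_dump($int_id === FILTER_VALIDATE_INT);
```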
- filter_input and filter_input_array
The (ONE) function for the “Data Filtering Extension”. I’ll post an example from the documentation.
$search_html = filter_input(
    INPUT_GET,
    'search',
    FILTER_SANITIZE_SPECIAL_CHARS
);
$search_url = filter_input(
    INPUT_GET,
    'search',
    FILTER_SANITIZE_ENCODED
);
echo "You have searched for $search_html.\n";
echo "<a href='?search=$search_url'>Search again.</a>";
- filter_var and filter_var_array
It works the same way as filter_input, except that you can apply the filters to variables/strings… these examples are also from the documentation.
// will validate
var_dump(filter_var(
    'bob@example.com',
    FILTER_VALIDATE_EMAIL
));
// will fail (return false), because it
// misses the scheme (http://)
var_dump(filter_var(
    'example.com',
    FILTER_VALIDATE_URL,
    FILTER_FLAG_SCHEME_REQUIRED
));
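The documentation examples only cover filter_var, so here is a minimal sketch of filter_var_array, which applies a set of filters to a whole array in one call. The field names and values in $data and $rules are invented for the example.

```php
<?php
// Hypothetical form data; in a real script this could come from $_POST.
$data = array(
    'email' => 'bob@example.com',
    'age'   => '17',
    'site'  => 'example.com',
);

// One definition per key: either a filter ID, or an array with
// the filter and its options.
$rules = array(
    'email' => FILTER_VALIDATE_EMAIL,
    'age'   => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 18),
    ),
    'site'  => FILTER_VALIDATE_URL,
    'phone' => FILTER_VALIDATE_INT, // key is not present in $data
);

$result = filter_var_array($data, $rules);
var_dump($result);
```

Values that fail validation come back as false, while keys that are missing from the input come back as null, which makes it possible to tell "invalid" apart from "not sent".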
Should I use it?
Well, I can’t say for sure. The validation filters seem pretty good unless you count the URL one… actually, that could be a good filter too if you didn’t have to add 3 flags to make it behave the way you would normally expect…
The other reason why I can’t give a final answer (maybe someone who reads this will) is that I haven’t checked the source code of the extension…
You can find more information about it (not that much more, though) on the online documentation page. Waiting for your opinion on this one…
Update: Chuck Norrises was here and updated the text, removing my opinion of vulnerable code. eof!

