PHP 5.2+ Data Filtering Extension = BAD?

Yesterday, while browsing some security-tagged discussions on stackoverflow.com, I noticed someone mention some filter_-prefixed PHP functions. At first I thought they were custom-written, but a quick check showed that these functions really exist. I was shocked. Anyway, let’s dig into it…


Filters

The filter extension has 3 types of filters: validate, sanitize and other filters. Let’s take each of them separately and see what the extension has to offer.

    Validate Filters

    • Data Type Validation

    • FILTER_VALIDATE_BOOLEAN

      Validates whether the specified variable/value represents a boolean: "1", "true", "on" and "yes" are accepted as true.

    • FILTER_VALIDATE_FLOAT

      Validates whether the specified variable/value is a well-formed float (numeric strings such as '1.5' qualify).

    • FILTER_VALIDATE_INT

      Besides validating integers, it lets you specify a range that the variable or value must fall in. It can also validate octal and hexadecimal numbers.

    • String Validation

    • FILTER_VALIDATE_EMAIL

      Validates if the variable/value is a well formed email address.

    • FILTER_VALIDATE_IP

      Can be used for validating IPv4/IPv6 addresses, having flags to disallow reserved/private IP ranges.

    • FILTER_VALIDATE_REGEXP

      Validates variable/value with regular expressions.

    • FILTER_VALIDATE_URL

      Validates URLs. I wouldn’t recommend it though, since it accepts 'http://...' as valid. You’re better off with regular expressions here.
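
To make the validate filters above a bit more concrete, here is a minimal sketch using filter_var(). The sample values and variable names are made up for illustration; the flags and option names (min_range, max_range, FILTER_FLAG_ALLOW_HEX and friends) are the ones the extension defines. Note that a validate filter returns the converted value on success and false on failure.

```php
<?php
// Integer with a range: returns the int on success, false when out of range.
$range    = array('options' => array('min_range' => 18, 'max_range' => 99));
$age      = filter_var('42', FILTER_VALIDATE_INT, $range);   // int 42
$tooYoung = filter_var('12', FILTER_VALIDATE_INT, $range);   // false

// Hexadecimal and octal notation must be allowed explicitly through flags.
$hex   = filter_var('0xff', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_HEX);   // int 255
$octal = filter_var('0755', FILTER_VALIDATE_INT, FILTER_FLAG_ALLOW_OCTAL); // int 493

// Boolean: with FILTER_NULL_ON_FAILURE, non-boolean input gives null instead of false.
$yes   = filter_var('yes', FILTER_VALIDATE_BOOLEAN);                           // true
$maybe = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE); // null

// IP address: reject private ranges such as 192.168.0.0/16.
$private = filter_var('192.168.1.10', FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE); // false
$public  = filter_var('8.8.8.8', FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);               // '8.8.8.8'

// Regular expression: the pattern goes into the 'regexp' option.
$pattern = array('options' => array('regexp' => '/^[a-z]+-\d+$/'));
$code    = filter_var('abc-123', FILTER_VALIDATE_REGEXP, $pattern); // 'abc-123'
```

Because failure is signalled with false, compare results with === so that a valid 0 or an empty match is not mistaken for a failed validation.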

    Sanitize Filters

  • FILTER_SANITIZE_EMAIL

    I highly recommend not using this filter, because it doesn’t really sanitize the email address: the characters !#$%&'*+-/=?^_`{|}~@.[] remain intact.

  • FILTER_SANITIZE_ENCODED

    URL-encode string, optionally strip or encode special characters.

  • FILTER_SANITIZE_MAGIC_QUOTES

    Applies addslashes() to the specified variable/value. Seriously, shouldn’t this be extinct already? I thought we had left magic quotes behind once we moved away from PHP 4.x. Try not to use this one, because in some SQL systems the backslash is not an escape character.

  • FILTER_SANITIZE_NUMBER_FLOAT

    Remove all characters except digits, +- and optionally .,eE.

  • FILTER_SANITIZE_NUMBER_INT

    Remove all characters except digits, plus and minus sign.

  • FILTER_SANITIZE_SPECIAL_CHARS

    HTML-escape '"<>& and characters with ASCII value less than 32, optionally strip or encode other special characters.

  • FILTER_SANITIZE_STRING

    Strip tags, optionally strip or encode special characters.

  • FILTER_SANITIZE_STRIPPED

    Alias for the above filter.

  • FILTER_SANITIZE_URL

    Remove all characters except letters, digits and $-_.+!*'(),{}|^~[]`<>#%";/?:@&=.

  • FILTER_UNSAFE_RAW

    Do nothing, optionally strip or encode special characters.
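
A short sketch of a few of the sanitize filters above, to show what they actually strip and what they leave alone. The input strings are invented for this example. The last call illustrates the complaint about FILTER_SANITIZE_EMAIL: a quote and a double dash pass through untouched.

```php
<?php
// NUMBER_INT keeps only digits, plus and minus.
$int = filter_var('abc-12.5xyz', FILTER_SANITIZE_NUMBER_INT); // '-125'

// NUMBER_FLOAT keeps the decimal point only when the fraction flag is set.
$float = filter_var('abc-12.5xyz', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION); // '-12.5'

// SPECIAL_CHARS escapes quotes, angle brackets and ampersands.
$html = filter_var('<a href="x">Tom & Jerry</a>', FILTER_SANITIZE_SPECIAL_CHARS);

// EMAIL only removes characters outside its allowed list;
// the apostrophe and dashes below are all on that list.
$email = filter_var("bob'--@example.com", FILTER_SANITIZE_EMAIL); // "bob'--@example.com"
```

In other words, a "sanitized" email address is still not safe to drop into an SQL query or an HTML attribute without further escaping.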

    Other Filters

  • FILTER_CALLBACK

    Call user-defined function to filter data.
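
As a rough illustration of FILTER_CALLBACK, the sketch below passes a value through a user-defined function. The function normalize_username() and its rules (3 to 16 lowercase letters, digits or underscores) are hypothetical, invented for this example; the callback is given through the 'options' key.

```php
<?php
// Hypothetical callback: trims and lowercases the value,
// then returns it only if it looks like a username, false otherwise.
function normalize_username($value)
{
    $value = strtolower(trim($value));
    return preg_match('/^[a-z0-9_]{3,16}$/', $value) ? $value : false;
}

$options = array('options' => 'normalize_username');

$good = filter_var('  Bob_42 ', FILTER_CALLBACK, $options);   // 'bob_42'
$bad  = filter_var('no spaces!', FILTER_CALLBACK, $options);  // false
```

Whatever the callback returns becomes the result of filter_var(), so the same function can both clean up and reject input.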

As you must have noticed, the sanitizing filters are pretty bad, and some are even repetitive. Although I haven’t flagged every one of them, I surely won’t use them. For sanitizing (against XSS) I’ll use good old strip_tags() combined with htmlspecialchars(), because that way I can define the quote style and the charset to encode in. As for safe SQL queries, I use database-specific functions.
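
The strip_tags() plus htmlspecialchars() combination mentioned above could look something like this. The helper name clean_for_html() and the UTF-8 default are assumptions made for this sketch, not something prescribed by PHP.

```php
<?php
// Hypothetical helper: drop the tags first, then escape what is left.
// ENT_QUOTES escapes both double and single quotes; the charset is explicit.
function clean_for_html($input, $charset = 'UTF-8')
{
    return htmlspecialchars(strip_tags($input), ENT_QUOTES, $charset);
}

$clean = clean_for_html('<script>alert("x")</script>Tom & "Jerry"');
echo $clean; // alert(&quot;x&quot;)Tom &amp; &quot;Jerry&quot;
```

Note that strip_tags() removes the script tags but keeps the text between them, which is why the escaping step is still needed.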


Functions

Ok, so we had some complaints about the filters. But let’s look beyond that for a moment and see what the filtering functions have to offer.

  • filter_has_var

Through this function we can check if a POST, GET, ENV, SERVER, COOKIE value has been set.

if(filter_has_var(INPUT_POST, 'submit')) {
  echo 'yes the submit value has been set';
}

Although this might look like an isset() call, don’t be fooled. It (probably) works on a snapshot of the input data (POST, GET, ENV, SERVER, COOKIE) taken before the script runs, so…

/*
 submit isn't set
*/
$_POST['submit']=1;
if(filter_has_var(INPUT_POST, 'submit')) {
  echo 'this will not be echoed';
}

Is this a good thing? Well, it depends. I haven’t yet seen anybody control the flow of an application (in a script) by setting/unsetting a value in the superglobals, but I have seen a hand-coded register_globals implementation (for backwards compatibility) in PHPList that allowed SERVER['file'] (or something named like that) to be overwritten, making it vulnerable to remote file inclusion. So in that particular scenario it would have helped.

  • filter_id and filter_list

One returns the numeric value of a filter, while the other returns a list of filters… moving on.
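
For completeness, a tiny sketch of the two. The filter names used here ('validate_email', 'int') are the string names the extension registers for FILTER_VALIDATE_EMAIL and FILTER_VALIDATE_INT; 'no_such_filter' is obviously made up.

```php
<?php
// filter_id() maps a filter name to its numeric ID (the value of the constant).
$emailId = filter_id('validate_email');   // same value as FILTER_VALIDATE_EMAIL
$unknown = filter_id('no_such_filter');   // false

// filter_list() returns the names of all supported filters.
$names  = filter_list();
$hasInt = in_array('int', $names, true);  // true

// Print a name => ID table of everything that is available.
foreach ($names as $name) {
    echo $name, ' => ', filter_id($name), "\n";
}
```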

  • filter_input and filter_input_array

The (ONE) function for the “Data Filtering Extension”. I’ll post an example from the documentation.


$search_html = filter_input(
    INPUT_GET,
    'search',
    FILTER_SANITIZE_SPECIAL_CHARS
);
$search_url = filter_input(
    INPUT_GET,
    'search',
    FILTER_SANITIZE_ENCODED
);

echo "You have searched for $search_html.\n";
echo "<a href='?search=$search_url'>Search again.</a>";

  • filter_var and filter_var_array

These work the same way as filter_input, except that you apply the filters to variables/strings instead of request input… these examples are also from the documentation.

// will validate
var_dump(filter_var(
    'bob@example.com',
    FILTER_VALIDATE_EMAIL
));

// will fail (return false), because it
// misses the scheme (http://)
var_dump(filter_var(
    'example.com',
    FILTER_VALIDATE_URL,
    FILTER_FLAG_SCHEME_REQUIRED
));
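
The array variant is not shown in the examples above, so here is a minimal sketch of filter_var_array(). The field names and values are invented for illustration. Each key in the definition array maps to either a filter constant or an array with 'filter', 'flags' and 'options'.

```php
<?php
// Sample data, e.g. something that was already decoded from a request.
$data = array(
    'email' => 'bob@example.com',
    'age'   => '17',
);

// One filter definition per expected key.
$definition = array(
    'email'   => FILTER_VALIDATE_EMAIL,
    'age'     => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 18),
    ),
    'missing' => FILTER_VALIDATE_INT,
);

$result = filter_var_array($data, $definition);

// email   => 'bob@example.com' (valid)
// age     => false             (17 is below min_range)
// missing => null              (key not present in $data)
var_dump($result);
```

The false versus null distinction is handy: false means "present but invalid", null means "not supplied at all".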


Should I use it?

Well, I can’t say for sure. The validation filters seem pretty good, unless you count the URL one… actually that could be a good filter too, if you didn’t have to add 3 flags to make it behave the way you would normally expect…

The other reason why I can’t give a final answer (maybe someone who reads this will) is that I haven’t checked the source code of the extension…

You can find more information about it (not that much more, really) on the online documentation page. I’m waiting for your opinion on this one…

Update: Chuck Norrises was here and updated the text, removing my opinion of vulnerable code. eof!


