Securing Legacy Applications - Part 2

This article was originally published in the April 2016 edition of the php[architect] magazine in the "Security Corner" column.

In my previous column I talked about legacy applications and provided some general "quick hit" tips to help you secure your application. I'm going to continue the same theme this month and offer a few more helpful hints you can use immediately. You'll notice as I go through these points that they all have something in common - they're relatively generic. Legacy applications are a whole different beast when it comes to the mish-mash of technologies, development practices and naming conventions. Unfortunately, this also means there's no way to effectively secure one without a good look at how it does things.

Consider this a word of warning that your mileage may vary when it comes to applying these tips. Don't expect that, if you treat these like a checklist, your application will be magically secure. After all, most of the security issues (probably 9 times out of 10) come from the integration of functionality and tools, not the tools themselves.

Using unfiltered data

If you've done any reading about application security, you'll see one theme mentioned over and over: "don't trust the user". What this boils down to for you, the developer, is that any input your application receives should be considered tainted. I briefly touched on this in the section about `$_REQUEST` in last month's article, but I want to expand on it a bit further here and offer suggestions on how you can avoid using potentially malicious data. There are two key things to think about when working with input: validation and filtering.

As a side note, there's a difference between "filtering" and "escaping" though many will use those terms interchangeably. Filtering is the act of taking the data passed in and removing malicious/bad data from it before use, while escaping leaves the data intact and encodes it just before output so that the destination (an HTML page, for example) treats it as plain data rather than as markup or code.
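To make the filtering/escaping distinction concrete, here's a minimal sketch using only built-in PHP functions. The variable names and sample values are made up for illustration: the first call cleans a value on the way in, the second encodes a value on the way out.

```php
<?php
// Raw, untrusted input (hypothetical example values).
$rawAge = '42<script>';
$rawComment = '<b>Hello</b> & welcome';

// Filtering: clean the value on the way IN, before the application uses it.
// FILTER_SANITIZE_NUMBER_INT strips everything except digits, "+" and "-".
$age = filter_var($rawAge, FILTER_SANITIZE_NUMBER_INT);

// Escaping: encode the value on the way OUT so the browser treats it as text.
$safeComment = htmlspecialchars($rawComment, ENT_QUOTES, 'UTF-8');

echo $age, "\n";         // 42
echo $safeComment, "\n"; // &lt;b&gt;Hello&lt;/b&gt; &amp; welcome
```

Notice that the escaped comment still contains everything the user typed; it has only been made harmless for an HTML context.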

The basic definition of "validation" is ensuring that the data you've been given is what you're expecting. If you're looking for a US phone number, you definitely want the three-three-four format (or just 10 digits without country code). Same with things like social security numbers or credit card numbers - you know what they're supposed to look like so you can verify that structure. That's all validation is - verifying the structure and type of the incoming data. Some, like the numbers I mentioned before, are easier to judge but others, like free-form text, are a bit more difficult. In that case you have to determine what kind of text you might allow (hint: usually denying HTML content is a good place to start).
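As a rough sketch of the phone number case, a single regular expression can check for either ten bare digits or the three-three-four format. The function name is hypothetical, and a real application would likely accept more formats (parentheses, spaces, dots) than this does.

```php
<?php
// Hypothetical helper: true when $input looks like a US phone number,
// either as 10 bare digits or in the dash-separated three-three-four format.
function isValidUsPhone(string $input): bool
{
    // \z anchors at the very end of the string, so a trailing newline fails.
    return preg_match('/^(?:\d{10}|\d{3}-\d{3}-\d{4})\z/', $input) === 1;
}

var_dump(isValidUsPhone('214-555-0123')); // bool(true)
var_dump(isValidUsPhone('2145550123'));   // bool(true)
var_dump(isValidUsPhone('555-0123'));     // bool(false)
```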

Here are some examples of validation:

- Structured data (like phone numbers or credit card numbers)
- URLs
- Email addresses
- Required

Required? Yep, that's technically a kind of validation too. Most people think of it as something separate but it should be treated the same as any other rule. While there are solutions for validation built into PHP (like filter_var) I recommend pulling in a good validation library that lets you make more complex validations. There's a great one from the Particle-PHP project, the Particle\Validator library, that "makes validation fun" with an easy-to-use fluent interface. So, for example, say we wanted to verify that the string the user gave us exists, is a certain length and contains only alpha-numeric characters. While we could use PHP itself to do these kinds of checks (isset, strlen and a regular expression), the library makes it a one line affair:

$validator->required('input')->lengthBetween(2, 10)->alnum();

By running that through the validator and providing it with the data to check, you can easily check for a match. Some frameworks will also come with their own validators. Laravel, for example, makes a validator available in its controllers for easy use. You can just pass the request itself in to it and define the rules as an array:

$this->validate($request, [
    'email_address' => 'required|email'
]);

There are plenty of options out there but for legacy applications I highly recommend pulling in a library and creating validation that way. You could even introduce it at a model layer (you've refactored to use models, right?) and run a validate() method to make the logic reusable.
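Here's one way the model-layer idea might look. This is only a sketch: the User class, its field names and the error messages are all hypothetical, and it uses plain PHP checks (isset-style defaults, strlen, a regular expression and filter_var) where a real refactor would probably delegate to whichever validation library you picked.

```php
<?php
// Hypothetical legacy model; class and field names are illustrative only.
class User
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Returns a list of error messages; an empty list means the data is valid.
    public function validate(): array
    {
        $errors = [];

        // "Required" is just another rule, checked first.
        $username = $this->data['username'] ?? '';
        if (!is_string($username) || $username === '') {
            $errors[] = 'username is required';
        } elseif (strlen($username) < 2 || strlen($username) > 10) {
            $errors[] = 'username must be 2 to 10 characters';
        } elseif (preg_match('/^[a-zA-Z0-9]+\z/', $username) !== 1) {
            $errors[] = 'username must be alphanumeric';
        }

        // filter_var() returns false when the value is not a valid email.
        $email = $this->data['email_address'] ?? '';
        if (!is_string($email) || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            $errors[] = 'email_address must be a valid email';
        }

        return $errors;
    }
}

$user = new User(['username' => 'chris', 'email_address' => 'not-an-email']);
print_r($user->validate());
```

Because the rules live in one method, every controller or script that touches a User can call validate() instead of repeating its own checks.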

Protecting forms with CSRF tokens

Another relatively easy security feature to introduce into your legacy application is the idea of Cross-Site Request Forgery (CSRF) tokens. These are randomly generated tokens that in effect prove that the data being submitted did come from your site. Here's how they work:

  1. When a user comes to your site and pulls up a web page with a form, your PHP code generates a random hash
  2. This hash is then stored in the current session and output as a hidden field on the form
  3. When the form is submitted the two values are then compared
  4. If there's a match, you're good to go. If not then it's a potential security issue and the data should be ignored.

Usually the form field will look something like:

<input type="hidden" name="_csrf_token" value="3441df0babc2a2dda551d7cd39fb235bc4e09cd1e4556bf261bb49188f548348"/>

And the code to generate the hash is relatively easy too:

$hash = hash('sha256', openssl_random_pseudo_bytes(256));
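Putting the four steps together, a minimal sketch might look like the following. The function names are hypothetical, and a plain array stands in for the session so the example runs without a web server; in a real request you would call session_start() and pass $_SESSION instead. It also swaps in bin2hex(random_bytes(32)) for the hashing line above (assuming PHP 7 or later), which produces a 64-character token of the same shape.

```php
<?php
// Steps 1 and 2: create a token, remember it in the session and
// return it so it can be written into the form.
function csrfIssueToken(array &$session): string
{
    $token = bin2hex(random_bytes(32)); // 64 hex characters
    $session['_csrf_token'] = $token;
    return $token;
}

// Steps 3 and 4: compare the submitted value with the stored one.
function csrfIsValid(array $session, $submitted): bool
{
    if (!isset($session['_csrf_token']) || !is_string($submitted)) {
        return false;
    }
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($session['_csrf_token'], $submitted);
}

$session = []; // stand-in for $_SESSION
$token = csrfIssueToken($session);

echo '<input type="hidden" name="_csrf_token" value="',
    htmlspecialchars($token, ENT_QUOTES, 'UTF-8'), '"/>', "\n";

var_dump(csrfIsValid($session, $token));   // bool(true): the form's own token
var_dump(csrfIsValid($session, 'forged')); // bool(false): anything else
```

Using hash_equals() rather than == is a small design choice that avoids leaking information through comparison timing.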

What problem are we trying to solve here? Well, because of the way the web works, any site can make a request to any other site using one of the HTTP methods (like POST, GET, PUT, etc). This means that a completely different site could trick a logged-in user's browser into sending a POST request to yours with whatever data they'd like, and the browser will helpfully attach that user's session cookie. By introducing these tokens you can outright dismiss these outside submissions and drop the data they've given.

There are a few tricks to using these tokens to keep in mind too:

  1. You'll need to be sure to use a solution that tracks more than one token at once. If you're only tracking one token, in this day and age of multiple tabs, the user could open another page and that page's token would replace the one in the user session, invalidating the form in the first tab. One solution to this is the two-field approach that the CSRF middleware for Slim v3 uses: one field for the token name, the other for the token hash - both randomized.
  2. When making the hash itself, be sure to pull from a cryptographically secure random source (like the random_bytes or openssl_random_pseudo_bytes functions) and hash it with at least SHA-256. The tokens don't need to be encrypted or signed; they just need to be random and unpredictable.
  3. Do not - I repeat - do not reuse tokens. There are some implementations that will have you hard-code tokens and reuse those. The key to the CSRF hashing is that it's random. In fact, it's a good idea to randomize it for every page load and every form.
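The tips above can be sketched as a small token store that hands out a fresh name/value pair per form and throws each pair away after one check. This is loosely inspired by the two-field idea and is not the Slim middleware's actual code; the function names and the 'csrf' session key are assumptions made for the example, and a plain array again stands in for $_SESSION.

```php
<?php
// Hypothetical helpers: every form gets its own random name/value pair.
function csrfIssuePair(array &$session): array
{
    $name = 'csrf' . bin2hex(random_bytes(8)); // random field name
    $value = bin2hex(random_bytes(32));        // random token value
    $session['csrf'][$name] = $value;          // many pairs can coexist
    return ['name' => $name, 'value' => $value];
}

// Checks a submitted pair and removes it so it can never be used again.
function csrfConsumePair(array &$session, $name, $value): bool
{
    if (!is_string($name) || !is_string($value) || !isset($session['csrf'][$name])) {
        return false;
    }
    $expected = $session['csrf'][$name];
    unset($session['csrf'][$name]); // single use: no replays
    return hash_equals($expected, $value);
}

$session = []; // stand-in for $_SESSION
$tabOne = csrfIssuePair($session);
$tabTwo = csrfIssuePair($session); // opening a second tab does not break the first

var_dump(csrfConsumePair($session, $tabOne['name'], $tabOne['value'])); // bool(true)
var_dump(csrfConsumePair($session, $tabOne['name'], $tabOne['value'])); // bool(false): already used
```

A production version would also want to cap how many unused pairs it keeps per session so the store can't grow without bound.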

These two suggestions are a bit more complex than the ones in the previous article but they're still things that are completely within reach when refactoring for security in your legacy application. While the CSRF tokens are a bit more of a "quick hit", the validation will probably take longer to get right, especially if your software is currently accepting a wide range of potential data and not just simple form posts. Keep in mind, though, that one of our main goals is to simplify. These suggestions are relatively simple but could get overwhelming pretty quickly if you're not careful. As the famous author Ralph Waldo Emerson once said, "Nothing is more simple than greatness; indeed, to be simple is to be great."

by Chris Cornutt

With over 12 years of PHP experience and a focus on application security, Chris is on a quest to bring his knowledge to the masses, making application security accessible to everyone. He also provides application security training and consulting services.