This article was originally published in the December 2015 edition of the php[architect] magazine.
There's a principle that's key to protecting any web application out there, regardless of the language it's implemented in. It's a basic mantra that, when applied on a regular basis, can help protect you and your users: Filter Input, Escape Output (or FIEO). The basic idea is that any data coming into the application, regardless of whether it comes from a "user" or some other source, should be considered input and handled correctly. Conversely, that data should be escaped on output so that you protect the app from accidental vulnerabilities like cross-site scripting (XSS) or other content injection.
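To make the "escape output" half concrete, here is a minimal sketch in Python (the principle is language-agnostic; in PHP the usual counterpart is htmlspecialchars()). The render_comment() helper is hypothetical, and it only covers an HTML body context. Other output contexts, such as SQL queries or shell commands, need their own escaping or parameterization.

```python
import html

def render_comment(comment: str) -> str:
    """Wrap a user-supplied comment in a paragraph, escaped for an HTML context."""
    # html.escape() turns &, <, > and (by default) both quote characters into
    # HTML entities, so the browser shows them as text instead of markup.
    return "<p>" + html.escape(comment) + "</p>"

print(render_comment('<script>alert("hi")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```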
While these two steps are good things to abide by, I'd like to suggest another step in the process that can help protect your system even more: input validation. Now, some would say that "validation" and "filtering" follow the same path, but I think treating them as the same thing is a mistake when it comes to securing your application. Here's how I see the difference:
When you think of "filtering", the most common perspective is that you're taking the data provided and removing the "bad parts" that could cause problems (like quotes to prevent SQL injection or HTML tags to prevent XSS). While this can be a good tactic, I'll tell you later on why it can be a flawed one if not handled correctly.
"Validation", on the other hand, is more about ensuring that the data you're being provided is correct. It doesn't try to alter that information the way "filtering" might. Instead, it expects data in a certain format, data type, etc. from the user and gives it a "pass" or "fail" depending on whether it meets the criteria.
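The difference can be sketched in a few lines of Python, assuming a hypothetical five-digit US ZIP code field. The validator answers pass or fail without touching the value, while the sanitizer quietly rewrites it, which is roughly the split between PHP's FILTER_VALIDATE_* and FILTER_SANITIZE_* filters.

```python
import re

def validate_zip(value: str) -> bool:
    """Validation: judge the value exactly as given and return pass/fail."""
    # [0-9] rather than \d, because \d also matches non-ASCII digits in Python 3.
    return re.fullmatch(r"[0-9]{5}", value) is not None

def sanitize_zip(value: str) -> str:
    """Sanitization: silently rewrite the value by dropping non-digits."""
    return re.sub(r"[^0-9]", "", value)

raw = "9021<script>0"
print(validate_zip(raw))   # False: the input is rejected as-is
print(sanitize_zip(raw))   # 90210: the input is altered and the attack is hidden
```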
You'll see many articles and tutorials mixing the two and, in fact, PHP itself does this to some extent with its filter_var functionality. It provides not only validation filters but also sanitization filters that alter the data they're given. Mixing the two is really a bad habit. I'd even argue that one is more important than the other. Can you guess which one? If you guessed filtering, I'll give you one more shot.
Validation should be seen as the first line of defense in your application against bad user input. Remember, just because you provide a form for your users doesn't mean that an attacking script is going to use it. HTTP requests are just plain text and can easily be sent by any kind of HTTP client. This means that any validation you might be doing on the client side (JavaScript) is mostly useless for actually securing your application. That functionality should be considered more like the icing on a security cake: there to make it look pretty and appealing, but not adding much overall.
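To show how little the browser matters here, the following Python sketch assembles a raw HTTP POST by hand using only the standard library. The /signup path, the example.com host and the field names are hypothetical; the point is that none of the form's JavaScript ever runs, so a malformed email reaches the server untouched.

```python
from urllib.parse import urlencode

def build_signup_request(host: str, fields: dict) -> bytes:
    """Assemble a raw HTTP/1.1 form POST by hand, as any script could."""
    body = urlencode(fields)
    head = [
        "POST /signup HTTP/1.1",  # hypothetical endpoint
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # Headers end with a blank line (CRLF CRLF), then the body follows.
    return ("\r\n".join(head) + "\r\n\r\n" + body).encode("ascii")

request = build_signup_request("example.com", {"username": "x", "email": "not-an-email"})
print(request.decode("ascii"))
```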
With good validation you're protecting your application from harmful input that could be coming from any outside source, not just from form submissions or REST requests. Validation should even be applied to data coming from external sources, like other APIs, or possibly even your own sources, depending on how much you trust the data they hold.
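As an illustration, here is a minimal Python sketch that validates a decoded JSON payload from a third-party API before the application relies on it. The "rate" field and its accepted range are assumptions made up for the example.

```python
from typing import Any

def validate_exchange_rate(payload: Any) -> float:
    """Validate a decoded JSON payload from an external API before using it."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    rate = payload.get("rate")  # hypothetical field name
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise ValueError("rate must be a number")
    # This comparison is also False for NaN and infinity, so those are rejected too.
    if not 0 < rate < 1000:
        raise ValueError("rate is outside the expected range")
    return float(rate)
```

Each check turns an unexpected shape into an explicit failure instead of letting a string, a boolean or a missing key flow deeper into the application.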
When it comes to techniques for handling validation, I'm a big fan of the "Fail Fast" methodology. This is a principle that applies to many different situations, but I want to take a moment to show how it relates here. Following the "fail fast" mentality, your application kicks back an error (or errors) as soon as there's a problem, before taking any other action. Let me illustrate with a simple example. Say you're allowing a user to sign up for your site by entering a username, password and email address for their profile. Following the "fail fast" process, you'd perform the following actions in the following order:
...and so on. The very last step in the process is actually creating the new user instance and saving it. For each of the above steps, if the validation fails, you'd want to kick back a message to the user before proceeding any further. This lets the data pass through a series of "gates" so that it comes out correct on the other side. Obviously some of these checks may vary depending on your business requirements, but this gives you an idea of the breakpoints that might be related to this simple action.
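A fail-fast signup flow along these lines might look like the following Python sketch. The specific checks, the 12-character password minimum and the in-memory EXISTING_USERNAMES set (standing in for a database lookup) are all hypothetical; what matters is the ordering, where each gate raises immediately and the user record is only created once every check has passed.

```python
import re

# Hypothetical stand-in for a database lookup of existing usernames.
EXISTING_USERNAMES = {"admin", "chris"}

class ValidationError(Exception):
    """Raised as soon as one check fails."""

def validate_signup(username: str, password: str, email: str) -> None:
    """Fail fast: each gate must pass before the next one runs."""
    if not username or not password or not email:
        raise ValidationError("all fields are required")
    if re.fullmatch(r"[a-z0-9_]{3,20}", username) is None:
        raise ValidationError("username must be 3-20 lowercase letters, digits or underscores")
    if len(password) < 12:
        raise ValidationError("password must be at least 12 characters")
    # Deliberately loose shape check; full email validation is much harder.
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) is None:
        raise ValidationError("email address is not valid")
    # Only well-formed data ever reaches the (simulated) database lookup.
    if username in EXISTING_USERNAMES:
        raise ValidationError("username is already taken")

def sign_up(username: str, password: str, email: str) -> dict:
    validate_signup(username, password, email)
    # Last step: only now create the user record (persistence omitted here).
    return {"username": username, "email": email}
```

With this ordering, a request such as sign_up("Bad Name!", "short", "nope") is stopped at the username gate and never touches the lookup.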
The alternative is a "Fail at End" process, where errors are gathered during validation and reported back all at once at the end. While this may feel more user-friendly, it can also lead to problems: a value that failed one check can leak through to the remaining checks, which may then pass or act on bad data.
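For contrast, here is a hypothetical fail-at-end version in Python. It collects every error, and the email_ok guard shows the kind of explicit bookkeeping needed to keep a value that already failed one check from leaking into a later one (here, a lookup_email callable standing in for a database query).

```python
import re
from typing import Callable, List

def collect_signup_errors(username: str, email: str,
                          lookup_email: Callable[[str], bool]) -> List[str]:
    """Fail at end: run every check and report all errors together."""
    errors: List[str] = []
    if re.fullmatch(r"[a-z0-9_]{3,20}", username) is None:
        errors.append("invalid username")
    email_ok = re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) is not None
    if not email_ok:
        errors.append("invalid email")
    # Because nothing stopped above, this check would otherwise receive the
    # malformed email too; the email_ok guard keeps bad data out of it.
    if email_ok and lookup_email(email):
        errors.append("email already registered")
    return errors
```

Every extra check added to this function needs the same kind of guard, which is exactly the bookkeeping that failing fast avoids.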
So I've talked about some of the basics of input validation and strategies for implementing it in your application. I'd like to finish by looking at two best practices around validation: checking the format of the data and providing inline validation.
When you receive data from a user, the first thing you want to be sure of is that it's in the correct format for what you're expecting. For example, if you're expecting the user to input a US Social Security number, there are only so many things you'd need to check, like the overall length and whether it's all numeric. This is a case where filtering could be mixed in with the validation, such as stripping out the dashes users often type (as in 123-45-6789).
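A format check for the Social Security number example might look like this Python sketch. It assumes that only the dashed 123-45-6789 form and the plain nine-digit form are acceptable, validates the whole string against those shapes first, and only then strips the dashes. It checks format only, not whether a number was actually issued.

```python
import re
from typing import Optional

# Accept either "123-45-6789" or "123456789"; nothing else.
SSN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}|[0-9]{9}")

def validate_ssn(value: str) -> Optional[str]:
    """Return the nine digits if the format is acceptable, else None."""
    # Validate the raw input first...
    if SSN_PATTERN.fullmatch(value) is None:
        return None
    # ...and only then apply the light filtering step (removing dashes).
    return value.replace("-", "")
```

Doing the validation before the filtering means an input like "1-2-3-4-5-6-7-8-9" is rejected instead of being quietly normalized into something that looks valid.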
Simple formats are relatively easy to implement, but what happens when things get a bit more complex? Unfortunately, you're stuck with one of two options: multiple lines of string manipulation statements or a tool that's both loved and hated by developers everywhere, regular expressions. A good regular expression can boil a more complex validation down to a single line and keep things tidy. However, regular expressions also have a bad habit of getting out of hand if not kept under control. If you find yourself using a regex in your validation, be especially mindful not to try to do too many things in one place. Complex regex patterns can be difficult to understand and can introduce loopholes into your validation if used incorrectly.
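One common regex loophole is a pattern that is never anchored to the whole input. The Python sketch below uses a hypothetical username rule: search() accepts any string that merely contains a valid run of characters, while fullmatch() requires the entire value to match.

```python
import re

# Hypothetical rule: 3 to 16 lowercase letters, digits or underscores.
USERNAME = re.compile(r"[a-z0-9_]{3,16}")

def is_valid_username_loose(value: str) -> bool:
    # Flawed on purpose: search() succeeds if ANY substring matches.
    return USERNAME.search(value) is not None

def is_valid_username(value: str) -> bool:
    # fullmatch() succeeds only if the entire string matches.
    return USERNAME.fullmatch(value) is not None

attack = "bob'; DROP TABLE users; --"
print(is_valid_username_loose(attack))  # True: the leading "bob" is enough
print(is_valid_username(attack))        # False
print(re.match(r"^[a-z0-9_]{3,16}$", "bob\n") is not None)  # True: "$" also matches before a final newline
```

As the last line shows, anchoring with ^ and $ is not quite enough on its own. In PHP's preg_match() the D modifier or the \z anchor closes the same trailing-newline gap.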
In the "fail fast" methodology I mentioned previously, I advised returning an error to the user as quickly as possible when validation fails. With more modern applications there's an easy way to give the user feedback without them even having to submit the form: inline validation. You'll see this kind of handling on many of the larger sites out there. They'll give you feedback on the value you enter, either as you're typing or once you move focus to another field. By adding this more visual layer of validation, you not only save users the frustration of multiple page loads just to get the data correct but also give them instant feedback on where an error might be.
Of course, as mentioned before, JavaScript validation like this is just a nicety. Frontend validation should always be backed up by backend validation, to ensure things stay correct even if the form and its validation aren't used.
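One way to keep the inline checks and the server-side checks from drifting apart is to define the rules once on the server and reuse them for both. In this Python sketch, check_field() could back a hypothetical endpoint that the page's JavaScript calls as the user types, while check_form() runs the same rules again, failing fast, when the form is actually submitted. The field names and rules are illustrative assumptions.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical per-field rules; each returns an error message or None.
FIELD_RULES: Dict[str, Callable[[str], Optional[str]]] = {
    "username": lambda v: None if re.fullmatch(r"[a-z0-9_]{3,20}", v) else "invalid username",
    "email": lambda v: None if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) else "invalid email",
}

def check_field(name: str, value: str) -> Optional[str]:
    """Backs the inline (as-you-type) check for a single field."""
    rule = FIELD_RULES.get(name)
    if rule is None:
        return "unknown field"
    return rule(value)

def check_form(form: Dict[str, str]) -> Optional[str]:
    """Backs the real submit: the same rules run again on the server, fail fast."""
    for name in FIELD_RULES:
        # A missing field is checked as an empty string, so it fails its rule.
        error = check_field(name, form.get(name, ""))
        if error is not None:
            return error
    return None
```

Because check_form() never trusts that the inline check already ran, a request sent without the page at all is held to exactly the same rules.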
Hopefully this has given you a good place to start when thinking about and implementing effective validation in your own applications. Validation is a complex subject to tackle in just one sitting, especially when you consider how many different kinds of input there could be. Following the principles I've laid out here, though, can help you get on the path to good validation and protect your application even more effectively.
With over 12 years of experience in development and a focus on application security, Chris is on a quest to bring his knowledge to the masses, making application security accessible to everyone. He is also an advocate for security in the PHP community and provides application security training and consulting services.