Layers pattern for security

Pay attention, web developers! This is where too many of you screw up. Way too often a few characters behave strangely and a quick fix is made, instead of taking a step back to look at the real problem.

This is fine for the beginning amateur, but professionals should always get this right.

What happens when you don’t do this

If you are very lucky, you will have a lot of extra work and more complicated code. If you are less lucky, strange things will happen on your website. Maybe “2<3” is displayed as “23” or “don’t” is displayed as “don\’t”. Maybe it’s even saved like that in the database, which just makes it much harder to fix.

But the worst case is a gaping security hole. Do one little thing wrong, and you can get serious security vulnerabilities like XSS and SQL injection.

What are layers

Any non-static website has layers. It’s just a question of how many, and whether you are using them correctly. To make a simple blog you need at least three:

  1. The presentation (HTML, CSS etc.)
  2. The application (maybe written in PHP)
  3. A database (like MySQL)

A layer should only know about what is directly above it and what is directly below it. In this extremely simple scenario, the HTML knows about the PHP (filenames for the links etc.), but should never know about the database (table names and certainly never ever login information). And SQL statements (or even snippets) should NEVER get more than one layer away from the database layer.

Usually there are many more layers in the application part to simplify things. Maybe there is a layer for translating pretty URLs to actual PHP filenames. And there should be a “database abstraction layer”, which makes database access look the same no matter which database you use.

An example of how even an abstraction layer that hardly adds any features is useful: it adds flexibility. With a good database abstraction layer, you can change the entire database (maybe from MySQL to PostgreSQL) without changing your actual application. You just edit the abstraction layer, which sits between the application and the database. The advantages become clearer further down.
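To make the flexibility point concrete, here is a minimal sketch in Python (the same idea applies in PHP). All names in it (StudentStore, SqliteStudentStore, InMemoryStudentStore, student_ids) are made up for illustration, and the in-memory class merely stands in for a second database product. The application function only talks to the abstraction layer, so it does not change when the backend does.

```python
import sqlite3
from typing import Protocol


class StudentStore(Protocol):
    """Everything the application is allowed to know about storage."""

    def find_by_name(self, name: str) -> list[tuple[int, str]]: ...


class SqliteStudentStore:
    """Backend 1: all SQL stays inside the abstraction layer."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def find_by_name(self, name: str) -> list[tuple[int, str]]:
        cursor = self._connection.execute(
            "SELECT id, name FROM students WHERE name = ?", (name,)
        )
        return cursor.fetchall()


class InMemoryStudentStore:
    """Backend 2: stand-in for a different database; no SQL at all."""

    def __init__(self, rows: list[tuple[int, str]]) -> None:
        self._rows = rows

    def find_by_name(self, name: str) -> list[tuple[int, str]]:
        return [row for row in self._rows if row[1] == name]


def student_ids(store: StudentStore, name: str) -> list[int]:
    """Application code: identical no matter which backend is plugged in."""
    return [student_id for student_id, _ in store.find_by_name(name)]


def make_sqlite_store() -> SqliteStudentStore:
    """Build a small in-memory SQLite database for the demonstration."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
    connection.executemany(
        "INSERT INTO students (id, name) VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
    )
    return SqliteStudentStore(connection)
```

Calling student_ids(make_sqlite_store(), "Bob") and student_ids(InMemoryStudentStore([(1, "Ann"), (2, "Bob")]), "Bob") gives the same answer, and the application never sees a table name or a line of SQL.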

In this article I will only talk about a few layers, enough for most simple web projects. But the idea is the same with any number of layers. I will use:

  1. The presentation (HTML, CSS etc.)
  2. The application (maybe written in PHP)
  3. Database Abstraction Layer
  4. A database (like MySQL)

When the user performs an action (like a search), data from the user (for example search terms) travels through the layers, probably all the way to the bottom, and then results (for example search results) travel all the way back up, possibly including the original data.

Here comes the important part:

When data moves from one layer to the other, make sure all necessary conversions are done correctly, and preferably automatically.

When data moves from the browser to your application, you will probably get it as normal plaintext. So far so good.

Then you transfer the data to the database. The database interprets certain characters specially, like percent, underscore and apostrophe. The amateur is tempted to fix this with search-and-replace, and will often do this in the wrong place, so that the fix ends up in places outside the database.

This is a task for the database abstraction layer. My favorite way is parameterized statements. A simple example:

result = db.getAll("SELECT id, name FROM students WHERE name=?", name);

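Each question mark is a placeholder for one of the parameters that follow the query. The abstraction layer (or the database driver below it) takes care of any quoting and escaping, so the application never has to.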

The variable db contains an object from the database abstraction layer. It may be one you have made yourself, one provided by the system you use (PHP has several), or, as I prefer, a simple one built on top of something good someone else has made.
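Here is a rough sketch of such a db object, written in Python on top of the standard sqlite3 module. The class name Database, its get_all method and the contains_pattern helper are my own inventions for this example, not an existing library. It shows two things: an apostrophe passed as a parameter stays plain data, and the percent and underscore wildcards get their escaping in exactly one place, for the one context (LIKE) where they are special.

```python
import sqlite3


class Database:
    """Tiny database abstraction layer (hypothetical), built on sqlite3."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def get_all(self, sql: str, *params: object) -> list:
        # The driver binds each parameter to a ? placeholder. Values are
        # never pasted into the SQL text, so an apostrophe stays plain data.
        return self._connection.execute(sql, params).fetchall()

    @staticmethod
    def contains_pattern(term: str) -> str:
        # Percent and underscore are wildcards only inside LIKE patterns, so
        # they are escaped here and nowhere else. Pair with: LIKE ? ESCAPE '\'
        for char in ("\\", "%", "_"):
            term = term.replace(char, "\\" + char)
        return "%" + term + "%"


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO students (name) VALUES (?)",
    [("Ann",), ("O'Brien",), ("100% Bob",), ("100 Carl",)],
)
db = Database(connection)

# An apostrophe in the value needs no special handling by the caller.
exact = db.get_all("SELECT id, name FROM students WHERE name=?", "O'Brien")

# A classic injection attempt is just an odd name that matches nobody.
attack = db.get_all("SELECT id, name FROM students WHERE name=?", "x' OR '1'='1")

# Searching for a literal "100%" does not match "100 Carl".
literal = db.get_all(
    "SELECT name FROM students WHERE name LIKE ? ESCAPE '\\'",
    Database.contains_pattern("100%"),
)
```

With this sample data, exact holds the single O'Brien row, attack is empty, and literal holds only “100% Bob”. The application code contains no escaping at all.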

The results from the database should be in plaintext. No special handling of any characters, whether special to SQL, HTML or any other technology. Any layer should only know about the layer above and below, and the HTML and browser are far above the database and database abstraction layer(s).

Now you have the input and the result from the database, and it’s time to display them. But data is sent to the browser in HTML format, not plaintext. So transferring data from the application to the presentation layer means translating from plaintext to HTML.

There are tons of ways to do that, depending on the framework you are using. Here’s a very primitive way of doing it in PHP:

<p>
Search terms: <?=toHTML($terms)?><br />
Results: <?=$result_count?><br /><!-- No need to escape, this is numeric -->
</p> 

In this primitive example, you need to remember “toHTML()” every time you print out text, which is a weak spot. In some systems it is the other way around: output is HTML-encoded by default, and it takes extra code to NOT encode it.
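To show what “encoded unless you say otherwise” can look like, here is a small sketch in Python using only the standard library. The render function and the Raw marker class are hypothetical names of mine, not part of any framework. Every value is HTML-encoded on its way into the template unless it is explicitly wrapped in Raw.

```python
import html
from string import Template


class Raw(str):
    """Marks a string as already being safe HTML. Use sparingly."""


def render(template: str, **values: object) -> str:
    """Fill $placeholders in the template, HTML-encoding every value by default."""
    encoded = {
        key: value if isinstance(value, Raw) else html.escape(str(value))
        for key, value in values.items()
    }
    return Template(template).substitute(encoded)


page = render(
    "<p>Search terms: $terms<br />Results: $count</p>",
    terms="2<3 & don't",
    count=7,
)
# page == "<p>Search terms: 2&lt;3 &amp; don&#x27;t<br />Results: 7</p>"
```

Forgetting something now fails in the safe direction: a value you forgot to mark as Raw shows up as visible markup on the page, which is ugly but not a security hole.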

The toHTML()-function will replace < with &lt; etc. Maybe it inserts <br /> at the end of lines, but then you should check that the search terms don’t have more than one line. Maybe you should check for that anyway. If it’s a comment for all users to see, maybe you want some extra checks here, like collapsing 500 newlines in a row to two. Maybe like this: <?=toHTML(sanitize($terms))?>
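A possible shape for these two functions, sketched in Python rather than PHP. The names to_html and sanitize mirror the ones used above, but the exact rules (which newline runs to collapse, whether to insert <br />) are assumptions for illustration, not a finished implementation.

```python
import html
import re


def sanitize(text: str) -> str:
    """Display-time cleanup: unify line endings and collapse long newline runs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Three or more newlines in a row become exactly two.
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def to_html(text: str) -> str:
    """Translate plaintext to HTML: encode special characters, keep line breaks."""
    encoded = html.escape(text)  # < becomes &lt;, & becomes &amp;, and so on
    return encoded.replace("\n", "<br />\n")


comment = "2<3" + "\n" * 500 + "don't"
shown = to_html(sanitize(comment))
# shown == "2&lt;3<br />\n<br />\ndon&#x27;t"
```

Note the order: sanitize works on plaintext and to_html is the very last step, so nothing after it ever has to deal with half-encoded text.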

But why not sanitize it while getting it from the browser, so we get sane data? Well, it is a good time to check for very bad input, like binary data. But I like to have the original input in the database. That is always a good thing, for example if you change something in how you display your data. Maybe you want to change how you handle newline characters. Maybe you have improved your sanitizing method.
With this method you just change your code, and you’re done. If you change the data before you put it in your database, you will need to change your old data. This can be very complicated, sometimes impossible. (For example if a bug removes data.)

Summary

  • Know that you have layers, and which ones
  • Insert abstraction layers when they add value
  • A layer should only know about the layers directly above and below it
  • When data moves from one layer to another, convert it accordingly
  • Do the conversion in a common place so you only have the code once, and maybe even in a way so you don’t need to remember to do it every time

Doing this the right way gives you simple code and pretty results, and solves most security issues.
