
Daniel’strae

It doesn't matter how often you go astray; what matters most is that you want to find your way again.

How can I let my users insert HTML code without risks (and not only technical ones)?


Hi guys.

I developed a web application that lets my users manage some aspects of a website dynamically (yes, a kind of CMS) in a LAMP environment (Debian, Apache, PHP, MySQL).

For example, they create a news item in their private area on my server, and it is then published on their website via a cURL request (or via Ajax).

The news item is written in a WYSIWYG editor (FCKeditor at the moment, probably TinyMCE in the near future).

So I can't disallow HTML tags, but how can I stay safe? Which tags MUST I remove (JavaScript?)? That covers being safe on the server side, but how can I be 'legally' safe? If a user uses my application to carry out XSS, could I get into legal trouble?

bdukes
asked Mar 31 at 15:26 by DaNieL

7 Answers


If you are using PHP, an excellent solution is HTMLPurifier. It has many options for filtering out bad markup and, as a side effect, guarantees well-formed HTML output. I use it to view spam, which can be a hostile environment.

answered Mar 31 at 15:40 by DGM


I decided to go this way, plus some extra steps of my own. I must give my customers total freedom to use HTML tags (because of the WYSIWYG editor), restricting only certain things. I hope keeping it up to date against the latest security holes won't be too much trouble. – DaNieL Apr 1 at 7:40
 
 
I trust it much more than I trust my own efforts.... – DGM Apr 1 at 17:09


The general best strategy here is to whitelist specific tags and attributes that you deem safe, and escape/remove everything else. For example, a sensible whitelist might be <p>, <ul>, <ol>, <li>, <strong>, <em>, <pre>, <code>, <blockquote>, <cite>. Alternatively, consider human-friendly markup like Textile or Markdown that can be easily converted into safe HTML.
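A minimal sketch of that tag whitelist, written in Python with only the standard library so it can be run as-is (in PHP, HTMLPurifier covers the same ground far more thoroughly). The names `WhitelistSanitizer`, `sanitize`, and `DROP_CONTENT` are made up for illustration. It re-emits whitelisted tags without any attributes, escapes all text, and drops everything else; it does not balance unclosed tags.

```python
from html import escape
from html.parser import HTMLParser

# Whitelist taken from the answer above; every other tag is dropped.
ALLOWED_TAGS = {"p", "ul", "ol", "li", "strong", "em",
                "pre", "code", "blockquote", "cite"}
# Assumption: for these elements the text content is dropped along with the tag.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Hypothetical helper: re-emits whitelisted tags and escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skipping = True
        elif tag in ALLOWED_TAGS:
            # Attributes are discarded entirely in this sketch.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skipping = False
        elif tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            # Text is always escaped, so it can never form a new tag.
            self.out.append(escape(data))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b></p><script>alert(1)</script>'))
```

Because the output is built only from whitelisted tag names and escaped text, anything the parser does not recognise ends up either dropped or shown as harmless text.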

answered Mar 31 at 15:31 by John Feminella

 
 
Can't you still insert scripts in the allowed tags using a whitelist? – jeroen Mar 31 at 15:38
 
 
That depends on how you're escaping them. If you're describing something like "<scr<script>ipt ...", I'd first note that "<scr" looks like the beginning of a tag. Since "scr" isn't whitelisted, we can escape it safely. Then we get to the "<script>" and it's also escaped/removed. – John Feminella Mar 31 at 15:45
 
 
I was thinking more about the attributes, but I guess that depends on whether your whitelist has any tags that need them, in which case you would have to allow them. If you allow attributes, you'd have to get rid of the whole onclick="", etc. range, but I guess that's pretty obvious :) – jeroen Mar 31 at 15:54
 
 
Oh, absolutely. You have to whitelist attributes separately, though, just like you whitelist each tag. (That's the price you pay for being explicit.) – John Feminella Mar 31 at 16:18
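The per-tag attribute whitelist described in these comments could be sketched as follows, again in standard-library Python. The table `ALLOWED_ATTRS`, the scheme list, and the function names are assumptions chosen for illustration, and the attribute values are assumed to be already entity-decoded (as an HTML parser would deliver them).

```python
import re
from html import escape

# Hypothetical per-tag attribute whitelist; anything not listed is dropped,
# which removes onclick="" and every other event handler by default.
ALLOWED_ATTRS = {"a": {"href", "title"}, "blockquote": {"cite"}}
URL_ATTRS = {"href", "cite"}
SAFE_SCHEMES = {"http", "https", "mailto"}  # assumption: adjust as needed

_SCHEME = re.compile(r"([a-zA-Z][a-zA-Z0-9+.\-]*):")


def is_safe_url(value):
    # Ignore whitespace and control characters before reading the scheme,
    # so that "java\tscript:" is judged as "javascript:".
    cleaned = "".join(ch for ch in value if ch > " ")
    match = _SCHEME.match(cleaned)
    # No scheme means a relative URL, which is allowed here.
    return match is None or match.group(1).lower() in SAFE_SCHEMES


def filter_attrs(tag, attrs):
    kept = []
    for name, value in attrs:
        name = name.lower()
        value = value or ""
        if name not in ALLOWED_ATTRS.get(tag, set()):
            continue  # attribute is not whitelisted for this tag
        if name in URL_ATTRS and not is_safe_url(value):
            continue  # URL attribute with a scheme outside the whitelist
        kept.append((name, value))
    return kept


def render_start_tag(tag, attrs):
    parts = [tag] + [f'{n}="{escape(v, quote=True)}"'
                     for n, v in filter_attrs(tag, attrs)]
    return "<" + " ".join(parts) + ">"


print(render_start_tag("a", [("href", "http://example.com/"),
                             ("onclick", "steal()")]))
```

Whitelisting the URL scheme as well as the attribute name matters, because an allowed `href` can still carry a `javascript:` URL.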
add comment


It doesn't really matter what you're looking to remove; someone will always find a way around it. For reference, take a look at this XSS Cheat Sheet.

As an example, how are you ever going to remove this valid XSS attack:

<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>
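The SRC value above is just `javascript:alert('XSS')` written as hexadecimal character references without trailing semicolons. A short Python sketch (standard library only; `scheme_of` is a hypothetical helper) shows why a string match on the raw markup misses it, while decoding the references first exposes the scheme:

```python
import re
from html import unescape

# The SRC value from the example above, copied verbatim.
PAYLOAD = ("&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A"
           "&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29")

_SCHEME = re.compile(r"([a-zA-Z][a-zA-Z0-9+.\-]*):")


def scheme_of(raw_attr_value):
    """Return the lowercased URL scheme after entity decoding, or ''."""
    decoded = unescape(raw_attr_value)
    # Drop whitespace and control characters before reading the scheme.
    cleaned = "".join(ch for ch in decoded if ch > " ")
    match = _SCHEME.match(cleaned)
    return match.group(1).lower() if match else ""


# A blacklist that searches the raw markup finds nothing suspicious...
print("javascript" in PAYLOAD.lower())
# ...but the decoded value is an ordinary javascript: URL.
print(unescape(PAYLOAD))
print(scheme_of(PAYLOAD))
```

Any check on attribute values therefore has to run on the decoded value and compare the scheme against a list of allowed ones, rather than hunt for known-bad strings.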

Your best option is to allow only a subset of acceptable tags and remove everything else. This practice is known as whitelisting, and it is the best method for preventing XSS (besides disallowing HTML altogether).

Also use the cheat sheet in your testing; fire as much as you can at your website and try to find some ways to perform XSS.

answered Mar 31 at 15:32 by LFSR Consulting


Rather than allowing HTML, you should use some other markup that can be converted to HTML. Trying to strip rogue HTML out of user input is nearly impossible. For example:

<scr<script>ipt etc="...">

Removing <script> from this will leave

<script etc="...">
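The effect is easy to reproduce. The sketch below, in standard-library Python, uses a hypothetical `strip_once` filter that deletes the literal string `<script>` in a single pass, and contrasts it with simply escaping the input:

```python
from html import escape

NESTED = '<scr<script>ipt etc="...">'


def strip_once(text, needle="<script>"):
    """Naive blacklist filter: delete every literal occurrence, one pass."""
    return text.replace(needle, "")


# Deleting the inner "<script>" splices the outer pieces into a new tag.
print(strip_once(NESTED))

# Escaping never deletes anything, so nothing can be spliced together.
print(escape(NESTED))
```

Repeating the removal until the text stops changing would not help here either, because the reassembled tag carries an attribute and no longer matches the literal `<script>`.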

answered Mar 31 at 15:31 by ck

 
 
Using a white list rather than a black list would solve this problem. – Gumbo Mar 31 at 15:37
 
 
see the img tag answer in stackoverflow.com/questions/701580/… – ck Mar 31 at 15:44
 
 
XSS is also possible through other markup languages, such as BBcode, so that doesn't really fix anything. A whitelist approach works pretty well. – troelskn Mar 31 at 16:17


For a C# example of the whitelist approach, which Stack Overflow uses, you can look at this page.

answered Mar 31 at 15:42 by cagdas

From StackOverflow.com

Filed under: Html, input-satinization, PHP
Posted June 5, 2009