Earlier this year I developed a two-day seminar for a client on “secure programming” techniques. The intended audience was automation programmers, but at the last minute, after seeing the “syllabus”, they decided to include their application programmers. The result was a flurry of re-working to accommodate the Java crowd in addition to the Perl/PHP/Python crowd. In the end, the bulk of the message was the same. While I can’t republish my stack for trade-secret reasons, I can dump the agnostic bits (mostly my slide notes) here. The code examples will be in human-readable Perl or pseudo-code, but will work in any language.
Users Are Un[trust]worthy
The first and most important point is that you can never trust user-supplied data. Ever. Just because your whiz-bang drop-down box only shows the user four options doesn’t mean they can’t end up sending `rm -Rf /`. This is the single largest threat against an in-house application. User-supplied data must always, always be vetted, sanitized and treated as if it could take over the world. A lot of programmers use rapid-application-development tools to create interfaces. These tools are a boon for creating slick interfaces, but generally a complete bust at data integrity enforcement. You tell it “make this a date field”, and it does… But anyone with a quick script can insert 40MB of garbage data, which will at best crash your application, and at worst overflow a buffer that had improper bounds (see next point) and allow “remote execution of arbitrary code” (READ: You’re fucked).
So how do you fix this? Whatever language you program in, it probably has a regular expression (regexp) library. Many popular languages even have “Perl-Compatible Regular Expressions” (PCRE), as there is no better language for data processing than Perl, and its regexp syntax is well-defined (albeit with a moderate learning curve). Regexps are the easiest way to enforce your intentions with data.
unless($userdate =~ m#^\d\d\d\d/\d\d/\d\d$#) { Freak_Out(); }
That’s a one-liner that says, exactly, “unless the data in the variable $userdate begins with four digits, followed by a forward slash, followed by two digits, followed by a forward slash, followed by two digits, and that’s it – Freak Out”. (m – we’re matching, # – start the regexp, ^ – match at the beginning of the data, \d – a digit (0-9), $ – match the end of the data, # – end the regexp).
You could easily write a function to do this for you:
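sub Is_Date {
my ($userdate) = @_; # take the value to check from the argument list
return 0 unless defined $userdate; # nothing supplied is not a date
if($userdate =~ m#^\d\d\d\d/\d\d/\d\d$#) { return 1; }
else { return 0; }
}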
Then just call
unless(Is_Date($usercrap)) { Freak_Out(); }
every time you need to check that you’ve REALLY got a date. Not hard. The user could still be entering the data wrong, but that’s not the secure programmer’s job.
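One Perl-specific wrinkle is worth a look: `$` also matches just before a trailing newline, so "2009/04/01\n" gets through the pattern above. What follows is a minimal sketch of a stricter check, not the seminar's code. The name Is_Strict_Date is hypothetical, and the fixed length of 10 assumes the YYYY/MM/DD format used above.

```perl
use strict;
use warnings;

# Hypothetical stricter variant of Is_Date; the name is an assumption.
sub Is_Strict_Date {
    my ($userdate) = @_;
    return 0 unless defined $userdate;       # nothing supplied at all
    return 0 if length($userdate) != 10;     # cheap size check before the regexp
    # \A = very start, \z = very end (no trailing newline allowed).
    # [0-9] rather than \d, so only ASCII digits match.
    return $userdate =~ m#\A[0-9]{4}/[0-9]{2}/[0-9]{2}\z# ? 1 : 0;
}

print Is_Strict_Date("2009/04/01"),   "\n";   # 1
print Is_Strict_Date("2009/04/01\n"), "\n";   # 0
```

The length test comes first on purpose: 40MB of garbage is thrown out before the regexp engine ever looks at it.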
It is imperative that your application go out of its way to recognize crap as soon as it can. Don’t pass a variable containing unchecked user-supplied data to a dozen or so functions before checking it. Accept it, and check it immediately. SQL injection attacks are another whole can of worms that can take advantage of improper sanitization, and frequently get executed in strange places. Sanitize early.
I know you guys aren’t writing AJAX stuff, but I’d be derelict if I didn’t hammer this out anyhow: If you trust JavaScript or any other client- or browser-executed code to do your security bits for you, you are not only a moron, but should be made to wear a Scarlet Zero for the next few years. Zero for “0wn3d”. Just because you write JavaScript doesn’t mean that’s how the browser will execute it. ANYONE can rewrite your JavaScript to do whatever they want. The above Is_Date function, if implemented in client-side JavaScript, can easily be made to simply “return 1” regardless of what’s passed. You must never trust client-supplied data regardless of whether you believe it was vetted client-side. It must be vetted server-side. Must. Must. Must. If, for user convenience, you want to implement something to trap errors quickly client-side, that’s groovy, but in no way does it excuse you from vetting the content once submitted to your application.
I can’t say this enough. It seems like common sense, and a lot of you are nodding at me right now, but a month on you’re going to write a quick webform with radio-button selection, and someone in Beijing is going to own your database server because they submitted `echo root:fXAWEOo7DsNSM:0:0:0:0::: > /etc/shadow` to the “What’s your favorite fruit?” question. [PAUSE] It’s funny right now; it’s not funny when you’re facing a DoD inquiry.
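For a field that is only ever supposed to carry one of a handful of values, the server-side check can be a plain lookup against the list of values you actually offered. Here is a minimal sketch; the helper name Vet_Fruit and the fruit list are made up for illustration.

```perl
use strict;
use warnings;

# The only values the form is supposed to send (illustrative list).
my %ALLOWED_FRUIT = map { $_ => 1 } qw(apple banana cherry durian);

# Hypothetical helper: returns the value if it is on the list,
# undef otherwise.
sub Vet_Fruit {
    my ($input) = @_;
    return undef unless defined $input;
    return exists $ALLOWED_FRUIT{$input} ? $input : undef;
}

my $fruit = Vet_Fruit('rm -Rf /');
print defined $fruit ? "ok: $fruit\n" : "rejected\n";   # rejected
```

Nothing the client sends is interpreted here; it is only compared against keys you wrote yourself.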
Data Sizes Are Critical (AKA Know Your Data)
The second most important thing I can convey to you is the concept of sanitizing types. Those of us in the room that are automation programmers and write in the P-languages don’t have to give a shit about this, but those of you using Java, C or anything that comes out of Redmond, WA need to listen up: If you’re blindly stuffing data into a restricted type, you better be damn certain it’s the right size. Last year one of your competitors had a nice lad write an internal automation system that took data from a database and did things with it. Anyone know how large a MySQL blob-type is? 64k bytes. Anyone know how large a Microsoft Visual BASIC date-type is? 8 bytes. Thankfully, he wasn’t writing a user-facing application. [PAUSE] Last I heard, he was writing TARP-laundering algorithms for AIG.
Every time you hear about a “buffer-overflow” attack, it is because some moron didn’t check data before stuffing it into memory. The lower-level you code, the more important this is. The example I gave about the VB app only cost the contractor a few dozen thousand dollars and a loss of face with their client, because VB just panics and dies when you do that. If you’re coding in C and not running on a very secure Linux box with a memory-jockeying system, you may well have just overwritten your security code with:
blahblaWastingSpaceUntilYourDataTypeIsOverblahblahblahCall Function DoSomethingBad
Yeah, that’s what a buffer-overflow looks like. Know your data, and when in doubt cast. A lot of instructors shy away from casting. It slows things down, it causes a lot of compile-time checks that frequently create errors or warnings in otherwise clean code. Casting is the absolute best way to make sure you’re not blowing out a primitive type. Of course, if you’re writing into a char-array, that’s not going to help you, but that’s your problem.
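Even in the P-languages, the same habit pays off whenever data is headed for something with a fixed width, such as a database column or a field in another system. This is a minimal sketch of a size check. The helper name Fits_In_Field is hypothetical, and it assumes the receiving side counts UTF-8 bytes rather than characters.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $data, encoded as UTF-8, fits in
# $max_bytes. The name and the UTF-8 assumption are illustrative.
sub Fits_In_Field {
    my ($data, $max_bytes) = @_;
    return 0 unless defined $data;
    my $bytes = $data;          # work on a copy; utf8::encode is in-place
    utf8::encode($bytes);       # characters -> UTF-8 bytes
    return length($bytes) <= $max_bytes ? 1 : 0;
}

print Fits_In_Field('x' x 65_535, 8),   "\n";   # 0: a full blob, 8-byte target
print Fits_In_Field('20090401', 8),     "\n";   # 1
print Fits_In_Field("\x{e9}" x 5, 8),   "\n";   # 0: 5 characters, 10 bytes
```

The last line is the one that bites people: character count and byte count are not the same thing.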
Lazy Handles
You guys know all about access permissions, so I’m going to skip over a bunch of stuff. One thing I’ve seen here and elsewhere is opening handles – network sockets, database handles, file handles, directory handles, etc. – as users with way more permission than necessary. Sure, it’s easier for you, but what happens when your code gets hosed and someone from Bulgaria manages to inject themselves in before you’ve let go of that handle? Explaining to your boss that you opened the files as root “because it was easier” isn’t going to fly. Your application needs to run restricted, which you already know, and you absolutely cannot wantonly elevate handles without damn good reason. I can’t count the number of applications I’ve seen – Hell, applications I’ve written – that immediately connect to a resource as God himself to do some pretty primitive stuff, when maybe, MAYBE, one operation in fifty actually needed that access. Which leads to the next problem here…
Keeping handles open longer than necessary, or in anticipation of future operations. If your application does six operations on a resource and only one needs the assistance of Angels, then that’s the only operation that gets the Silver Trumpets – the rest can stew along with the rest of us. I don’t care if you’re doing all six of those operations at once: you run the primitives as a primitive user, and then either change-user or fork or whatever you need to do as the elevated user for that one operation. Yeah, I groan about it too. But that’s it. Every moment your application has an elevated resource connection is a moment that someone is going to get their
INSERT INTO users SET user='yakov', password='smirnov'; GRANT ALL ON *.* TO 'yakov' IDENTIFIED BY 'smirnov';
into your data. It’ll take your auditors a fiscal quarter before they find that account.
With the exception of operating-system handles, you can almost always switch out, rebind, setuid, or the like, without making a new connection.
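The same discipline applies to something as small as a file handle. Below is a minimal sketch, with a hypothetical helper name First_Line, that asks for read-only access because reading is all it does, and lets go of the handle as soon as it has what it came for.

```perl
use strict;
use warnings;

# Hypothetical helper: read the first line of a file with the least
# access that does the job, holding the handle only as long as needed.
sub First_Line {
    my ($path) = @_;
    # '<' = read-only; three-argument open keeps $path from being
    # interpreted as a mode or a pipe.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $line = <$fh>;
    close($fh) or die "Cannot close $path: $!";   # release immediately
    chomp $line if defined $line;
    return $line;
}
```

Calling First_Line on a path returns its first line without the newline, or undef for an empty file. The handle never outlives the one read it was opened for.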
1176da21241f79203fbd93e367f35142
Yeah, encrypt everything. I know you guys are using SSL for nearly all resource connections, which is ridiculously important. Even the server-to-server stuff that we used to shrug off and say “yeah, well it’s on a switched network, and in the same room” has got to be encrypted on the wire. There are too many techniques and tools out there to get in the middle of those streams, and we all know how reliable network administrators are at picking up that stuff. [PAUSE]
Even within your application, if you’re writing out temp data, encrypt it against possibly prying eyes and processes. If you’ve got in-memory data that’ll be sitting around for some time, and might be a candidate for swapping, encrypt it in memory. How many of you have ever even considered encrypting live data? If that data gets stale, and the operating system decides to swap out some pages, that stuff is possibly going to be visible unless your infrastructure takes into account encrypted swap and the like. In lots of languages, encryption and decryption of relatively small amounts of data (< available RAM) is pretty simple. I’m not advocating it for everything, but there’s not much harm in:
$data=encrypt($data);
{ #Long
#Running
} #Block.
$data=decrypt($data);
If something weird happens, and some or all of $data is swapped out because it’s not being used, it’s useless to an attacker.