Category: Architecture

Feb 24 2010

Much Ado About Nothing

Numerous sources are quoting a NetSec article that quotes other articles, which in turn quote still others, claiming that we (the U.S.) would lose a cyberwar. This isn’t anything new. Any non-third-world country with a decent assessment arm would tell its leaders the same thing for a simple reason: Cyberwarfare, like nuclear warfare, is all about first-strike. There are retaliation-based deterrents that we hope will dissuade anyone from initiating either against the U.S., but at the end of the day, someone won’t care and will press the proverbial Big Red Button.

At an invited talk last year I posed a rhetorical question to the audience of mostly 45-to-60-year-old white men: What would your children, or grandchildren, do if they couldn’t “get on-line” for a day? A month? A year? The horror! It is the youth that would be most impacted, in fact debilitated, by a successful, prolonged cybersiege, because we are doing a piss-poor job of educating them around technology. I’ve lamented a lot about our trending away from knowledge and toward acquisition: Who needs to know things when we can ask the Googles? Finally, it seems, people are catching on.

But for the rest of the world, and anyone whose business has a non-virtual existence, this isn’t something you need to worry about- and especially not something Congress should pollute. “Shutting down the Internet” is something they’ve wanted to have the power to do since the 1980s, but it’s folly. Disconnecting isn’t particularly effective, and it is already trivial. Any Internet-connected network can sever its Internet lifeline in moments, isolating itself from the Greater Network. Even large national or global networks can sever peering connections or impacted segments very quickly- we’ve seen this in the handling of the Great Worms of the 1990s. The U.S. Government doesn’t need, and shouldn’t have, a magic cleaver to do it for us.

To end with another quote from the aforementioned talk: Regardless of policy or preparedness, the U.S. will suffer a massive and unprecedented cyberattack within the next five years, impacting up to 1/3 of American networks. I just hope it’s not my third.

Jan 18 2010

Durable Programming

In systems programming (and application programming, and life in general) there are two ways to deal with potential problems: You can try to avoid them, or you can handle them intelligently. When we walk across the street, we look both ways: This is a simple avoidance. It’s easier to look both ways and not have to intelligently handle the oncoming car. We will, hopefully, however, intelligently handle the oncoming car in the event our avoidance strategy fails. We will jump out of the way, or throw a brick at the windshield, whatever. It continues to boggle my mind the lengths people will go to to avoid the most random corner-cases, but fail to intelligently handle even the most obvious of exceptions, even ones that continuously rear their heads.

Avoidance scenarios in code are often completely unnecessary and several orders of magnitude more complex than handling the inevitable exception to begin with. The simplest solution is almost always the best, and always the best when you can’t rely on your avoidance strategy to begin with! If your program calls some code, and expects to get back a 0 or 1, what happens when it gets a 2? What happens when some field in a database you flagged as never null (an avoidance) is null for some records?

Avoidance is about assumptions. A lot of programmers love assumptions. They code for them often, sometimes even writing comments in their code: “the blahblah field is set to ‘never null’ so we can assume that to be true” – Really? Until an admin turns off constraints and bulk loads some “bad” data. Instead of writing the cheeky comment making the assumption, you could have written a one-line handler to Do The Right Thing. If you toss what you think you know out the window, program with the facts in mind, and handle failure cases elegantly, you end up with a durable system.
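As a rough illustration of those one-line handlers, here is a minimal Python sketch. The function names (`interpret_status`, `display_name`) and the default value are hypothetical; only the "0 or 1" return code and the "blahblah" field come from the text above.

```python
def interpret_status(code):
    """Map a return code that is 'supposed' to be 0 or 1 to a boolean.

    Anything else is reported instead of being silently treated as success.
    """
    if code == 0:
        return False
    if code == 1:
        return True
    raise ValueError(f"unexpected status code: {code!r}")


def display_name(record, default="(unknown)"):
    """Read a 'never null' field without trusting the constraint."""
    value = record.get("blahblah")
    return default if value is None else value
```

Neither handler tries to prevent the bad value from appearing; each simply decides what happens when it does.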

Your system application sends a file from one system to another every day. Usually it works, sometimes it doesn’t. When it doesn’t, what happens? Nothing, because you never coded for that eventuality. Why would you? It’s not your fault if the other side doesn’t work, after all. Your code is perfect. The intelligent handling of this is two lines of code: after you send the file, check to make sure the file is actually on the other end, and retry if it’s not (or send yourself an e-mail, or write something to a log… SOMETHING). I can list over 60 different reasons that file may not be on the other side- you can add mitigators for those 60 reasons (plus the other 60 I didn’t bother thinking of), or you can intelligently handle the problem with two lines of code, actually, zero lines if you’re elegant.

SendFile(fileName,fromHere,toThere)

could be replaced with:

until(FileExists("toThere/fileName")) { SendFile(fileName,fromHere,toThere); }

That’s pseudo-code (‘until( .. )’ is the same as ‘while(not .. )’, if you’re using a primitive language) but it will work semantically the same in at least 16 different programming languages. This doesn’t mean you shouldn’t try to figure out why it’s failing and perhaps fix something that’s broken along the way, but your code doesn’t need that complexity, it just needs to Do The Right Thing.
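For something runnable, the same idea can be sketched in Python. Two things here are assumptions rather than part of the pseudo-code: `send_file` is a stand-in that copies locally with `shutil.copy` (a real transfer would replace it), and the loop gives up after a bounded number of attempts instead of spinning forever.

```python
import os
import shutil
import time


def send_file(file_name, from_here, to_there):
    """Stand-in for SendFile: a local copy. A real transfer would go here."""
    shutil.copy(os.path.join(from_here, file_name), to_there)


def send_until_present(file_name, from_here, to_there, attempts=5, delay=1.0):
    """Keep sending until the file is verifiably on the other side.

    Returns True once the target exists, False if every attempt failed.
    """
    target = os.path.join(to_there, file_name)
    for attempt in range(1, attempts + 1):
        if os.path.exists(target):
            return True
        try:
            send_file(file_name, from_here, to_there)
        except OSError as exc:
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(delay)  # wait a little before trying again
    return os.path.exists(target)
```

A `False` return is the cue to write the log line or send the e-mail. The caller does not need to know which of the 60 reasons applied.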

Another classic example is how one handles multiple instances. Some software runs fine with multiple copies (like your web browser), some behaves badly (like your e-mail client), and some even worse. Frequently, even when programmers know that running two or more instances at the same time is very very bad (will-destroy-data bad), they don’t handle that event, they avoid it. They say “well the code runs via a scheduler, and there’s enough time in-between runs that it should be done”. Should. You may destroy data and cause more work, for a “should”. The handler is a 2-line fix:

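use Fcntl qw(:flock);                 # provides LOCK_EX and LOCK_NB
open ME, "<", $0 or exit;             # open this script's own file
flock ME, LOCK_EX | LOCK_NB or exit;  # exit if another copy already holds the lock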

Lock yourself, and exit if you can’t get a lock. If locking yourself seems too conceptually scary, then pick a lock file (that’s what the /var/lock subsystem is for, by the way) and lock on that. The code is Perl, but the concept will work in at least a dozen different languages, and is bullet-proof (assuming your host OS supports flock).
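As one example of the concept in another language, here is a minimal Python sketch using the standard `fcntl` module (POSIX only). The function name `lock_self_or_exit` and its optional `path` argument are hypothetical; by default it locks the running script, and passing a path locks a separate lock file instead.

```python
import fcntl
import sys


def lock_self_or_exit(path=None):
    """Take an exclusive, non-blocking lock; exit if another instance holds it.

    Returns the open file object. Keep a reference to it for the life of the
    process, because closing the file releases the lock.
    """
    handle = open(path or sys.argv[0], "r")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)  # someone else is already running; quietly step aside
    return handle
```

Call it once at startup and hold on to the result, e.g. `lock = lock_self_or_exit()`.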

There are a lot of reasons people don’t write durable programs- I don’t pretend to do it all of the time, either. Laziness and ignorance are probably tied for first, followed closely by apathy. If your system is “critical” (to you, or someone else) and if/when it doesn’t function it generates work, there is almost always a 1-to-2 line solution to help it survive, or at the very least, not do the wrong thing.

Jan 12 2010

Endocrys Limited Alpha

Last week a couple colleagues in different environs offered to work with my latest-stable code in test environments, providing me volumes of great feedback. Over the last few days I’ve been enthusiastically implementing large swaths of those thoughts, taking 1/2 days from my dayjob to get things going. I wanted to publicly share the extra documentation I gave them, as it provides good insight into the changes in Endocrys, as well as where it is going:

Oct 30 2009

More About Endocrys

I previously mentioned that I’ve re-acquired rights to Endocrys, and that I was excited about it. My copious free time has been spent, of late, ripping it apart, making it cleaner, and applying the lessons learned over 7 years of maintaining a sizable Endocrys network (458 systems at peak).

Endocrys has two primary modular components: Autocrys and Paracrys.

Autocrys is an extensible communication protocol atop XMPP. It governs the syntax of commands or queries sent to systems or groups, the responses of systems to those queries, how to manage their presence, and how to react to presence changes in others.

Paracrys is a database-driven deployment and configuration system. Paracrys allows module code and configuration data to be stored centrally and deployed to Endocrys nodes on-demand. Paracrys fully supports versioning, thus allowing changes to be rolled-back in the case of a major oopsie. How small can a Paracrys module be? Here’s an example that implements a command called ‘shell’ that allows you to do, essentially, whatever you want on an Endocrys client:

BEGIN { $Endo::MODS{SHELL}++; $Endo::CMDS{SHELL} = \&shell; }
END { delete $Endo::MODS{SHELL}; delete $Endo::CMDS{SHELL}; }

sub shell {
 return `@_`;
}

Drop that puppy into the Paracrys MODULES table with some other data, issue a mass “fetch module SHELL; refresh;” command, and bingo, all of your systems now let you do very bad things. It’s that easy to create a command to do something… Hopefully something useful.

Of course you should note that there is no access control in the above code… How do we prevent Bad People from using our horrendously very bad shell command? That used to be managed by the Communication Masters using another database called EndoACL, but has been folded into Paracrys’ duties and drastically simplified. Each Endocrys client, when receiving the shell command, will now ask Paracrys if the user who sent it is authorized to issue that command. Previously, the clients never even received commands from users not authorized to send them, at great expense.
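The client-side check described above might look roughly like the following Python sketch. Everything named here is hypothetical: `COMMANDS` mirrors the role of the command table in the Perl module, and `is_authorized` stands in for the question the client asks Paracrys, whose real interface is not shown in this post.

```python
COMMANDS = {}  # command name -> handler function


def register(name, handler):
    """Make a command available, as a module does when it loads."""
    COMMANDS[name.upper()] = handler


def unregister(name):
    """Withdraw a command, as a module does when it unloads."""
    COMMANDS.pop(name.upper(), None)


def dispatch(sender, name, args, is_authorized):
    """Run a command only if the authority says this sender may issue it.

    `is_authorized(sender, command)` is a callable supplied by the caller;
    it returns True or False.
    """
    key = name.upper()
    handler = COMMANDS.get(key)
    if handler is None:
        return f"unknown command: {name}"
    if not is_authorized(sender, key):
        return f"denied: {sender} may not run {name}"
    return handler(*args)
```

The point of the sketch is the ordering: the command arrives at the client first, and the authorization question is asked before the handler ever runs.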

One of the major goals of the project originally was to have absolutely minimal dependencies on third-party code, so I reinvented the wheel in numerous places. Now that it’s mine again, those requirements are vapor and I’m ripping out large swaths of my code, and exchanging it for API calls into other code that is the de facto standard to do whatever. For example, I wrote a function that copies a file from one location to another. Ew. The File::Copy module is the Perl Way to do that, so that’s how we do it now. Less code I have to maintain, and less code you have to read to understand Endocrys.

Another major goal of the original project was absolute redundancy on all levels. With a requirement like that, I over-engineered what were called the Communication Masters (CMs) so that they heart-beated each other, transferred each other’s sessions, held elections to decide who was authoritative for which IP ranges, dealt with segmentation and partitioning, etc. All of this at the cost of highly-customized hybrid XMPP/SQL servers that weren’t readily upgradeable. Wednesday night I spent a lot of time diagramming, and tonight solidified the spec to separate the XMPP server from the SQL database, and rely on established high-availability tools like pen or an SLB appliance to ensure connectivity to a farm of XMPP servers if needed. Additionally, this separation has allowed me to use MySQL clusters for the Paracrys bits, which adds scary levels of redundancy to those very critical bits.

Lastly for this post, the entire ithread Endocrys implementation has been ripped out and replaced with EV and AnyEvent, and the Net::XMPP code has been replaced with AnyEvent::XMPP for one cohesive event loop that runs very very fast. Originally I envisioned an Endocrys client maintaining dozens of XMPP sessions while handling dozens of system events and receiving dozens of commands, so I stuck everything in threads, and allowed it to scream along on SMP boxes. While this works just fine, there is a LOT of extra complexity involved with sharing variables across threads, dealing with races, etc. and the benefits are dubious when compared against a good, strong, event-loop system. I’m not quite done yet, but the net loss should be about 30% of the main code modules, with reduced complexity for all sub-modules as well.
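To show why a single event loop removes much of that complexity, here is a small Python sketch using `asyncio` as a stand-in for EV and AnyEvent. It is not Endocrys code; the session names and messages are made up. Several long-lived "sessions" share one plain list with no locks, because only one of them runs at any instant.

```python
import asyncio


async def session(name, inbox, log):
    """One long-lived session: handle messages until told to stop."""
    while True:
        message = await inbox.get()
        if message is None:  # sentinel: shut this session down
            break
        log.append((name, message))  # shared list, no lock needed


async def run_sessions(names, messages):
    """Start one session per name, deliver the messages, return the log."""
    log = []
    inboxes = {name: asyncio.Queue() for name in names}
    tasks = [asyncio.create_task(session(n, q, log)) for n, q in inboxes.items()]
    for name, message in messages:
        await inboxes[name].put(message)
    for queue in inboxes.values():
        await queue.put(None)
    await asyncio.gather(*tasks)
    return log
```

Running it looks like `asyncio.run(run_sessions(["a", "b"], [("a", "ping"), ("b", "status")]))`. The threaded equivalent would need a lock or a thread-safe queue around `log`.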

I don’t have an ETA as to when the code will be generally available, but I’ve had some pings from some bright people interested in hammering the retooled version in non-critical environments, so hopefully it will be this year.

Oct 27 2009

Introducing Endocrys

Endocrys [en doe kriss] (was Endocryn until a TradeMark popped up) is a distributed, encrypted, modular, real-time, hot-upgradable, self-healing system geared at autonomous communication between distributed systems. It was developed for a client in 2002 and 2003, and they’ve decided to let the 10-year exclusivity lapse early.

This is one of my favorite products, and I’m more than a little excited to get it back and get it out. It’s been battle-tested for many years, and I’m very proud of it. I’m working on getting the code cleaned up and abstracted before releasing it under the GPLv2. Below are edited points from slides describing Endocrys and why you might be interested in it.

The Problems

Dozens… hundreds… of systems, physical and virtual, all going about their business. Then something happens: maybe a disk drive failed, maybe a process died, maybe someone ordered a ‘reboot’ or ‘halt’. Those systems don’t have a way of communicating that externally. There is no “Hey, I’m rebooting, BRB” in the server world.

Dozens… hundreds… of systems, physical and virtual, all going about their business. Then you have a question: How many of them have Western Digital harddrives listed in a recent recall? How many of them are running <2GB of RAM? How many of them are running a certain version of some software listed in a security advisory? There’s no way to ask that question to the farm. There is no “Dear Lazyweb, answer this question for me” in the server world.

The Purpose

At its most basic level, Endocrys is a conduit between all of the systems and you. Think of it like a gigantic Instant Messaging buddy list, where all of your buddies are systems. When they’re online, they are in the list and can set their status messages, send you messages, send each other messages, receive messages, etc. Endocrys leverages the eXtensible Messaging and Presence Protocol (XMPP) to tie this framework into existing clients, transports and APIs, enabling a near-infinite number of possible applications or functions you can deploy.

The Technology

Endocrys is built as a framework – an abstract set of rules that can be extended at any time by writing little modules. These modules can be applied across the Endocrys network instantly, without any downtime.

By leveraging XMPP, the Endocrys network is highly-redundant with no single fail points. Any number of “Communication Masters” (XMPP servers) are online, but only one is needed to keep communication flowing. All network communication is encrypted and signed. Partitioning and segmentation are handled rationally.

Communication is very similar to Instant Messaging: there is relatively no latency, and XMPP assures delivery even to systems that were offline when the message was sent.

Monitoring and control systems can participate on Endocrys, automating the remote remediation of problems.

The Protocol

XMPP sits atop TCP, and atop that sits the Endocrys Communication Protocol aka Autocrys. ECP is a fully-authenticated, fully-controlled, skeptical protocol that serves both for sending structured announcements as well as sending and processing commands. The entire ECP specification is listed in AUTOCRYS.TXT. Endocrys agents can be written in any programming language, attached to any other framework, at any OSI level, as long as they can speak XMPP and implement ECP appropriately.

Oct 23 2009

Death To Passwords

A close friend forwarded me a note from a relative who was trying to solve a password-management problem. What was going to be a short statement of opinion turned into a moderately-humorous manifesto, and I thought I’d share (lightly edited).

I certainly empathize with your password management situation. Passwords are, actually, horrible security mechanisms and it is my opinion that they should be done away with altogether. Problem solved: No passwords means no password management headaches.

So, how do you prove you are who you say you are? How do your systems trust who you say you are? A token. A “key”. A physical and logical item possessed by the user. Something they can lose or get stolen or drop in their coffee mug, but it doesn’t matter because it’s useless without them leashed to it- and can be reproduced by authorized personnel in a jiffy.

The security industry likes calling it “two-factor authentication”: The two factors being something you have (the token) and something you know (the sentence uttered by your first girlfriend when she dumped you, song lyrics, the title of a book … whatever). Behind the scenes we shift from password management (gross and abhorrent) to key management (fun and exciting!)

Encrypted-key security is the only managed authentication scheme I have rolled out in client environments for the last 7…8 years. It can be “difficult” to wrench into an existing infrastructure, changing the culture, disrupting the status quo- but technologically is a vastly superior solution to identity management.

The de facto standard is PGP [1], although there are a lot of players in this market with varying quality of products, some aiming at various vertical markets. The link below gives a nice picture of how various systemic pieces tie together.

I know I didn’t answer your question- people tell me that a lot- but I can’t in good faith recommend password management. I haven’t been able to since 1999 or so, and certainly can’t as 2009 winds down. Sure, there are things you can do – the DoD uses the Mandylion [2], which you can buy on ThinkGeek [3] for $50 – but it doesn’t solve the actual problem of secure identity management: Please pardon the crudeness, but it’s like putting whipped-cream on dogshit.

[1] http://www.pgp.com/products/index.html
[2] http://www.mandylionlabs.com/
[3] http://www.thinkgeek.com/gadgets/security/91a2/

Oct 08 2009

Can You Have Too Many Roombas?

I have four Roombas of three different models (I blame Steve for telling me about “deals”). I think I may have too many. Regardless, the one thing they all have in common is a hacked together BlueTooth connection so I can run various software on them remotely. While I haven’t really talked a lot about those “various softwares”, I’m really excited about a project I’m working on now, working title of RooCluster.

RooCluster is a command-and-control application designed for the special needs of multiple robots operating in the same space, or over large multi-room spaces. Each Roomba is being fitted with an RFID tag, which, in coordination with some more wireless access points, allows me to triangulate where a Roomba is and its travel vector (sometimes, math is cool). This information can help RooCluster avoid nasty Roomba-on-Roomba collisions, and also presents the possibility of meta-virtual walls.
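Given a position and travel vector for each robot, predicting a collision is a closest-approach calculation. The Python sketch below is not RooCluster code; the function names, the 0.17 m radius, and the 10-second look-ahead are all assumptions chosen for illustration.

```python
import math


def closest_approach(p1, v1, p2, v2, horizon=10.0):
    """Return (time, distance) of the nearest point within `horizon` seconds.

    Positions are (x, y); velocities are (dx, dy) per second. The time is
    clamped to [0, horizon], so robots already moving apart report their
    current gap.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]  # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0:
        t = 0.0  # same velocity: the gap never changes
    else:
        t = min(horizon, max(0.0, -(rx * vx + ry * vy) / speed_sq))
    return t, math.hypot(rx + vx * t, ry + vy * t)


def will_collide(p1, v1, p2, v2, radius=0.17, horizon=10.0):
    """True if two discs of the given radius overlap within the horizon."""
    _, gap = closest_approach(p1, v1, p2, v2, horizon)
    return gap < 2 * radius
```

Two robots driving straight at each other, `will_collide((0, 0), (1, 0), (10, 0), (-1, 0))`, are flagged; two robots driving side by side five metres apart are not.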

If you have a Roomba, you probably have a virtual wall – the little pylon that sends out an infrared beam that the Roombas treat just like a wall. With some work, RooCluster should be able to honor coordinate-based lines (which could, in turn, form other shapes) and effectively “wall-off” areas without needing a physical barrier, or a battery-sucking virtual wall. You can also overlay the position and vector data onto floorplans, and see exactly where the Roombas are, and where they’re going.
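A coordinate-based wall reduces to a segment-intersection test: does the robot's next move cross the wall line? The following Python sketch shows the standard orientation-based test. The function names are hypothetical and this is not RooCluster code.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): which side of ab c is on."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def crosses_wall(start, end, wall_a, wall_b):
    """True if the move start->end properly crosses the wall segment.

    Touching and collinear cases count as 'no crossing' to keep the sketch
    short; a real controller would want a safety margin instead.
    """
    return (
        _orient(start, end, wall_a) * _orient(start, end, wall_b) < 0
        and _orient(wall_a, wall_b, start) * _orient(wall_a, wall_b, end) < 0
    )
```

Several such segments chained together give the "other shapes" mentioned above: a rectangle is four walls, and a planned move is rejected if it crosses any of them.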

Of course, you can also use it to make your Roombas dance with each other.

Or joust.

Oct 05 2009

The Next Five Years of Storage

[NOTE: This essay was commissioned by a client in December 2006. It's the second in a series of old-yet-relevant position-papers whose exclusivity has expired, that I'm editing and posting. Things for the next five look "similar". There is no formal "conclusion", as this is one section of a larger piece.]

Over the next five years, gross storage needs will double every other year, sparked by industry trends that avoid deleting anything, ever; continued bloat in software programs; increased user demand for larger-file storage; increased user demand for indefinite storage; increased user, corporate, and industry expectation of system-side backups and frequent snapshots; and the enabling factor of meteoric-disk-size -to- paltry-disk-cost ratios.
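To put a number on "double every other year", a small Python sketch (the function name and the 100 TB starting figure are hypothetical):

```python
def growth_factor(years, doubling_period_years=2.0):
    """Multiplier on storage needs after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)


# A hypothetical 100 TB installation, five years on:
projected_tb = 100 * growth_factor(5)
```

Five years of doubling every two years is a factor of 2^2.5, or roughly 5.7 times the starting capacity, before counting the backup multiples discussed further on.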

Since the late 1990s, we have seen rapid acceleration of infinite data life. While storage vendors will use terms such as “information life-cycle management”, “information archiving” or “data warehousing” – they all converge onto the premise that corporate data life is no longer finite. The value of this is dubious, but irrelevant to argue: financial workers expect to be able to look at historical data for modelling purposes; draft and product workers expect to be able to look at long-dead projects that might now be of value with new knowledge; in the throes of bankruptcy, competent managers (and lawyers) will want to mine the archives for something… anything that may provide some value. Everything your organization has ever known is expected to be retained, indefinitely.

The average 10-page MS Word document in 1995 was 13K in size. The average 10-page MS Word document in 2006 is 1.4MB. While that size may still seem small, it’s indicative of a growing trend of software generating vastly wasteful content because it can. Software vendors don’t need to worry about their data fitting onto floppies anymore, so they don’t. Multiply this across dozens of applications, add in media, and you have truly huge data files with only a few pages of actual content.

Similarly, the users want ever-larger files. Gone are the days of compressing graphics, video and audio to the Nth degree: users want full-quality content. They don’t want a 120×120 “thumbnail” video, they want something that takes some real-estate on their oversized monitor. As bandwidth increases, so will the user-desire for better content faster. They then want to save that same content to their network volume. They want it backed up in case of catastrophe (or their own error). What was a 3MB MP3 file is now a 45MB FLAC or WAV file sitting in your database.

The increase in user-end space (desktop harddisks) has led users to demand not only more and more space from their storage providers, but also indefinite storage. Users no longer have to selectively delete their e-mails to stay in a predefined space, so they keep them all, forever. They expect the same from the rest of their digital attics: they expect every bad poem, doodle, patent-idea-on-a-napkin, picture of their grandkids, etc. to be immediately available, forever.

Forever. Even if your disks die. Even if they accidentally delete them. Even if a meteor pummels your datacenter. The old standard of weekly backups has long passed the borders of Being Prudent, travelled through the Fields of Marginally Acceptable, and has entered the Mountains of Irreparable Harm to Your Reputation. Users, customers, regulators, etc. are barely tolerant of losing a day of data, and this will get worse. In the next half-decade a truly monumental shift into multi-media backups, near-real-time data snapshots, and 100% protection of data assets will be fully realized, requiring several multiples more mixed-media backup storage than live data storage.

On the up-side, disk sizes are sky-rocketing, costs are plummeting and the reliability of the new serial ATA (SATA) architected drives has come up to a level that allows anyone to build in or expand networked disk with a trivial investment. A new generation of storage vendors is coming up, challenging the old way of thinking about networked storage and adopting technologies with more agility than its behemoth competitors. We’re quickly on our way to 1TB disk drives, flash-based storage continues to be refined and is nearing enterprise-grade, holographic storage is being commercially realized for some applications, and all of these technologies are driving the cost per megabyte down.

Sep 25 2009

The Next Five Years of Bandwidth

[NOTE: This essay was commissioned by a client in December 2006. It's the second in a series of old-yet-relevant position-papers whose exclusivity has expired, that I'm editing and posting. Things for the next five look "similar", yet scaled up in some areas. There is no formal "conclusion", as this is one section of a larger piece.]

Over the next five years, datacenter bandwidth will level off for a bit. With the 10GigE standard behind us we can finally pull our backbones up to a level where they’ll be able to breathe easier for a while. Storage speeds are still being gated by the storage devices themselves, and until either solid-state media becomes cost effective or disks rotate twice as fast as they do now, that isn’t going to change much. Aggregating virtual systems is actually causing an interesting bandwidth phenomenon that I’ll address later. Regardless, a 10Gig, or Nx1Gig backbone should be able to breathe well for the next half-decade. Planned year-over-year demand increases of 5-7% should be expected.
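Compounded over the five-year window, those year-over-year figures work out as follows. This is a Python sketch and the function name is hypothetical.

```python
def compound_growth(annual_rate, years):
    """Total multiplier after `years` of growth at `annual_rate` per year."""
    return (1 + annual_rate) ** years


low = compound_growth(0.05, 5)   # about 1.28x
high = compound_growth(0.07, 5)  # about 1.40x
```

In other words, planning for 5-7% a year means planning for roughly 28-40% more datacenter bandwidth demand by the end of the period.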

Desktop network speeds have been about the same for the last five years, and will largely remain unchanged. A 32-bit computer system running a commercial desktop operating system still has too many architectural limitations to make use of more than 60-85Mb/s of bandwidth. While some vendors are running 64bit processors, they generally are using bus architectures that aren’t that wide, thus gating peripheral speeds back to 32bit. In the next five years that will clean up a bit, and 64bit “extensions” to the 32bit processors will become more commonplace, but still without impacting the network noticeably, due largely to OS and bus architectural issues.

Environments consolidating onto virtualized systems are seeing an interesting gross decrease in datacenter network bandwidth use. Not surprisingly, they’re also seeing peak utilization well above what they had prior to consolidation. The latter is easily explained by virtualized systems generally “netbooting” their OS from the storage network or a bootserver, and now more than ever embracing networked storage completely. The gross decrease has been unexpected because of the higher demands on the network, but is explained by architectural constraints. We’re now seeing 10-15 virtual servers sharing one or two network connections, where previously each had one or two of their own. This has somewhat of a levelling effect on network use, but isn’t dramatically impacting service performance as one would expect. The network is more important in these environments, but as a whole not as taxed.

It was largely believed that mobile “broadband” availability and use would be much higher by now, but we have yet to see a real platform for use. The Palm Treo series is getting an overhaul “soon” and rumored platforms by Google and Apple may change that landscape. In general, even if fully realized, the network demands by these users will largely have no impact on the greater network, or on datacenter network needs. The next-generation, “4G”, will be changing that, but I don’t expect to see that kind of horsepower in a phone until late-2010-to-2012: the processors are still just too slow.

What will change dramatically will be the bandwidth access for remote users. While not directly impacting the datacenter we’re going to see dramatic growth in the cable/DSL/satellite “broadband” space. Internet-facing applications may see a 20-30% rise in client demands as users become less tolerant of waiting for application loads due to their expectations of “faster” service, on the order of 200-250% more bandwidth. It is expected that OSP asymmetrical provisioning will continue.

Sep 24 2009

Disposable Appliance Computing

[NOTE: This essay was commissioned by a client in February 2007. It's the first in a series of old-yet-relevant position-papers whose exclusivity has expired, that I'm editing and posting]

The hosted systems industry has turned another critical point. Several years ago we eschewed large mainframe systems in exchange for commodity servers that could divide load and work together to provide services without single-vendor lock-in and without a single piece of “iron” waiting to fail. The computing power of a $2,000,000 mainframe was dwarfed by the implementation of $80,000 in commodity hardware. With virtualization coming-of-age- with Intel and AMD putting hooks into their processors and chipsets to allow virtualization to be fully realized and not just a software-only hack- we’ve seen those same commodity systems hosting dozens of virtual systems reliably and at near-metal efficiency. The cost per virtual system is a number rapidly approaching zero.

New offerings from Sun, IBM and HP/Compaq are emphasizing something that “the server guys” haven’t needed to care much about: infrastructure. Historically, your network engineers and analysts worried about interconnection, route redundancy, and ensuring the bits could flow where they needed, reliably and sufficiently; and your system engineers worried about everything up to the point the bits hit “the network”. Moving forward, that is almost a debilitating dichotomy. Traditionally, in the post-mainframe era, a physical system did one or two things and its exclusion from the network or its under-performance on the network was a minor issue. With a physical system possibly hosting dozens of virtual systems- all with unique networking requirements, cross-talking requirements, and of course: networked storage requirements- your system engineers must be well-versed in network engineering. “The Network Is The Computer” is not just a Sun tag-line, or a lame cliché. We’re now fully realizing the potency of that statement. Every system offering from the Big Three contains significant “infrastructure” features: Network features.

By pushing more and more network features into server systems- IBM servers with Cisco “swrouters” built-in, for example – the server itself has become more important and less relevant at the same time. Keeping it up and running well will require a new kind of system engineer because “the box” is now more complex: But at the same time, collections of “boxes” should be able to self-heal and adapt to the failures of others. Each system has now become disposable.

A large swath of the architectural literati are already deploying quantities of self-healing farms that take over the work – the very virtual machines – of failed or failing physical systems. Virtualization on its own wasn’t a game-changer. Virtualization with processor support and recognition sparked real potential. Virtualization on top of “infrastructure”-aware (e.g. heavily networked) physical systems has dramatically shifted the value of hybrid “networked systems engineers”, raised the bar for the “server guys” to get up to speed on the real internals of networking, and has provided the unprecedented opportunity to deploy redundantly resilient systems that can in-practice achieve five-to-seven “nines” of reliability.
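For a sense of scale, "nines" translate into allowed downtime as in this Python sketch. The function name is hypothetical and a 365-day year is assumed.

```python
def downtime_seconds_per_year(nines, days_per_year=365):
    """Allowed downtime per year for an availability of N nines.

    Five nines means 99.999% available, i.e. unavailable 10**-5 of the time.
    """
    unavailability = 10 ** (-nines)
    return unavailability * days_per_year * 24 * 60 * 60


five = downtime_seconds_per_year(5)   # a little over five minutes
seven = downtime_seconds_per_year(7)  # a little over three seconds
```

At seven nines there is no time for a human to react, which is why the farm has to notice a failing box and move its virtual machines on its own.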
