Best Practices for Provisioning XenDesktop

We’ve written a lot here about XenDesktop’s two provisioning methods: Provisioning Services and Machine Creation Services. Earlier this week, at the Citrix Synergy Conference in San Francisco, there was a session devoted specifically to those two provisioning methods: a high-level overview of how they work, best practices for deploying each of them, and some guidelines for determining which approach is best for your organization. For the benefit of those who couldn’t make it to Synergy – or those who did, but would like a better way to share that information with others in their organizations – the session was recorded and is available on Citrix TV. You can view it below:

When Is “End of Life” Not “End of Life”?

Last Friday (May 4), the news broke that Citrix had made some changes to their “End of Life” (EOL) dates. Just a couple of months ago, in the March issue of our Moose Views newsletter, we told you that if you were running any version of XenApp other than XenApp v6.5 on Windows Server 2008 R2, you needed to start seriously planning your migration, because by mid-July of next year (2013) those older versions would all hit their EOL dates.

Apparently Citrix has been feeling some heat from customers who weren’t too happy about that, so they have announced something new called “Extended Support,” which will be available after EOL for an additional (as yet unspecified) fee. The “End of Extended Support” (EOES) dates have been aligned with the comparable Microsoft dates for the underlying server operating system.

The odd thing about this is that the EOL dates have not changed (except for XenApp v6.0, which will now hit EOL on January 2, 2015). It’s just that EOL doesn’t mean what it used to mean. Previously, when a Citrix product hit EOL, that meant there was no support available for it whatsoever. Now, apparently, “End of Life” means “End of Life Unless You Pay Us More Money to Keep Supporting You.”

For the record, the EOES dates for the versions of XenApp that run on Server 2003 have been set to July 14, 2015, and the EOES dates for the versions that run on Server 2008 (and 2008 R2) have been set to July 10, 2018.

You can read more about this on the Citrix Product Matrix Web page.

As of now, the Extended Support program for XenDesktop is still being defined…

Windows 8 Consumer Preview Video

You’re probably aware that the Windows 8 Consumer Preview (a.k.a. public beta) was released last Wednesday. And if you’re among those who are still trying to figure out how you’re going to get from Windows XP to Windows 7, you’ve probably been studiously ignoring any news that had anything to do with yet another version of Windows.

Personally, with all the changes going on here in Moose-ville (new Web site, new Desktop-as-a-Service offering, new people coming on board, etc.), I just haven’t had the time to follow all the Windows 8 news. But now that there’s actually something you can download and play with, I’ve started taking a closer look…and trying to figure out what system I might have lying around that I could actually put it on.

Since I’m a Windows Phone user, I’m somewhat familiar with the new “Metro”-style application interface. At first glance, I wasn’t sure why that was a good thing to have on a laptop or desktop PC. But after giving it a closer look, I’m finding that there are several things about it that intrigue me.

If you’re curious to know more, take a look at this demo video from Microsoft, and let us know in the comments what you think.

What the Heck Is a “Storage Hypervisor?”

Our friends at DataCore ran a press release yesterday positioning the new release (v8.1) of SANsymphony-V as a “storage hypervisor.” On the surface, that may just sound like some nice marketing spin, but the more I thought about it, the more sense it made – because it highlights one of the major differences between DataCore’s products and most other SAN products out there.

To understand what I mean, let’s think for a moment about what a “hypervisor” is in the server virtualization world. Whether you’re talking about vSphere, Hyper-V, or XenServer, you’re talking about software that provides an abstraction layer between hardware resources and operating system instances. An individual VM doesn’t know – or care – whether it’s running on an HP server, a Dell, an IBM, or a “white box.” It doesn’t care whether it’s running on an Intel or an AMD processor. You can move a VM from one host to another without worrying about changes in the underlying hardware, BIOS, drivers, etc. (I’m not talking about “live motion” here – that’s a little different.) The hypervisor presents the VM with a consistent execution platform that hides the underlying complexity of the hardware.

So, back to DataCore. Remember that SANsymphony-V is a software application that runs on top of Windows Server 2008 R2. In most cases, people buy a couple of servers that contain a bunch of local storage, install 2008 R2 on them, install SANsymphony-V on them, and turn that bunch of local storage into full-featured iSCSI SAN nodes. (We typically run them in pairs so that we can do synchronous mirroring of the data across the two nodes, such that if one node completely fails, the data is still accessible.) But that’s not all we can do.

Because it’s running on a 2008 R2 platform, it can aggregate and present any kind of storage the underlying server OS can access at the block level. Got a fibre channel SAN that you want to throw into the mix? Great! Put fibre channel Host Bus Adapters (HBAs) in your DataCore nodes, present that storage to the servers that SANsymphony-V is running on, and now you can manage the fibre channel storage right along with the local storage in your DataCore nodes. Got some other iSCSI SAN that you’d like to leverage? No problem. Just make sure you’ve got a couple of extra NICs in the DataCore nodes (or install iSCSI HBAs if you want even better performance), present that iSCSI storage to the DataCore nodes, and you can manage it as well. You can even create a storage pool that crosses resource boundaries! And now, with the new auto-tiering functionality of SANsymphony-V v8.1, you can let DataCore automatically migrate the most frequently accessed data to the highest-performing storage subsystems.
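If it helps to picture it, here’s a toy Python sketch of those two ideas at work – a pool that aggregates whatever block devices the host can see, plus an auto-tiering pass that promotes the hottest data to the fastest tier. This is purely conceptual: SANsymphony-V is configured through its management console, and none of these class or method names come from DataCore.

```python
# Conceptual sketch only -- these names are invented for illustration,
# not DataCore's API.
from dataclasses import dataclass

@dataclass
class BackendDevice:
    name: str          # e.g. "local SAS", "FC SAN LUN", "other iSCSI SAN LUN"
    tier: int          # 0 = fastest (e.g. SSD); higher numbers = slower
    capacity_gb: int
    used_gb: int = 0

    def free_gb(self):
        return self.capacity_gb - self.used_gb

@dataclass
class Extent:
    size_gb: int
    device: BackendDevice
    access_count: int = 0   # how "hot" this chunk of data is

class StoragePool:
    """A pool that crosses resource boundaries: any block device the
    underlying Windows host can see may be added to it."""

    def __init__(self, devices):
        self.devices = sorted(devices, key=lambda d: d.tier)  # fastest first
        self.extents = []

    def allocate(self, size_gb):
        for dev in self.devices:                 # first fit, fastest tier first
            if dev.free_gb() >= size_gb:
                dev.used_gb += size_gb
                extent = Extent(size_gb, dev)
                self.extents.append(extent)
                return extent
        raise RuntimeError("pool is full")

    def auto_tier(self):
        """Promote the most frequently accessed extents to faster tiers."""
        for ext in sorted(self.extents, key=lambda e: e.access_count, reverse=True):
            for dev in self.devices:
                if dev.tier >= ext.device.tier:
                    break                        # already on the best tier available
                if dev.free_gb() >= ext.size_gb:
                    ext.device.used_gb -= ext.size_gb
                    dev.used_gb += ext.size_gb
                    ext.device = dev             # "migrate" the data
                    break
```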

Or how about this: You just bought a brand new storage system from Vendor A to replace the system from Vendor B that you’ve been using for the past few years. You’d really like to move Vendor B’s system to your disaster-recovery site, but Vendor A’s product doesn’t know how to replicate data to Vendor B’s product. If you front-end both vendors’ products with DataCore nodes, the DataCore nodes can handle the asynchronous replication to your DR site. Alternately, maybe you bought Vendor A’s system because it offered higher performance than Vendor B’s system. Instead of using Vendor B’s product for DR, you can present both systems to SANsymphony-V and leverage its auto-tiering feature to automatically ensure that the data that needs the highest performance gets migrated to Vendor A’s platform.

So, on the back end, you can have disparate SAN products (iSCSI, fibre channel, or both) and local storage (including “JBOD” expansion shelves), and a mixture of SSD, SAS, and SATA drives. The SANsymphony-V software masks all of that complexity, and presents a consistent resource – in the form of iSCSI virtual volumes – to the systems that need to consume storage, e.g., physical or virtual servers.

That really is analogous to what a traditional hypervisor does in the server virtualization world. So it is not unreasonable at all to call SANsymphony-V a “storage hypervisor.” In fact, it’s pretty darned clever positioning, and I take my hat off to the person who crafted the campaign.

Yet Another Phishing Example

Today, we’re going to play “What’s Wrong with This Picture.” First of all, take a look at the following screen capture. (You can view it full-sized by clicking on it.)

Phishing Email from Aug, 2011

Now let’s see if you can list all the things that are wrong with this email. Here’s what I came up with:

  • There is no such thing as “Microsoft ServicePack update v6.7.8.”
  • The Microsoft Windows Update Center will never, ever send you a direct email message like this.
  • Spelling errors in the body of the email: “This update is avelable…” “…new futures were added…” (instead of “features”) and “Microsoft Udates” (OK, that last one is not visible in my screen cap, so it doesn’t count).
  • Problems with the hyperlink. Take a look at the little window that popped up when I hovered my mouse over the link: the actual link is to an IP address (85.214.70.156), not to microsoft.com, as the anchor text would have you believe. Furthermore, the directory path that finally takes you to the executable (“bilder/detail/windowsupdate…”) is not what I would expect to see in the structure of a Microsoft Web site. (If you like to automate this kind of check, see the little script right after this list.)
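That last check is easy to automate if you’re comfortable with a little scripting. Here’s a minimal Python sketch (my own code, not part of any product) that flags links whose visible text claims to be Microsoft but whose href actually points somewhere else – or to a bare IP address, like the one above:

```python
# Flag links whose visible text claims one destination but whose href
# points elsewhere (or to a bare IP address).
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkAuditor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.suspicious = []   # (visible text, actual href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            host = urlparse(self._href).hostname or ""
            claims_microsoft = "microsoft.com" in text.lower()
            is_bare_ip = re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) is not None
            if is_bare_ip or (claims_microsoft and not host.endswith("microsoft.com")):
                self.suspicious.append((text, self._href))
            self._href = None

# A link like the one in the screen capture above:
auditor = LinkAuditor()
auditor.feed('<a href="http://85.214.70.156/bilder/detail/windowsupdate/sp-update.v678.exe">'
             'www.microsoft.com/windowsupdate</a>')
print(auditor.suspicious)   # flags the mismatch between anchor text and real target
```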

If you want to know what sp-update.v678.exe would do if you downloaded and executed it, take a look at the description on the McAfee Web site (click on the “Virus Characteristics” tab). Suffice it to say that this is not something you want on your PC.

Sad to say, I suspect that thousands of people have clicked through on it because it has the Windows logo at the top with a cute little “Windows Update Center” graphic.

Would you have spotted it as a phishing attempt? Did you spot other giveaways in addition to the ones I listed above? Let us know in the comments.

Beware of Vendor-Sponsored “Analysis” Reports

Mark Twain allegedly came up with the famous line: "Figures don’t lie, but liars figure." That’s a good thing to keep in mind any time you’re looking through a report that was sponsored ("sponsored" = "paid for") by a vendor that concludes that their product is better than the other guy’s.

Maybe it is better than the other guy’s. But you might want to look closely at what was tested, how it was tested, and whether they were, shall we say, selective in the facts they present.

Case in point: The Tolly Group’s report, released May 27, comparing VMware View 4.6 Premier Edition to Citrix XenDesktop 5 Platinum Edition. There are several interesting aspects to this report, which are dealt with in detail in Tal Klein’s blog over on the Citrix Community blog site. Here are a few of the more egregious items:

  • VMware View 4.6 Premier licensing costs less than XenDesktop 5 Platinum. Absolutely true, and absolutely irrelevant. That’s like pointing out that if you load every possible dealer option onto your new car, it’s going to cost more than the basic model. Thank you, Captain Obvious. If you want an "apples-to-apples" comparison, you need to compare VMware View to the XenDesktop VDI Edition. But wait, if you do that, XenDesktop is actually less expensive, and that would be an awkward point to publish in a paper that’s being paid for by VMware.
  • VMware’s PCoIP provides a more consistent multi-media experience than XenDesktop 5. (Over a LAN. Using a single thin client device that did not support any of the Citrix HDX media acceleration features.) Sorry, guys, but once again this is not an apples-to-apples comparison. And did they publish any results of testing across a WAN link? Nope…and for the same reason they didn’t use XenDesktop VDI Edition for their price comparison.
  • It’s easier to upgrade View 4.5 to View 4.6 than it is to upgrade XenDesktop 4 to XenDesktop 5. Once again, both true and irrelevant. It’s easier to give your kitchen a new coat of paint than it is to rip out the cabinets and completely remodel it. Anybody surprised by that? There are significant architectural changes from XenDesktop 4 to XenDesktop 5. It shouldn’t be surprising to anyone that this will involve more effort than a "dot release" upgrade.

I’ve always been skeptical of vendor-sponsored "analysis" reports, and, to be fair, Citrix has used the Tolly Group in the past for its own sponsored reports – but it seems to me that this one is just over the top. Apparently, former Gartner analyst Simon Bramfitt agrees. His pithy assessment of the report: "There are undiscovered tribes lost in the darkest parts of the Amazon jungle that would know exactly what to do if a vendor airdropped a pile of competitive marketing literature authored by the Tolly Group; send it back, and asked [sic] that it be re-printed on more absorbent paper."

What do you think?

Thin Clients vs. Cheap PCs

We have, for a long time, been fans of thin client devices. However, if you run the numbers, it turns out that thin clients may not necessarily be the most cost-effective client devices for a VDI deployment.

Just before writing this post, I went to the Dell Web site and priced out a low-end Vostro Mini Tower system: 3.2 GHz Intel E5800 dual-core processor, 3 GB RAM, 320 GB disk drive, integrated Intel graphics, Windows 7 Professional 64-bit OS, and one year of next-business-day on-site service. Total price: $349.00.

When you buy a new PC with an OEM license of Windows on it, you have 90 days to add Microsoft Software Assurance to that PC. That will cost you $109.00 for two years of coverage. You’re now out of pocket $458.00. However, one of the benefits of Software Assurance is that you don’t need any other Microsoft license component to access a virtual desktop OS. You also have the rights, under SA, to install Windows Thin PC (WinTPC) on the system, which strips out a lot of non-essential stuff and allows you to administratively lock it down – think of WinTPC as Microsoft’s own tool kit for turning a PC into a thin client device.

Now consider the thin client option. A new Wyse Winterm built on Embedded Windows 7 carries an MSRP of $499. There are less expensive thin clients, but this one would be the closest to a Windows 7 PC in terms of the user experience (media redirection to a local Windows Media Player, Windows 7 user interface, etc.). However, having bought the thin client, you must now purchase a Microsoft Virtual Desktop Access (VDA) license to legally access your VDI environment. The VDA license is only available through the Open Value Subscription model, and will cost you $100/year forever. So your total cost over two years is $699 for the Wyse device vs. $458 for the Dell Vostro.

After the initial two year term, you’ll have to renew Software Assurance on the PC for another two years. That will continue to cost you roughly $54.50/year vs. $100/year to keep paying for that VDA license.
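If you want to play with the numbers yourself, here’s the same arithmetic as a quick Python sketch. The prices are the ones quoted above; the assumption that you keep renewing SA in two-year terms (and keep paying the VDA subscription annually) is mine, so plug in your own figures:

```python
import math

# Prices quoted above; adjust to your own quotes.
PC_PRICE          = 349.00   # low-end Dell Vostro with Windows 7 Pro OEM
SA_TWO_YEAR_TERM  = 109.00   # Software Assurance, purchased in 2-year terms
THIN_CLIENT_PRICE = 499.00   # Windows-based thin client (MSRP)
VDA_PER_YEAR      = 100.00   # per-device VDA subscription, ongoing

def pc_with_sa(years):
    return PC_PRICE + math.ceil(years / 2) * SA_TWO_YEAR_TERM

def thin_client_with_vda(years):
    return THIN_CLIENT_PRICE + years * VDA_PER_YEAR

for years in (2, 4, 6):
    print(f"{years} yrs:  PC + SA = ${pc_with_sa(years):,.2f}   "
          f"thin client + VDA = ${thin_client_with_vda(years):,.2f}")

# 2 yrs:  PC + SA = $458.00   thin client + VDA = $699.00
# 4 yrs:  PC + SA = $567.00   thin client + VDA = $899.00
# 6 yrs:  PC + SA = $676.00   thin client + VDA = $1,099.00
```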

Arguably, the Wyse thin client is a better choice for some use cases. It will work better in a hostile environment – like a factory floor – because it has no fan to pull dust and debris into the case. In fact, it has no moving parts at all, and will likely last longer as a result…although PC hardware is pretty darned reliable these days, and at that price point, the low-end PC becomes every bit as disposable as a thin client device.

So, as much as we love our friends at Wyse, the bottom line is…well, it’s the bottom line. And if you’re looking at a significant VDI deployment, it might be worth running the numbers both ways before you decide for sure which way you’re going to go.

It’s Been a Cloud-y Week

No, I’m not talking about the weather here in San Francisco – that’s actually been pretty good. It’s just that everywhere you look here at the Citrix Summit / Synergy conference, the talk is all about clouds – public clouds, private clouds, even personal clouds, which, according to Mark Templeton’s keynote on Wednesday, refers to all your personal stuff:

  • My Devices – of which we have an increasing number
  • My Preferences – which we want to be persistent across all of our devices
  • My Data – which we want to get to from wherever we happen to be
  • My Life – which increasingly overlaps with…
  • My Work – which I want to use My Devices to perform, and which I want to reflect My Preferences, and which produces Work Data that is often all jumbled up with My Data (and that can open up a whole new world of problems, from security of business-proprietary information to regulatory compliance).

These five things overlap in very fluid and complex ways, and although I’ve never heard them referred to as a “personal cloud” before, we do need to think about all of them and all of the ways they interact with each other. So if creating yet another cloud definition helps us do that, I guess I’m OK with that, as long as nobody asks me to build one.

But lest I be accused of inconsistency, let me quickly recap the cloud concerns that I shared in a post about a month ago, hard on the heels of the big Amazon EC2 outage:

  1. We have to be clear in our definition of terms. If “cloud” can simply mean anything you want it to mean, then it means nothing.
  2. I’m worried that too many people are running to embrace the public cloud computing model while not doing enough due diligence first:
    1. What, exactly, does your cloud provider’s SLA say?
    2. What is their track record in living up to it?
    3. How well will they communicate with you if problems crop up?
    4. How are you ensuring that your data is protected in the event that the unthinkable happens, there’s a cloud outage, and you can’t get to it?
    5. What is your business continuity plan in the event of a cloud outage? Have you planned ahead and designed resiliency into the way you use the cloud?
    6. Never forget that, no matter what they tell you, nobody cares as much about your stuff as you do. It’s your stuff. It’s your responsibility to take care of it. You can’t just throw it into the cloud and never think about it again.

Having said that, and in an attempt to adhere to point #1 above, I will henceforth stick to the definitions of cloud computing set forth in the draft document (#800-145) released by the National Institute of Standards and Technology in January of this year, and I promise to tell you if and when I deviate from those definitions. The following are the essential characteristics of cloud computing as defined in that draft document:

  • On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service’s provider.
  • Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
  • Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, network bandwidth, and virtual machines.
  • Rapid elasticity. Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
  • Measured Service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

If you read through those points a couple of times and give them a moment’s thought, a couple of things should become obvious.

First, most of the chunks of infrastructure that are being called “private clouds” aren’t – at least by the definition above. Standing up a XenApp or XenDesktop infrastructure, or even a mixed environment of both, does not mean that you have a private cloud, even if you access it from the Internet. Virtualizing a majority, or even all, of your servers doesn’t mean you have a private cloud.

Second, very few Small & Medium Enterprises can actually justify the investment required to build a true private cloud as defined above, although some of the technologies that are used to build public and private clouds (such as virtualization, support for broad network access, and some level of user self-service provisioning) will certainly trickle down into SME data centers. Instead, some will find that it makes sense to move some services into public clouds, or to leverage public clouds to scale out or scale in to address their elasticity needs. And some will decide that they simply don’t want to be in the IT infrastructure business anymore, and move all of their computing into a public cloud. And that’s not a bad thing, as long as they pay attention to my point #2 above. If that’s the way you feel, we want to help you do it safely, and in a way that meets your business needs. That’s one reason why I’ve been here all week.

So stay tuned, because we’ll definitely be writing more about the things we’ve learned here, and how you can apply them to make your business better.

IntelliCache and the IOPS Problem

If you’ve been following this blog for any length of time, you know that we’ve written extensively about XenDesktop, and spent a lot of time on best practices and problems to avoid. And one of the biggest problems to avoid is poor storage design resulting in poor VDI performance.

In a nutshell, the problem is that a Windows desktop OS uses disk far differently than a Windows server OS. Thanks to the way Windows uses the swap file, disk writes outnumber disk reads by about 2 to 1. You can build your virtual desktop infrastructure on the latest and greatest server hardware, with tons of processing power and insanely huge amounts of RAM, but if all of the disk I/O for all of those virtual desktops is hitting your SAN, you’ve got a scalability problem on your hands.

Provisioning Services (“PVS”) can help to mitigate this in two ways (assuming, for the sake of argument, that you’re provisioning multiple virtual systems from a common, read-only image): First, if you build your Provisioning Servers correctly, you should be able to serve up most of the OS read operations from the Provisioning Server’s own cache memory. Second, you can use the virtualization host’s local disk storage as the required “write cache” – because all of those write operations have to go somewhere while the virtual system is running.

But XenDesktop 5 introduced a new way to provision desktops called “Machine Creation Services” (“MCS”). We wrote about this in the April edition of our Moose Views newsletter, so if you’re not familiar with all the pros and cons of MCS vs. PVS, I’d encourage you to take a brief time out and read that article. Suffice it to say that, despite all the advantages of MCS, the biggest downside of using MCS to provision pooled desktops was that all of the IOPS hit your SAN storage, which limited the scalability of an MCS-provisioned VDI deployment.

But all of that just changed with the release of XenDesktop 5 Service Pack 1, which was made available for download a week ago (May 13). With SP1, XenDesktop 5 is now able to take advantage of the “IntelliCache” feature that was introduced as part of XenServer v5.6 Service Pack 2. Using MCS with the combination of XenDesktop 5 SP1 and XenServer 5.6 SP2…

  • The first time a virtual desktop is booted on a given XenServer, the boot image is cached on that XenServer’s local storage.
  • Subsequent virtual desktops booted on that same XenServer will boot and run from that locally cached image.
  • You can use the XenServer’s local storage for the write cache as well.

The bottom line is that you can move as much as 90% of the IOPS off of the SAN and onto local XenServer storage, removing nearly all of the scalability limitations from an MCS-provisioned environment.

With most of the IOPS for running VMs taking place on local storage, it’s pretty straightforward to figure out how many VMs you can expect to support on a given virtualization host. Dan Feller’s blog post does a great job of walking through the process of calculating the functional IOPS that your local XenServer storage repository should be able to support, and inferring from that number how many light, normal, or power users you should be able to support as a result. A rough version of that arithmetic is sketched below.
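If you just want the flavor of the calculation, here’s a small Python sketch. The per-spindle IOPS, RAID write penalty, and per-user IOPS figures below are common rules of thumb – assumptions on my part, not numbers taken from Dan’s post or from Citrix – so substitute your own measurements:

```python
def functional_iops(spindles, raw_iops_per_spindle=175,
                    write_ratio=2/3, raid_write_penalty=2):
    """Estimate the front-end IOPS a storage repository can sustain.
    Assumptions: ~175 raw IOPS per 15K spindle, RAID 10 (write penalty 2),
    and the roughly 2-writes-per-read mix typical of VDI (see above)."""
    raw = spindles * raw_iops_per_spindle
    read_ratio = 1 - write_ratio
    # Each read costs one back-end I/O; each write costs `raid_write_penalty` I/Os.
    return raw / (read_ratio + write_ratio * raid_write_penalty)

iops = functional_iops(spindles=8)        # e.g. 8 local 15K SAS drives in RAID 10
print(f"Functional IOPS: {iops:.0f}")     # -> 840
for label, per_user in (("light", 5), ("normal", 10), ("power", 15)):
    print(f"  ~{int(iops // per_user)} {label} users")
```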

This also means that using XenServer as the hypervisor for your XenDesktop 5 deployment is going to yield a significant performance advantage over any other hypervisor, unless or until the other guys come out with similar local caching features. So, if you’re a VMware shop, my advice is this: go ahead and virtualize all of the supporting XenDesktop server components on your vSphere infrastructure. Run your XenDesktop 5 VMs on XenServer hosts, and just don’t tell anyone! If you’re asked, just say, “Oh, yeah, these are my XenDesktop host systems – they’re completely separate from our vSphere infrastructure, because we don’t need the (insert favorite vSphere feature) function for these systems.” Your infrastructure will run better, and no one will know but you…

High Availability vs. Fault Tolerance

Many times, terms like “High Availability” and “Fault Tolerance” get thrown around as though they were the same thing. In fact, the term “fault tolerant” can mean different things to different people – and much like the terms “portal” or “cloud,” it’s important to be clear about exactly what someone means by it.

As part of our continuing efforts to guide you through the jargon jungle, we would like to discuss redundancy, fault tolerance, failover, and high availability, and we’d like to add one more term: continuous availability.

Our friends at Marathon Technologies shared the following graphic, which shows how IDC classifies the levels of availability:

The Availability Pyramid (IDC’s availability levels)

Redundancy is simply a way of saying that you are duplicating critical components in an attempt to eliminate single points of failure. Multiple power supplies, hot-plug disk drive arrays, multi-pathing with additional switches, and even duplicate servers are all part of building redundant systems.

Unfortunately, there are some failures, particularly if we’re talking about server hardware, that can take a system down regardless of how much you’ve tried to make it redundant. You can build a server with redundant hot-plug power supplies and redundant hot-plug disk drives, and still have the system go down if the motherboard fails – not likely, but still possible. And if it does happen, the server is down. That’s why IDC classifies this as “Availability Level 1” (“AL1” on the graphic)…just one level above no protection at all.

The next step up is some kind of failover solution. If a server experiences a catastrophic failure, its workloads are “failed over” to a system that is capable of supporting them. Depending on those workloads, and on what kind of failover solution you have, that process can take anywhere from minutes to hours. If you’re at “AL2,” and you’ve replicated your data using, say, SAN replication or some kind of server-to-server replication, it could take a considerable amount of time to actually get things running again. If your servers are virtualized, with multiple virtualization hosts running against a shared storage repository, you may be able to configure your virtualization infrastructure to automatically restart a critical workload on a surviving host if the host it was running on experiences a catastrophic failure – meaning that your critical system is back up and on-line in the amount of time it takes the system to reboot, typically 5 to 10 minutes.

If you’re using clustering technology, your cluster may be able to fail over in a matter of seconds (“AL3” on the graphic). Microsoft server clustering is a classic example of this. Of course, it means that your application has to be cluster-aware, you have to be running Windows Enterprise Edition, and you may have to purchase multiple licenses for your application as well. And managing a cluster is not trivial, particularly when you’ve fixed whatever failed and it’s time to unwind all the stuff that happened when you failed over. Finally, your application is still unavailable during whatever interval of time is required for the cluster to detect the failure and complete the failover process.

You could argue that a failover of 5 minutes or less amounts to a highly available system, and indeed there are probably many cases where you wouldn’t need anything better than that. But it is not truly fault tolerant. It’s probably not good enough if you are, say, running a security application that’s controlling smart-card access to secured areas in an airport, or a video surveillance system that is sufficiently critical that you can’t afford a 5-minute gap in your video record, or a process control system where a five-minute halt means you’ve lost the integrity of your work in process and potentially have to discard thousands of dollars worth of raw material – and lose thousands more in productivity while you clean out your assembly line and restart it.

That brings us to the concept of continuous availability. This is the highest level of availability, and what we consider to be true fault tolerance. Instead of simply failing workloads over, this level allows for continuous processing without disruption of access to those workloads. Since there is no disruption in service, there is no data loss, no loss of productivity, and no waiting for your systems to restart your workloads.

So all this leads to the question of what your business needs.

Do you have applications that are critical to your organization? If those applications go down, how long could you afford to be without access to them? If they go down, how much data can you afford to lose? Five minutes’ worth? An hour’s worth? And, most importantly, what does it cost you if that application is unavailable for a period of time? Do you know, or can you calculate it?

This is another way of asking what the requirements are for your “RTO” (“Recovery Time Objective” – i.e., when a system goes down, how long do you have before you must be back up) and your “RPO” (“Recovery Point Objective” – i.e., when you do get the system back up, how much data is it acceptable to have lost in the process). We’ve discussed these concepts in previous posts. These are questions that only you can answer, and the answers are significantly different depending on your business model. If you’re a small business, and your accounting server goes down, and all it means is that you have to wait until tomorrow to enter today’s transactions, that’s a far different situation from a major bank that is processing millions of dollars in credit card transactions.
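If you’ve never actually put numbers to it, even a back-of-the-envelope calculation can be eye-opening. Here’s a minimal Python sketch; every dollar figure in it is a placeholder for you to replace with your own:

```python
# Back-of-the-envelope cost of an outage; every figure here is a placeholder.

def outage_cost(downtime_hours, data_loss_hours,
                revenue_per_hour, workers_idled, loaded_labor_per_hour,
                rework_cost_per_lost_hour):
    lost_revenue = downtime_hours * revenue_per_hour
    idle_labor   = downtime_hours * workers_idled * loaded_labor_per_hour
    rework       = data_loss_hours * rework_cost_per_lost_hour
    return lost_revenue + idle_labor + rework

# A 4-hour outage (your effective RTO) that loses the last hour of data (RPO):
cost = outage_cost(downtime_hours=4, data_loss_hours=1,
                   revenue_per_hour=2_500, workers_idled=20,
                   loaded_labor_per_hour=45, rework_cost_per_lost_hour=500)
print(f"Estimated cost of this outage: ${cost:,.0f}")   # -> $14,100
```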

If you can satisfy your business needs by deploying one of the lower levels of availability, great! Just don’t settle for an AL1 or even an AL3 solution if what your business truly demands is continuous availability.