Ray Nugent posted a message on the excellent Google Groups discussion forum on Cloud Computing that deserves a lengthier response than what I can post on the forum there, so here goes.
Ray asks about GigaSpaces pricing on Amazon EC2. For convenience I am pasting his entire post here, but please check out the entire thread that led to it:
Hey, Geva, Gigaspaces is cool stuff but at $1.60 an instance hour I'd say it's far from grail status (much less Holy.) Given the number of instances one needs to make it work properly it's priced just like enterprise software. I know you guys are entitled to a reasonable return on you're investment but...
The big potential draw of cloud computing is massive scalability at low cost. Doing the math, an instance year for a small but functioning Gigaspaces system is, at the minimum, $63K a year (3 large instances @ .80 plus $1.60 times 8736 hours per year.) This, of course, does not include other vendors charges - I'm guessing Oracle will be somewhere in the $3-5 dollar range. All of the sudden the stack is getting expensive...
I appreciate Ray saying "GigaSpaces is cool stuff" -- It's great to be called "cool" -- but there is a little more to it than that. GigaSpaces saves significant costs compared to other solutions, and it does so in 5 distinct ways:
Cost Measure #1: Reducing the Number of Servers Required
GigaSpaces reduces the number of servers required to achieve the same throughput. I will not go through the entire explanation of how GigaSpaces works. If you want to learn more, a good start is this interview with me. Also, check out these webcasts and Nati's posts on space-based architecture. Generally, by putting the entire application in a container (all services, messaging, data and web container), and running it in-memory and in-process, bottlenecks are removed and each server achieves a higher throughput. Throughput gains compared to database/J2EE-centric applications have been known to reach 10x to 100x, and you can see some testimonials of that here and here (and some other links throughout this post). Simply put, this means you will need a tenth to a hundredth of the hardware resources to process the same data or transaction volumes.
Cost Measure #2: Reducing the Number of Middleware Product Licenses Required Per Server and Eliminating Integration Costs
Because of its unique approach, GigaSpaces acts as the container for business logic (equivalent to an EJB container), as a full-fledged messaging system (with a JMS API implementation) and as an in-memory data grid (a.k.a. distributed read/write cache) that serves as the system of record for transaction state, reducing much of the development and integration work related to database clustering. As of version 6.6, it also comes pre-integrated with a web container (initially Jetty, but others to come). This means that you are getting all of your middleware functionality from a single product, purchased with a single license, and without requiring integration. You also reduce some of the need for purchasing and/or integrating other aspects of middleware, such as high-availability (e.g., Veritas FS), clustering (e.g., Oracle RAC) or distributed transaction managers.
I recently spoke to one of our customers, a web gaming company that is one of the top 3 in the casual gaming category. They are implementing an MMORPG using GigaSpaces. The lead architect explained to me that they were initially evaluating distributed caching products (such as Oracle Coherence) and messaging products (such as Sonic MQ and Active MQ). He said that he and the team quickly realized that not only would they need to understand the "universe" (his word) of each product, they would also need to invest a significant amount of time integrating them (which was estimated to be a major part of the development effort). Having done all that, they would end up with a tightly coupled system that is difficult to scale and change. Instead, they opted to choose GigaSpaces as a single product that can provide both kinds of functionality.
Another interesting comparison is revealed by this amusing press release from Oracle about a start-up customer named Qtrax. Look at the stack Oracle managed to sell them to build their application. I put in brackets the price per CPU for each Oracle product from the official price list, and I quote:
Qtrax's implementation includes Oracle Database [$17.5k to $47.5k], Oracle Real Application Clusters [$23k], Oracle Enterprise Manager [$3.5k to $20k+] and components of Oracle Fusion Middleware [?], including Oracle Application Server [$10k to $30k] and Oracle Coherence [$4k to $25k]. With this software now in place, Qtrax will have the ability to support millions of concurrent users [they better!].
On top of these numbers (which total in the range of $58k to $145.5k per CPU [1]), add a 22% annual support fee. As these are perpetual licenses, let's break the license numbers down to an hourly rate by assuming 24/7 use for 3 years: we get $2.20 to $5.54. Even if you decide to be generous and divide by 4 years, you get $1.65 to $4.15. Now, let's not forget that Oracle doesn't actually offer any special pricing for its products on EC2 (i.e., an hourly rate) [2], so you would have to buy the licenses upfront, as Qtrax apparently did.
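The conversion from a perpetual license price to an hourly rate can be reproduced with a short sketch. The function name `hourly_rate` is made up for illustration, and the calculation assumes a 365-day year and excludes the 22% annual support fee, as the figures in the text appear to do. With ordinary rounding the results agree with the quoted figures to within a cent.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming a non-leap year


def hourly_rate(license_price, years):
    """Spread a one-time license price over `years` of 24/7 use."""
    return license_price / (years * HOURS_PER_YEAR)


# Per-CPU totals of the bracketed list prices quoted in the text.
LOW_TOTAL = 58_000
HIGH_TOTAL = 145_500

for years in (3, 4):
    low = hourly_rate(LOW_TOTAL, years)
    high = hourly_rate(HIGH_TOTAL, years)
    print(f"{years} years: ${low:.2f} to ${high:.2f} per CPU-hour")
```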
Because Oracle is so out there in terms of pricing, I think it's more useful to compare to JBoss, which actually does provide hourly pricing on EC2. JBoss's published pricing for EC2 looks like this compared to GigaSpaces (excluding Amazon charges):
| EC2 Product | JBoss Pricing | GigaSpaces |
|---|---|---|
| Extra Large Instance | $1.14 | $1.60 |
| High CPU Medium Instance | $1.11 | $0.20 |
| High CPU XL Instance | $1.14 | $1.60 |
| GB Transfer In | $0.01 | $0.00 |
| GB Transfer Out | $0.02 | $0.00 |
| GB Regional Data Transfer | $0.01 | $0.00 |
As you can see, GigaSpaces is significantly cheaper for Small and Large instances. We are a bit more expensive on the XL instance. The reason for this is that GigaSpaces leverages memory. The XL instance has 8 GB of RAM compared to the 4 GB you get with the Large instance. Because GigaSpaces scales linearly, an application is likely to achieve double the throughput with twice the memory capacity.
But read the small print. JBoss charges you a premium on bandwidth usage. GigaSpaces does not. I'll let each reader estimate their own bandwidth charges as they pertain to their application.
But even if GigaSpaces were more expensive on this per-CPU comparison (and it isn't), because of what I discussed above -- the elimination of other middleware components and the need for integration among them -- the GigaSpaces solution will be significantly cheaper. To understand this better, please see Mickey Ohayon's post How I Ported an Online Gaming Application from (Not-So) Good-Old-JEE to GigaSpaces in Only 4 Days. In it Mickey describes how he increased a client's application throughput from 15 transactions per second to 1,500 transactions per second (not a typo) by replacing an architecture using JBoss, Oracle RAC, Sun JMS and a couple of other components. Mickey gives technical details, including code.
Cost Measure #3: Linear Scalability
Linear scalability [3], or lack thereof, has a huge impact on costs even in a traditional corporate data center setting. This impact grows significantly in a cloud/utility computing environment because of on-demand pricing (per usage). If you are interested in a more detailed discussion of this topic, please read Economies of Non-Scale.
Most systems, especially classic n-tier architectures, do not scale linearly. Linear scalability means that the system is able to increase its capacity (throughput, concurrent users, etc.) in a linear relation to the amount of resources added to the system. So, for example, if one server can process 500 transactions per second, two servers can process 1,000, ten servers can process 5,000 and so on. Non-linearly scalable systems face diminishing returns: each additional resource I add will increase system capacity by a smaller increment. So while the first server can process 500, two servers can only process 900, three servers will process 1,220 and so on. At some point, a non-linearly scalable system will hit a wall: adding additional servers will not increase throughput at all.
From a cost perspective, non-linear scalability has two effects:
- As the system scales, per-transaction costs increase, making the business less marginally profitable.
To make this more concrete, let's take an example comparing a linearly scalable application to one that is not. To understand the cost differences, we need to understand the level of contention in each application. The definition of contention for our purposes is the amount of time the application waits on some centralized (i.e., non-parallelizable) component before it can proceed with the processing at hand. Some common causes of contention include database connections and locking, persistence and transport mechanisms for messaging middleware and ESBs, and distributed transaction managers. In our experience, traditionally architected (i.e., n-tier) apps often face contention levels of 20% to even 40%. The application's ability to scale is inversely correlated with its level of contention. Amdahl's Law provides us with the formula to calculate what effect different levels of contention will have on our ability to scale given a set of resources. So using Amdahl's Law, let's compare our linearly and non-linearly scalable apps in dollar figures.
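For reference, Amdahl's Law in its usual form can be written as follows. The symbol names are chosen here for illustration: $c$ is the contention level (the non-parallelizable fraction of the work), $N$ is the number of servers, and $S(N)$ is the throughput relative to a single server.

$$S(N) = \frac{1}{c + \dfrac{1 - c}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{c}$$

Solving for the number of servers needed to reach a target speedup $S$ gives

$$N = \frac{S\,(1 - c)}{1 - c\,S}, \qquad \text{valid only for } S < \frac{1}{c}$$

A linearly scalable application corresponds to $c = 0$, for which $S(N) = N$.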
Let's assume both applications cost the same on a single server - say $3.00 per server per hour (everything included). And let's say that they each have the same throughput on a single server - say 100 tx/sec. If we want to double the throughput to 200 tx/sec, the non-linear app would require 3 servers (or $9 per hour) and the linear app would require only 2 servers (or $6 per hour) -- a 33% difference in cost! If we need to increase throughput to 400 tx/sec, our Non-Linear App will need 16 servers (at $48 per hour), while our Linear App requires only 4 servers (at $12 per hour) -- a 75% cost saving!
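The server counts in this example are consistent with Amdahl's Law at a contention level of 20%, the low end of the range mentioned earlier. That 20% figure is inferred from the numbers and is not stated in the text, so treat it as an assumption. The sketch below uses made-up names (`servers_needed`, `describe`) and exact fractions to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction
from math import ceil

HOURLY_COST_PER_SERVER = 3  # dollars, "everything included" as in the example


def servers_needed(target_speedup, contention):
    """Smallest number of servers whose Amdahl speedup reaches target_speedup.

    contention is the non-parallelizable fraction (0 means linear scaling).
    Returns None when the target is at or above the 1/contention ceiling.
    """
    s = Fraction(target_speedup)
    c = Fraction(contention)
    if c * s >= 1:
        return None  # the scalability wall: no server count is enough
    # Solve 1 / (c + (1 - c) / n) >= s for n, then round up to whole servers.
    return ceil(s * (1 - c) / (1 - c * s))


def describe(servers):
    """Format a server count with its hourly cost."""
    if servers is None:
        return "unreachable"
    return f"{servers} servers, ${servers * HOURLY_COST_PER_SERVER}/hour"


ASSUMED_CONTENTION = Fraction(1, 5)  # 20%, inferred from the example's numbers

for target in (2, 4, 40):
    linear = servers_needed(target, 0)
    non_linear = servers_needed(target, ASSUMED_CONTENTION)
    print(f"{target}x throughput: linear = {describe(linear)}; "
          f"non-linear = {describe(non_linear)}")
```

Under that assumption the sketch gives 3 and 16 servers for the non-linear app at 2x and 4x throughput, matching the figures in the example, and reports any target of 5x or more as unreachable.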
To learn more about how GigaSpaces reduces application contention, see this post from Uri Cohen.
- As the system hits the scalability wall, it needs to go through an expensive re-design and re-write process. Stories of such re-architecture emergencies abound. Here's a really good example from MySpace. But let's understand the cost impact better using our example above. Say we're MySpace or Twitter or any number of other successful web apps and we need to increase our throughput not just to 400% of original capacity, but to 4,000%. With our Linear App, we'll need 40 servers at a cost of $120/hour. With our Non-Linear App we'll need...eh, we'll need... oh, wait -- we can't do it at all. In fact, the app would have stumbled at 500%. What does this scenario cost the business? Hard to say. It may lead to slow-downs, it may lead to partial failures, it may lead to all-out disaster (anyone try to activate their 3G iPhone on launch day?). Gartner gives a good list of things that might help you estimate what that costs your business:
One more aspect of linear scalability is predictability. Linear scalability provides the business with predictability: you know exactly by how much your system will scale with additional resources. As one Apple customer commented on CNET about the recent 3G iPhone and 2.0 launch debacle, which turned his device into an iBrick:
Server crashes, bandwidth problems...acceptable if this was a sudden, unforeseeable demand on resources. Not in this case - or in any case with a product launch. I doubt Jobs just woke up this morning and announced the 3G or the 2.0 upgrade....it's been in the works for months.
My answer to this fellow is that it is extremely difficult to test siloed, tiered, complex systems at real-life scales such as this, and it's damn well impossible to predict how such a system will scale under such heavy loads. Linear scalability provides this assurance. How much is that worth to the business? And here we're talking about an anticipated peak in load. What about unanticipated fluctuations? How much is it worth to the business to know that no matter what happens, they just need to provision additional servers and - voila - the system scales?
In a case study we recently published about one of our customers, Monte Paschi Asset Management, Alberto Santini, who heads one of the Italian bank's development teams, said: "With GigaSpaces, we don't have to worry about fluctuating loads because we can add or remove resources when needed, while running our applications simultaneously." Hot-Pluggable, baby. Speaking of costs, MP says they recovered their entire investment in GigaSpaces within less than three months. That's ROI. You can read the full story here.
Cost Measure #4: On-Demand Scalability
Although related to linear scalability, on-demand is more about being able to scale (and shrink) the application when we want to and -- in a cloud environment -- only pay for what we use when we use it. With GigaSpaces the application can grow and shrink with no code changes, and little, if any, config changes. Shrinking an application (and not paying for the extra capacity) is as simple as killing the unnecessary EC2 instances. The application will continue to run smoothly, without a single transaction loss.
This brings us back to Ray's question at the beginning of this post. Ray -- let me suggest that you're looking at this wrong. First, you talk about "massive scalability", but you give an example of 3 servers running continuously throughout the year (24/7/365). For such a small scale, and for such a constant load, why use EC2 at all? Join the GigaSpaces Start-Up Program, get the full production license for free on-premise and get dedicated servers. It will be a much cheaper solution all around.
For our own purposes (testing machines) we did a rough calculation that only when a server has less than 70% average utilization (throughout the whole year) does it make sense to use EC2 cost-wise.
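The calculation behind that threshold is not given here, but the general shape of such a break-even estimate can be sketched as follows. Everything in it is an assumption: the function name `breakeven_utilization`, the simple model (on-demand capacity is paid for only during utilized hours, a dedicated server costs the same regardless of use), and the inputs. The $0.80 hourly rate is the large-instance figure from Ray's post, and the $4,900 annual dedicated cost is a hypothetical number picked only so that the result lands near 70%.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming a non-leap year


def breakeven_utilization(dedicated_annual_cost, on_demand_hourly_rate):
    """Average utilization at which both options cost the same per year.

    Assumes on-demand capacity is paid for only during utilized hours,
    while a dedicated server costs the same regardless of use.
    """
    return dedicated_annual_cost / (on_demand_hourly_rate * HOURS_PER_YEAR)


# Hypothetical inputs, for illustration only.
threshold = breakeven_utilization(dedicated_annual_cost=4900,
                                  on_demand_hourly_rate=0.80)
print(f"On-demand is cheaper below {threshold:.0%} average utilization")
```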
The nice thing about it is that once you've built the application on GigaSpaces, if and when you decide it's appropriate, you'll be able to easily and quickly port it to Amazon EC2 or other clouds we are or will be supporting, such as GoGrid, Flexiscale or AppNexus.
If, on the other hand, you expect fluctuating demand, constant growth, possible surges due to success, or are already facing large demand, then you should be thinking of EC2, and then, given all the factors I gave above (and the additional one I specify below), well, that's where GigaSpaces starts looking like a very cost-effective solution for you.
Cost Measure #5: Reduce framework development, integration, testing and change cycles
The bottom line is that GigaSpaces provides you with the complex plumbing you shouldn't want to deal with, and we've built it to be ideally suited for cloud environments. Do you really want to get into this business or would you rather focus on the things that make your business unique?
Here's what our customers had to say in response to a question posted on LinkedIn that asked: "Are you using GigaSpaces in production? What are your experiences with it? What are the pitfalls? What were the biggest benefits?"
In response to the benefit bit, Keerat Sharma, platform engineer at the Gallup Organization, wrote:
Bang for buck. The cost of building a system (even with existing open source elements) that can replicate single dimensions of what GS can offer will outweigh the cost of using their system. Put this way- with GS, you get an enterprise system- partitioning, replication, failover mechanisms and the like all out of box.
You can write whatever you want to interact with the space, at whatever granularity you like. We decided to write some specialized code to monitor the space in a really lightweight fashion.
We got huge average latency reductions. If you're using it for a cache, you can do the math on how many expensive hits you stand to reduce. The larger the cluster, the greater the benefit really.
And Ashmit Bhattacharya, VP Engineering at Blackhawk Networks, weighed in with:
The solution was demonstrated in less than 3 weeks and we had a linearly scalable solution at the end of the exercise. We put in additional development effort with tremendous support from the Gigaspaces team over the next few months to deliver a solution with industry leading performance numbers in terms of concurrent connections, transaction response times and ability to handle failure. The partitioning and clustering capabilities of the GS framework worked flawlessly out of the box...
...The best part of the solution in our particular case was the manner in which the solution scaled horizontally. This took a tremendous burden off my architecture teams and we could focus more on functional development of our solution rather than work on the framework. The GS team is an absolute pleasure to work with.