April 27, 2008

Linear Scalability Does Exist, Ted

I enjoyed reading Ted Dziuba's "I'm Going to Scale My Foot Up Your Ass", I really did. I like the 'tude and I like the style of writing. Reminiscent of the BileBlog (RIP), which some of my colleagues think is juvenile, but I think is hilarious. I loved it all, except for one big problem: Ted is dead wrong on the facts.

Now, I agree with Ted that in many cases the problem is shitty code. But that's exactly the point. Developers should not have to code in a way that requires a degree in computational mathematics (did I get your degree right, Ted?) to get their application to scale.

And I also agree with Ted that memcached isn't the end-all to scalability problems. In fact, I will probably go on a memcached-related rant some other time. But I do think it is a move in the right direction.

Bottom line, Ted: linear scalability does exist, and architecture is the problem. To get what I am saying, check out this and then read through Nati's blog. If you still disagree, then let's talk. Did I mention I work for GigaSpaces?

January 18, 2008

Excel That Scales: The Movie

Back in June of last year I wrote about our partnership with Microsoft and our plans to work together on a solution for scaling out computations on Microsoft Excel spreadsheets. Since then, Microsoft and GigaSpaces have both released joint material (see here on MSDN) and held joint events promoting the solution. The most up-to-date white paper on the solution can be found here.

And now Owen Taylor has produced a screencast that describes the Microsoft-GigaSpaces joint "Excel That Scales" solution, in which he walks you through the problem and the solution.

Listen to the presentation.

Synopsis:
In many organizations -- for example in capital markets and oil & gas exploration -- Excel is used widely for complex computations and analytics. Excel is a flexible tool that many people are familiar with, so over time huge investments have been made in creating complex analytical models in Excel. However, it was never designed to be an enterprise-grade analytical tool. As data volumes grow, the need for real-time information intensifies and the number of users who wish to share the same computational logic and data increases, desktop-based Excel spreadsheets can no longer handle the load. Also, the functions they perform are becoming mission-critical, and valuable time and information could be lost in case of failure.

Enter the GigaSpaces solution. It combines the best of both worlds: Excel as the front-end and the power of your data center -- through GigaSpaces as the scale-out, highly-reliable application server -- at the back-end. In other words, the logic and the data are handled server-side with enterprise-grade reliability and performance.

Owen says it better and shows a demo.

October 23, 2007

Nope. Still don't see Oracle

Although I usually let personal attacks slip by, I couldn't let this post from Cameron Purdy remain unanswered, because it's kind of shameless. And BTW, sorry it took me some time, but like Cameron, I too am friggin' busy with real important stuff :-)

So I'll start with Cameron's ominous threat -- in answer to my question Where is Oracle? he said: "We are in your customer accounts. Every single one of them." I don't know if Cameron is naive enough to believe this (doubt it), is just trying to sound threatening (shaking in my boots) or is just spreading what he thinks is FUD, but obviously the threat is not very credible.

We all know that Oracle is a mean-ol' sales machine (throwing in Coherence for free to close an ELA on other products is pretty aggressive). Nothing new there. But fortunately, the world is bigger than Oracle, and not only does it have competitors, but there are many companies out there who wouldn't let an Oracle sales person set foot. And we will be in every single one of them...

Interesting thing is that the Oracle announcement of the Tangosol acquisition actually helped us close a few competitive deals faster in so-called Oracle shops, because customers said (and I quote one directly): "I don't want to put all my eggs in one basket." I just came back from a GigaSpaces off-site management meeting, and the sales execs could only come up with one case where we actually lost a deal because of the Oracle acquisition (and that was due to the aforementioned aggressive tactics).

Anyway, enough with the petty bickering (I'll get back to petty bickering later). I was making a bigger point: Oracle is not pursuing an innovation strategy, and is therefore losing its relevance for new applications.

In Where is Oracle? I quoted Nick Carr's analysis of Oracle's strategy, which ended with: "Through acquisitions and share gains, [Oracle] will milk the old model until the old model goes dry." Following the whole BEA situation, here's a perspective from Rob Hailstone of the Butler Group (via Java Developer's Journal):

Some 15 or more years ago I was in the audience at an IT conference, listening to Larry Ellison describing CA as a ‘bottom-feeder’.

Since I was working for CA at the time this rankled a bit, but it was certainly true that CA had evolved a good business model that included acquiring end-of-life software companies where there was a substantial user base in need of ongoing support.

The last few years has seen Oracle adopt a variation on this bottom-feeder business model – it doesn’t wait for the victim to be near the end, but puts the knife in at the first sign of weakness.

Well, I guess you can call that innovation.

Besides Oracle's own strategy, I don't think that anyone can be taken seriously claiming that all's well for Big-Ass commercial databases in a Web 2.0, scale-out world. I phrased my statement pretty carefully: "They overwhelmingly use MySQL and in many case use some other tier - such as their own file system or an in-memory cache, as the system of record for session or transaction state."

I'm sure that somewhere in the bowels of the organization of eBay, Amazon and Google there is an Oracle database or two installed, but that's not the point.

Now, back to the petty bickering. I don't get why Cameron keeps saying that we use Coherence for our web site. Oh, now I get it. I guess because our Wiki uses Atlassian Confluence and Coherence is used as a cache for Massive Confluence, he concluded that GigaSpaces uses Coherence. Clever boy! But seriously, not only do we not use Massive Confluence, but by that token, every time Cameron trades on the NYSE he uses GigaSpaces.

Oh, I guess he's just using the famous Geva Perry tactic of "it's OK to make up anything and post it on my blog". I like it. Aggressive, Oracle-style.

October 12, 2007

Grid-Enabling Resource-Intensive Applications

Timothy Hoehn and Bob Zeidman from Zeidman Consulting published a really interesting analysis on Dr. Dobb's of the optimal way to distribute an application across multiple machines.

They examined several methods and reached the conclusion that the optimal approach is distributed objects and the master-worker paradigm.

Sounds familiar? :-)

August 03, 2007

Grid in Financial Services

Earlier this week, GridToday published an excellent piece by Marc Jacobs of Lab49 (a GigaSpaces partner) entitled Grid in Financial Services: Past, Present and Future. What an eloquent, well-written piece.

Each trading day is a perfect storm. Every month, every quarter, the volume of data increases, the sophistication of algorithms and business processes grows, and the competitive pressure to get things done as quickly and efficiently as possible mounts.

Marc reviews the state of the union on distributed computing in financial markets, including the motivation (we can no longer throw expensive hardware at the problem, we need a new approach) and the current state of affairs (distributed computing is real, not academic, with commercial vendors and real implementations).

Our appetite for computing power isn’t satisfied with lone, uncoordinated machines. For financial services, distributed computing isn’t a luxury: it puts food on the table.

One of the challenges Marc observes is the fact that until recently (and still going on in some places) developers have been dealing with a lot of the "plumbing" issues of distributed computing and have been doing so in an inconsistent way. Part of the solution he sees is a new generation of vendors and products that address this:

The range of stable, usable distributed computing platforms -- such as those from Platform Computing, GigaSpaces and Digipede Technologies... Thus, it is becoming much rarer to find software development teams in financial services working on this type of plumbing.

However, one of the obstacles that Marc points out is the fact that various vendors are only dealing with one aspect of the issue or another, and rarely take an end-to-end approach. He has this to say, which is especially nice for GigaSpaces (my emphasis):

For example, while it is positive that there is a wave of vendor products that solve different parts of the distributed computing puzzle, few of them treat distributed application development as a holistic endeavor that encompasses many problems (i.e., job scheduling, event processing, data distribution and caching, security, deployment, APIs, IDEs, etc.) at once. Except for GigaSpaces, most distributed computing architectures require the assembly of infrastructure from several different vendors. While this does permit architectures built from best-of-breed solutions, it can be challenging to stitch the various pieces together into a coherent developer framework.

There are many other interesting topics, such as the need for both IT management of distributed systems and developer-friendliness. Again, he had something nice to say about GigaSpaces (and our friends at Digipede):

Unfortunately, few vendors have been able to make progress on both fronts. Some products, such as Digipede and GigaSpaces, are clearly more developer-friendly than others.

Highly recommended read. BTW, Marc has a great blog, which I read regularly: Serial to Parallel to Distributed.

As an aside, the Lab49 guys are a very sharp, well-spoken, experienced group that's worth paying attention to (see their group blog). I recently had the pleasure of doing a web seminar with Daniel Chait (founder and managing director of Lab49) for CMP. You can see it here.

Update: Here's Tom Groenfeldt's take on Marc's piece.

July 28, 2007

GridGain-GigaSpaces Integration

Our friends at GridGain just released the GA of their version 1.5, including integration with GigaSpaces as a data grid.

GridGain makes very simple, robust open source software for computational grids. It's ideal for performing parallelizable tasks in the MapReduce style (i.e., split the work, calculate, aggregate the results).
Check out technical info on the integration here, Nikita Ivanov's blog, and their integration page.
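
For readers who have not seen the split/calculate/aggregate pattern written down, here is a minimal sketch in plain Python. It is an illustration only: it does not use GridGain's or GigaSpaces' APIs, the function names (split, calculate, aggregate, run_job) are made up for this example, and a local thread pool stands in for the grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split the input into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def calculate(chunk):
    """The per-chunk step: here, a sum of squares (an arbitrary example task)."""
    return sum(x * x for x in chunk)


def aggregate(partials):
    """Combine the partial results into the final answer."""
    return sum(partials)


def run_job(data, workers=4):
    chunks = split(data, workers)
    # The chunks are independent of each other, so they can be processed in parallel.
    # A thread pool stands in for grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(calculate, chunks))
    return aggregate(partials)


if __name__ == "__main__":
    print(run_job(list(range(1, 11))))  # sum of squares of 1..10
```

The point of the pattern is that calculate never needs to know about any chunk but its own, which is what makes the work easy to farm out.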

From an integration point-of-view what's interesting about this is that they used GigaSpaces' new OpenSpaces Framework (our open-source declarative API using Spring).

GridGain is the latest in our ongoing integration efforts with open source projects and cooperation with the companies behind them, including Mule ESB (MuleSource), Spring Framework (Interface21) and Hibernate.

Expect to hear more from us on this very soon.

June 18, 2007

SIFMA & the GigaSpaces-Microsoft Joint Solution

Microsoft and GigaSpaces announced today our joint solution: GigaSpaces integration with Excel and Compute Cluster Server.

I've written before about the problem this joint solution solves in Grid Meets the Middle Office.

Here's an excerpt:

    The solution addresses two fundamental challenges grid users in capital markets are facing today. First, it allows organizations to move large volumes of data to compute nodes with low-latency performance, and, second, it eliminates the disconnect between the front office and the data grid. In addition, GigaSpaces' platform provides a highly scalable application architecture that enables organizations to keep pace with rapid growth.

    "In the financial services industry, large and fast-growing applications based on Microsoft technologies need to process high volumes of transaction data very fast," said Stevan Vidich, U.S. capital markets industry technology strategist at Microsoft Corp. "Our work with GigaSpaces supports those requirements by offering end users a way to process high volumes of low-latency data using Excel-based applications. Smooth interoperability between Java and .Net is an added bonus."

Congrats to my business development team at GigaSpaces, and specifically Amnon Raviv and Dekel Tankel, for making this happen.

Come check out the solution at our SIFMA booth (#1419) this week:

    GigaSpaces and Microsoft will preview the combined solution, providing live product demonstrations,  June 19-21 in New York City at SIFMA's Technology Management Conference. Additionally, a white paper published by the two companies will be available on Microsoft Developers Network (MSDN) Web site in late June.

Hope to see you there.

May 16, 2007

Excuse My French

Marc Fleury is funny. Must be fun to have FU money.

April 09, 2007

Tower of Babel

Our industry has a problem, and it has to do with words. We use the same words to mean different things, and different words to mean the same thing. In particular, there is a problem with naming categories of technology or products. Consider the terms grid computing, utility computing, on-demand computing, high-performance computing. Some people see them all as referring to the same thing. Others believe there are nuances among them, and yet others feel they refer to completely different things. Now, also add to the mix the terms fabric, virtualization, distributed computing -- all of which are also frequently used to refer to similar -- if not the same -- things, and you've got a big mess.

I was thinking about this because it seems that GigaSpaces, and the tech category it belongs to, is now at a point where it is crossing the proverbial chasm from early adopters to mainstream customers. And an important part of maturing is having a clear and consistent name for the product category.

This Tower of Babel phenomenon occurs partly by coincidence -- different people come up with different words to describe what they or others are doing more or less at the same time. But partly, this is happening by design.

I see three main reasons for this (in no particular order):

Reason 1: The prevailing Silicon Valley common wisdom of marketing technology is based on Geoffrey Moore's seminal Crossing the Chasm. In it, Moore writes that if you are introducing a new technology, invent a new category for it and position yourself as the leader.
So you've now got thousands of vendors -- start-ups or established -- inventing new categories and crowning themselves supreme leader of the category. For example, the term Business Service Management (BSM) emerged to describe the suite of application development, testing and monitoring tools provided by vendors such as BMC Software. To establish itself as a leader in its category, Mercury Interactive (now part of HP) went ahead and created the competing term Business Technology Optimization (BTO). You now have two terms referring to what is essentially the same thing.

Reason 2: Competing analysts need to make their mark on the world. The Forresters and Gartners of the world want to be able to say "I called it." So they describe an emerging trend -- and name it. For professional prestige reasons, the competing analyst firm cannot use the other firm's terminology, so they describe the same phenomenon in a slightly different way and give it a different name. Luckily, the big analyst firms have consolidated and there are only a few of them now, but on the other hand, there is a growing number of boutique firms, Wall Street analysts, the press, the blogosphere -- all throwing their own phrases into the mix.
So as an example from my own experience with GigaSpaces: Gartner categorizes us under the general area of Extreme Transaction Processing (XTP). Within it, they place us in two sub-categories: Grid-Based Application Platforms (where they name GigaSpaces as the leader and include vendors such as Appistry) and Distributed Caching Platforms (where they name GigaSpaces as the leader together with Tangosol/Oracle). Forrester categorizes us under the umbrella of the Information Fabric (and more recently Information-as-a-Service (IaaS)), and within it in the sub-category "Data Grid." The various vendors in the space use any and all of these terms, as well as things like "Data Fabric" and "Clustered JVM."

Reason 3: "The king is dead; long live the king!" What do you do if you provide hosted application services, commonly known as an Application Service Provider (ASP), but no one wants to touch that with a ten-foot pole, because of the billions of VC investments lost in that category during the dot-com bust? You invent a new name for the category: you call it Software-as-a-Service (SaaS). Presto! Out with the old and un-sexy; in with the new and trendy.

This is an industry-wide problem that we should all care about, because the resulting confusion impedes the adoption of new technologies.

Moore states this concisely when he writes:

Potential customers cannot buy what they cannot name, nor can they seek out the product unless they know what category to look under.

Our industry is already dealing fairly competently with a similar issue: technology standards. Why not apply the same approaches to "marketing standards"? Sure, reaching technology standards is often a lengthy process, fraught with political maneuvering and compromise, but at the end of the day it works. And the reason it works is that vendors realize that a lack of standards is a barrier to overall adoption of their products. It's an issue of increasing the size of the overall pie, and not just fighting for a slightly bigger slice of a small pie.

So borrowing from the tech standards approach, here are a few possible ideas to consider when it comes to technology category naming conventions:

  • A Standards Body -- made of analyst firms, vendors and other stakeholders
  • A Community Process -- similar to the Java Community Process (JCP)
  • And my favorite: A Wikipedia-like model (perhaps with some modifications making it a hybrid with a standards body). So for example, under the entry Object Spaces in Wikipedia, it says: "It has been suggested that this article or section be merged into Tuple space. (Discuss)". Besides a discussion process, one can also imagine a polling function and actual voting.

Some of this will always remain a problem, because the different products in a category don't usually overlap 100% and therefore it does make some sense to categorize them differently. It will also remain a problem because vendors, and perhaps rightly so, will always want to claim their turf and have the game played by their rules. This also happens with technology standards. But still, it seems to me that there is a lot of room for improvement.

March 26, 2007

It's the architecture, stupid!

I haven't posted in a while due to extensive travel during the past three weeks: San Diego for the CMP Exchange Solution Provider show, London for QCon and last week in Las Vegas for TheServerSide Java Symposium. More on some of these in future posts, but the Oracle-Tangosol acquisition news that came out on Friday (and had been anticipated by us for some time -- it's a small world...) is the big thing everyone's talking about.

Nati posted an excellent analysis on this on the GigaSpaces blog. His post seems to have resonated well with others, such as Patrick Logan and John Powers of Digipede.

So to re-emphasize some of Nati's points in my own words:

  • The Oracle acquisition of Tangosol is a strong validation of a new emerging category of middleware software by a major vendor
  • It was certainly an excellent move for Tangosol (as a non-VC backed company, a lot of people there, and especially Cameron, are to get a big fat check). Furthermore, because the market is heating up and getting more competitive with well-funded companies such as GigaSpaces, this was the right time for Tangosol to do this
  • It was a pretty good move for Oracle, and shows that they have a fair grasp of where the world is going, but it leaves much to be desired. Nati discusses this in detail in the post referenced above, and touches on it in When You Need More Than Just a Data Grid.

The Tangosol approach all along has been that in order to solve performance and scalability problems, you need to solve the data problem -- i.e., move from a centralized, remote, disk-based database to a distributed, local, in-memory cache (aka Data Grid). That's fine.

The GigaSpaces approach has been all along that by only addressing the data bottleneck, you are merely taking an aspirin, not fundamentally curing your chronic migraines. In other words, the crux of the issue lies in the architecture -- n-tier architecture to be exact. Without a complete paradigm shift, you will not find the ultimate solution to the needs of the fast-growing category of what Gartner calls Extreme Transaction Processing (XTP), of real-time analytics, of high-performance SOA and of massive web applications of all sorts.

Besides the many GigaSpaces customers who are proof that this approach is being accepted, look at the architectures of Google, Amazon, eBay, MySpace, LiveJournal and other Web stalwarts. They have all come to the same conclusion -- with different nuances. They have all realized that the level of scalability, reliability and performance they need -- while keeping cost and complexity down -- will not come from a J2EE app server + database + messaging. It will not come from an n-tier architecture. Instead, they moved to a scale-out architecture, which aims for a shared-nothing approach.

So what is the GigaSpaces approach?

We call it Space-Based Architecture (SBA). I will not go into it in great detail here, because it has been explained extensively in our various blogs and white papers, but it is based on the following principles:

First:

  • Collapse the tiers into a single process
  • Co-locate the services in a single process
  • Manage state and other in-flight data in memory

You have essentially created a self-sufficient process ("Processing Unit" in GigaSpaces parlance). No more network hops. No more database calls.

Now:

  • Scale out these self-sufficient processes across your hardware infrastructure (cheap, standard hardware, mind you)
  • Have the middleware partition and load-balance incoming requests across the many processes
  • Dynamically manage the environment from a single-point-of-access (but not a single-point-of-failure) with SLAs for response times and reliability
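
To make the first two steps concrete, here is a minimal sketch in plain Python. It is not the GigaSpaces API: ProcessingUnit and Router are hypothetical names invented for this illustration, the "business logic" is a toy running balance, and routing by a CRC32 hash of the key is just one simple way a middleware layer could partition requests.

```python
import zlib


class ProcessingUnit:
    """Hypothetical self-sufficient unit: logic and state live in one process."""

    def __init__(self, name):
        self.name = name
        self.balances = {}  # in-memory state, no database call

    def handle(self, account, amount):
        # Business logic and data access both happen locally, with no network hop.
        self.balances[account] = self.balances.get(account, 0) + amount
        return self.balances[account]


class Router:
    """Hypothetical stand-in for the middleware: partitions requests by key."""

    def __init__(self, units):
        self.units = units

    def unit_for(self, key):
        # A stable hash sends the same key to the same unit every time,
        # so each unit owns its slice of the data and shares nothing.
        index = zlib.crc32(key.encode("utf-8")) % len(self.units)
        return self.units[index]

    def submit(self, account, amount):
        return self.unit_for(account).handle(account, amount)


if __name__ == "__main__":
    router = Router([ProcessingUnit(f"pu-{i}") for i in range(3)])
    router.submit("alice", 100)
    router.submit("bob", 50)
    print(router.submit("alice", -30))  # alice's running balance
```

Note what the sketch leaves out: plain modulo hashing reshuffles keys when the number of units changes, and there is no backup copy of each unit's state. Handling those is exactly the job of the dynamic management and SLA layer described in the third bullet.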

The resulting architecture is:

  • Linearly scalable -- because there is no dependency among the Processing Units, the law of diminishing returns does not apply. Each additional unit added provides the same throughput as the one before it
  • Low latency -- because network hops between the tiers and the services that make up the application have been eliminated, and because data and events are accessed locally and in memory
  • Simple -- because you have a single clustering model to manage high-availability, load-balancing and partitioning across your entire environment
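
A back-of-the-envelope way to see the first claim is to compare an idealized shared-nothing model with one where part of every request is serialized on a shared resource. The sketch below uses Amdahl's law for the second case. The numbers are assumptions chosen for illustration (1,000 requests/sec per unit, 10% of each request serialized on something like a central database), not measurements of any product.

```python
def shared_nothing_throughput(units, per_unit):
    """Idealized linear model: independent units simply add up."""
    return units * per_unit


def contended_throughput(units, per_unit, serial_fraction):
    """Amdahl's law: a serialized fraction caps the achievable speedup."""
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / units)
    return per_unit * speedup


if __name__ == "__main__":
    # Assumed figures: 1000 requests/sec per unit, 10% serialized work.
    for n in (1, 2, 4, 8, 16):
        linear = shared_nothing_throughput(n, 1000)
        contended = contended_throughput(n, 1000, 0.10)
        print(f"{n:>2} units: {linear:>6} vs {contended:>8.0f} requests/sec")
```

Under these assumptions the contended model can never exceed ten times the single-unit throughput no matter how many units you add, while the shared-nothing model keeps growing in a straight line. Real systems land somewhere in between, which is why removing the dependencies among units matters so much.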

Until the Oracles of the world acknowledge that the architecture their products assume is not viable for this class of applications, the headaches will keep coming back to plague them and their customers.