Thursday, April 7, 2011

Using Goliath to Integrate With External Services

In this post I am going to explore how you can use Goliath, an asynchronous Ruby framework, to enable browser integration for services that lack JSONP APIs, as well as how to integrate these external services with your existing synchronous web stack.

In a previous post I talked about JSONP and how you can add support for it to your Rails app. But what if you want to add JSONP to someone else's API? Chances are you can't. However, all hope is not lost: we can use an asynchronous framework such as Goliath to proxy requests to an API that does not support JSONP. As you can see here we simply proxy the request using em-http-request:
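As a rough illustration of the wrapping step such a proxy performs, here is a minimal sketch that uses only the Ruby standard library rather than Goliath or em-http-request. The names `proxy_as_jsonp`, `VALID_CALLBACK` and the injectable `fetcher` are assumptions made for this example; in a Goliath API the fetcher is where the em-http-request call would go.

```ruby
require "json"

# Hypothetical guard: only accept callback names that look like (optionally
# dotted) JavaScript identifiers, so arbitrary script cannot be injected.
VALID_CALLBACK = /\A[a-zA-Z_$][\w$]*(\.[a-zA-Z_$][\w$]*)*\z/

# Fetch a JSON document and wrap it in a JSONP callback.
# `fetcher` is any object responding to #call(url) and returning a String body.
def proxy_as_jsonp(url, callback, fetcher)
  raise ArgumentError, "invalid callback name" unless VALID_CALLBACK.match?(callback.to_s)

  body = fetcher.call(url)
  JSON.parse(body) # raises JSON::ParserError if upstream did not return JSON
  "#{callback}(#{body})"
end

# Usage with a stubbed fetcher standing in for the external API.
stub = ->(_url) { '{"id":1}' }
puts proxy_as_jsonp("http://api.example.com/items/1", "handleItem", stub)
```

Keeping the fetch behind a callable means the same wrapping logic can be exercised without touching the network.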

Although this code doesn't look it, thanks to em-synchrony it is asynchronous. The HTTP request to get the external resource will not block the handling of further incoming requests as it would in a synchronous framework. It is important to note that calling external services from your synchronous service is considered a scalability anti-pattern. By calling external services inside a synchronous web request you are blocking the handling of further requests until the external service responds. If the external service becomes slow, or you experience a burst in traffic to an action with an external call, you will be serving 503s before you know it.

As you can see, proxying external resources for use in the browser is very straightforward. However, what if we actually need our existing synchronous Rails stack to be involved in the request? For example, let's say that we want to re-use an existing partial to render the JSON that returns from an external service.

I wrote the forwarding support middleware to achieve this end. You can see the source here:

It takes the JSON that comes back from the external API and appends it to the forward_to url and sends the request on. The request flow would look something like this:
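The forwarding step described above can be sketched with the standard library alone. This is not the middleware's actual source: the helper name `build_forward_url` and the query parameter name `data` are assumptions chosen for illustration.

```ruby
require "uri"

# Append an upstream JSON payload to a forward_to URL as a query parameter,
# preserving any query parameters the URL already carries.
# The parameter name "data" is an assumption made for this sketch.
def build_forward_url(forward_to, json_body, param = "data")
  uri = URI.parse(forward_to)
  pairs = URI.decode_www_form(uri.query || "")
  pairs << [param, json_body]
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end

puts build_forward_url("http://app.example.com/items/render?format=html", '{"id":1}')
```

Passing the payload in the query string keeps the asynchronous service stateless, at the cost of URL length limits for large responses.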

I think this is a really effective pattern to integrate an existing synchronous web stack with an asynchronous service. The asynchronous service is incredibly generic: it does one thing and does it very well. Its stateless nature makes it highly scalable. It also enables you to still build rich user experiences and integrate with external services while avoiding blocking requests in your synchronous web stack.

You can check out the complete source here:

Monday, March 21, 2011

Adding JSONP support to your Rails app

There are some instances where you may want to expose your JSON web services on other domains. However, the same origin policy restricts JavaScript in the browser so that it can only access resources from the same domain as the current page. JSONP (JSON with padding) is a way to get around the same origin policy in browsers and access resources on another domain. JSONP does this by injecting a script tag into the DOM, since the script tag is not restricted by the same origin policy.

jQuery has built-in support for JSONP. Simply appending a query parameter of callback=? will allow us to use jQuery to access a JSON resource on another domain:

Using getJSON with 'callback=?' will create a script tag and insert it into the DOM:

Adding support for JSONP to a Rails app is very straightforward, as you can see here:

In the case above this controller would generate the following response:

When this script evaluates, it results in the JSONP script tag being removed from the DOM and our getJSON callback being called with the data.
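To make the server side of this exchange concrete, here is a minimal sketch written as a plain Rack-style lambda using only the standard library, not a Rails controller. The constant names and the placeholder `ITEM` data are assumptions for this example. In Rails itself, passing a `:callback` option to `render :json` produces the same kind of wrapped response.

```ruby
require "json"
require "uri"

# Placeholder data standing in for whatever the service would serialize.
ITEM = { "id" => 1, "name" => "example" }.freeze

# Rack-style app: takes an env hash, returns [status, headers, body].
# Without a callback parameter it answers with plain JSON; with one it wraps
# the same JSON in a call to the named function.
JSONP_APP = lambda do |env|
  params = URI.decode_www_form(env["QUERY_STRING"].to_s).to_h
  json = JSON.generate(ITEM)
  callback = params["callback"]

  if callback && callback.match?(/\A[\w$.]+\z/)
    [200, { "Content-Type" => "application/javascript" }, ["#{callback}(#{json})"]]
  else
    [200, { "Content-Type" => "application/json" }, [json]]
  end
end

_status, _headers, body = JSONP_APP.call("QUERY_STRING" => "callback=jsonp123")
puts body.first
```

The callback name is restricted to identifier-like characters here because it is echoed straight into executable script.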

JSONP is a pretty neat approach to exposing your services client side across domains. However, given the use of the script tag, it does present a non-trivial security vulnerability for the site using it.

Wednesday, March 9, 2011

Optimizing your Rails site with HTTP Caching

HTTP caching is a critical component of the web. It serves as a mechanism to eliminate the need to send requests, or to send full responses. It is equipped with robust expiry mechanisms and has strong industry support. Given this you would expect it to be a common part of any web developer's repertoire. However, far too often dynamic web sites only consider HTTP caching after they have exhausted other caching options, often having had degraded service or outages in the meantime. In this post I will show you how straightforward it is to add HTTP caching to your Rails app, as well as outline some of the advantages and disadvantages of doing so.

So as an example let's say we have a World of Warcraft site, and on this site we display information about items:

The items controller backing this page could look something like this:

This use of expires_in is equivalent to: header['Cache-Control'] = "max-age=10800, public, max-stale=0". This header sets three things: how long the cache is valid for (a max-age of 3 hours, or 10800 seconds), that this page can be cached by shared caches (public), and that the max-stale period for a client caching this document is zero seconds. When we put a reverse proxy cache like Varnish between our Rails servers and the internet, we populate the cache for a page when any user views it.
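The arithmetic and the shape of that header can be checked with a few lines of plain Ruby. The helper `cache_control` below is a hypothetical name for this sketch and is not part of Rails; it only assembles the string described above.

```ruby
# Build a Cache-Control value in the order shown above:
# max-age first, then visibility, then any extra directives.
def cache_control(seconds, public: false, **extras)
  parts = ["max-age=#{seconds}", public ? "public" : "private"]
  # Extra directives arrive as keywords, so max_stale becomes max-stale.
  extras.each { |name, value| parts << "#{name.to_s.tr('_', '-')}=#{value}" }
  parts.join(", ")
end

three_hours = 3 * 60 * 60 # 10800 seconds
puts cache_control(three_hours, public: true, max_stale: 0)
# prints "max-age=10800, public, max-stale=0"
```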

Here you can see Anne request /items/1, populating Varnish's cache. Subsequent requests that arrive before the cache expires will be served from Varnish and not our Rails processes.

By serving these pages out of a reverse proxy like Varnish instead of Rails we increase both our Apdex score and our fault tolerance, as our Rails processes can go down and we can continue serving requests from our reverse proxy.

Using max-age works well for something like a Warcraft item page, as Warcraft items do not update frequently and even if they are a few hours out of date it's not the end of the world (of Warcraft). However, what if we wanted to add some dynamic content like comments to our item pages? One option to keep our pages cacheable yet provide dynamic content is to use Segmentation by Freshness.

As you can see here we use progressive enhancement to render the comments inline on the page if javascript is supported. The comments controller backing this would look something like:

Here we are doing a "Conditional Get" with the Rails stale? method. If the client has previously seen this version of the comment page we return a 304 Not Modified. Otherwise we render the document for them and append an ETag header. You can also see that we add a must-revalidate directive to the cache; this instructs the client that it must revalidate the cached document after it expires. The difference between must-revalidate and max-stale is that if a client cannot reach the server and max-stale is set, the cache will return the stale document with a warning header set, whereas with must-revalidate it must not serve the stale document at all. Here is what the flow for caching of the item's comments page would look like:

Here you can see Anne does a get on /items/1/comments and populates the proxy cache.

Subsequent requests can use If-None-Match to do a conditional get. Alternatively to ETag one can also make use of the Last-Modified header to cache dynamic content.
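The conditional GET flow can be sketched without Rails. This is only an approximation of what stale? does for you: the helper names `etag_for` and `conditional_get` are assumptions, the ETag is derived from the body for simplicity, and the comparison ignores weak validators and comma-separated If-None-Match lists.

```ruby
require "digest/md5"

# Derive a quoted ETag from the response body (an assumption for this sketch).
def etag_for(body)
  %("#{Digest::MD5.hexdigest(body)}")
end

# Return [status, headers, body] for a conditional GET.
# `if_none_match` is the value of the client's If-None-Match header, or nil.
def conditional_get(body, if_none_match)
  etag = etag_for(body)
  headers = { "ETag" => etag }
  return [304, headers, ""] if if_none_match == etag # client copy is current

  [200, headers, body]
end

comments = "<li>First!</li>"
status, headers, _body = conditional_get(comments, nil)
puts status # first request: full response

status, _headers, _body = conditional_get(comments, headers["ETag"])
puts status # repeat request with the ETag: not modified
```

A new comment changes the body, so the ETag changes and the next conditional request receives a full 200 response again.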

When using a reverse proxy like Varnish there is some important configuration you will need to do. Here is an example, with explanations, of a Varnish configuration that you could use for our World of Warcraft site:

Overall this approach provides us several benefits I would like to emphasize:
Fault tolerance: A reverse proxy such as varnish can continue to serve requests should our app servers stop responding.
Higher Apdex score: As pages will typically be served by varnish your users will be much happier with faster page load times.
Testing: Given their stateless and declarative nature, unit testing HTTP headers is generally easier than testing other caching approaches.
Performance: Reverse proxies are VERY fast; it is fair to say that you will get very good mileage from HTTP caching with a reverse proxy.
Cost: There is a cost albeit small associated with every page our services generate. Why pay for servers to keep rendering the same response?

HTTP caching is not for everyone or every problem. For example, if you deal with sensitive information you would not want to put that sensitive information into a shared cache. If your pages already contain user specific data you will need to apply an approach like segmentation by freshness before you will be able to cache your pages. If you are serving data that is very dynamic and user centric, HTTP caching may also not be suitable for you. However, if you consider using HTTP caching in earnest you may be surprised by how you can use it.

Thursday, April 30, 2009

I don't think that word (web) means what you think it means

I recently watched a Q and A session with Tim Bray on the future of the web from QCon San Francisco 2008. I enjoyed the talk; if you have not seen it already it is well worth a view. One of the responses to the talk, by Flex architect and consultant Yakov Fain, raised some interesting points that I would like to discuss.

"you can create RIA that support Back button - just decide what application view (no page) to show when the user hits the Back button."

A disadvantage of RIAs (Rich Internet Applications) is their lack of interoperation with the web: you lose a lot of the shared semantic meaning of the web in RIAs. The site Mr. Fain gave actually provides an excellent example of this weakness. You achieve back button functionality by leveraging URL fragment identifiers, however in doing so you are hijacking the agreed semantic meaning of fragment identifiers. "For HTML, the fragment ID is an SGML ID of an element within the HTML object." Fragment identifiers are meant to identify elements within a document and not separate documents, which is what that site uses them for.
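One concrete way to see the problem is that a fragment identifier is never part of the HTTP request, so two "views" that differ only by fragment are the same resource as far as a server or crawler is concerned. A small sketch with Ruby's standard URI library (the example.com URLs are made up for illustration):

```ruby
require "uri"

# Two hypothetical RIA "views" distinguished only by their fragment.
coupe = URI.parse("http://example.com/configurator#/model/coupe")
sedan = URI.parse("http://example.com/configurator#/model/sedan")

puts coupe.fragment     # the part that stays in the browser
puts coupe.request_uri  # the part that is actually requested
puts coupe.request_uri == sedan.request_uri
```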

A direct result of abandoning this agreed semantic meaning is that the content on those "views" will not be indexed by search engines like Google. You can demonstrate this to yourself very easily; let's take the Mercedes CL550 coupe as an example.

A search on google for "mercedes CL550 2009 coupe" does not return a single result to your RIA or even a Mercedes site in the first 5 pages (I didn't bother checking results after the 5th page).

Now compare that to a search for "honda civic 2009 coupe": the first result is Honda's Civic coupe page.

The RIA approach of utilizing fragment identifiers is inherently limited and this becomes obvious when contrasted to a more RESTful approach that doesn't hijack the agreed semantics of urls.

Mr Fain goes on to say:

"There’s a bunch of interdependent rules for each car model that should enable/disable UI controls depending on the user’s selection. For example, if you ordered white leather interior, you can’t have yellow exterior. We’ve implemented the entire rule engine on the client, which makes the entire system a lot more responsive."

Many other e-commerce and product sites have similar rules around products, however they do not give up being indexed by a search engine for responsiveness. For me as an online consumer, responsiveness has not been a pain point or something I long for when using sites like newegg, amazon, or apple. However, if their content was no longer indexable by Google I would feel that pain and would be less likely to purchase from them. It seems like a pretty shitty trade off, doesn't it?

Mr. Fain goes on to attack PHP and Rails via the twitter straw man:
"Mr. Bray believes that the direction of Web applications is moving form J2EE to PHP and Rails. He didn’t make it clear for what kind of applications though. Is he talking about applications like Twitter that can go down several times a day and people will keep using it because they don’t have any better choice? ... But if you take any application that handles your money (Banking, eCommerce, auctions) I’d rather stick to tried and true J2EE on the server"

I think one should be careful in stereotyping a language/platform by the performance problems of one application; you may find yourself mistaken more often than not. is a great example of a Rails e-commerce. Mr Fain seems to forget that one of the big advantages of the web is that I can use whatever I want on my server and my clients don't care.

"I believe that Web moves to a VM-based stateful clients with fast communication lines between the client and the server moving tons of strongly-typed data back and forth."

I disagree with this statement and here's why. As my very knowledgeable colleague Jim Webber put it, the web is built on three principles that conventional thinking teaches are bad: it is dynamic/late bound, text based, and built on polling. These are not weaknesses of the web; rather they are its strengths. You should bear in mind the web is not some system that we were just handed and have to live with. Rather, the web is what resulted from a form of technological natural selection; it obliterated other more statically typed protocols like DCOM, CORBA and Gopher, not vice versa.

Monday, April 20, 2009

What are your wastes?

The book Toyota Production System starts with a conversation between Norman Bodek and Taiichi Ohno. Mr Bodek asks Mr Ohno where Toyota is today; by now they must have reduced all work-in-process inventory, enabling them to chip away at all the problems. "What is Toyota doing now?" he asked. Taiichi Ohno's reply is simple but brilliant. "All we are doing is looking at the time line, from the moment the customer gives us an order to the point when we collect the cash. And we are reducing that time line by removing the non-value-added wastes."
For the sake of this post I would translate this time line to the world of software by replacing "order" with "Feature Conceived", although I think you could make compelling arguments to expand the time line to an earlier point in time such as "customer demand".

Fundamentally the Toyota production system is based on the elimination of waste, and in my experience the most productive teams I have been on followed this practice of constantly removing waste. Becoming such a team is not a destination but a constant journey; as Kent Beck says in XP Explained, "Perfect is a verb, not an adjective". A helpful tool to enable this continuous improvement is the five whys, for example:

Why did the server go down?
The wrong privileges were set when the application was deployed.
Why were the wrong privileges set?
The wrong account was used to push to production.
Why was the wrong account used?
Bob was sick so Fred pushed the deploy.
Why does Bob's account have to push the build?
He has always done the deployments.
Why has Bob always done the deployments?
No one ever took the time to automate the deploy to production.

The five whys are by no means perfect; Wikipedia provides a list of good criticisms of the methodology. I have seen many of the anti-patterns they describe occur. I have also seen a team apply techniques like the five whys to the point where we got our release cycle down from two to four weeks to one day. Retrospectives can also be helpful in identifying and eliminating waste, however again it is important to stress there are no silver bullets and you should not limit your improvements to scheduled meetings. Keep in mind: "It is said that improvement is eternal and infinite." - Taiichi Ohno

There are issues in drawing parallels between software and manufacturing; one is a far more creative process than the other, however the advantages that result from eliminating wastes can be realized for both. One should bear in mind that Just-in-time was far more heavily influenced by American supermarkets than by automotive manufacturers. That being said, software and manufacturing are only analogous, so the sources and kinds of waste may differ between the two. What would you identify as the wastes in your organization? There are some common places you can look: Do you rely on manual testing? Are there manual steps in your deployment? Do you spend a lot of time merging code branches? Do you develop large feature sets only to find they are unused? How long does your code sit in source control before it goes out to production? Do you have fat requirement documents that quickly go out of date?

Think you have no wastes? Think you can't get working software into the hands of your users any faster? If so let me leave you with this quote:

"No one has more trouble than one who says that he has no trouble." -Taiichi Ohno

Wednesday, April 15, 2009

Magellan 0.1.3 Gem Released

So what is Magellan and what does it do? Magellan is a web testing tool that embraces the discoverable nature of the web.

What does that mean practically? Simply put, it is a web crawler written in Ruby that has two rake tasks built around it:

The first task will explore sites by following //script[@src], //img[@src] and //a[@href] tags and look for documents that return HTTP status codes of 4** or 5**. The second task lets you specify a URL pattern and an expected link to look for if the current URL matches that pattern. For example, you can say product pages should contain a link to /sizing.html or that all pages should contain a link to /about_us.html.
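To illustrate the two checks, here is a minimal standard-library sketch. It is not Magellan's implementation or API: the function names are hypothetical, a regular expression stands in for the XPath queries, and the status lookup is an injected callable so the example runs without a network.

```ruby
# Pull link targets out of an HTML string. A regular expression stands in for
# //script[@src], //img[@src] and //a[@href]; real HTML deserves a real parser.
LINK_PATTERN = /<(?:script|img)\b[^>]*?\bsrc=["']([^"']+)["']|<a\b[^>]*?\bhref=["']([^"']+)["']/i

def extract_links(html)
  html.scan(LINK_PATTERN).map { |src, href| src || href }
end

# Check 1: report links whose status code is 4xx or 5xx.
# `status_for` is any callable mapping a URL to an Integer status code.
def broken_links(links, status_for)
  links.select { |link| (400..599).cover?(status_for.call(link)) }
end

# Check 2: for every rule whose pattern matches the current URL, report the
# expected link if the page does not contain it. Rules are [pattern, link] pairs.
def missing_expected_links(url, links, rules)
  rules.select { |pattern, _expected| pattern.match?(url) }
       .map { |_pattern, expected| expected }
       .reject { |expected| links.include?(expected) }
end

html = '<a href="/about_us.html">About</a><img src="/logo.png"><script src="/app.js"></script>'
links = extract_links(html)

statuses = { "/about_us.html" => 200, "/logo.png" => 404, "/app.js" => 200 }
p broken_links(links, ->(link) { statuses.fetch(link) })

rules = [[%r{\A/products/}, "/sizing.html"], [/.*/, "/about_us.html"]]
p missing_expected_links("/products/1", links, rules)
```

With this sample page the first check flags /logo.png, and the second flags /sizing.html as missing from a product page.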

Can Magellan help you?

I see Magellan being able to help two groups of people. The first is those who have low test coverage and would like an easy way to get started in testing their web application.

The second group I see Magellan being able to help is those moving towards or practicing continuous deployment. Magellan can supplement your existing tests and continuous integration process with exploratory testing to find any broken links or missing documents, or to verify the interconnectedness of your resources.

How does magellan replace selenium or watir?
Magellan is not meant to eliminate the need for higher level acceptance tests. Frameworks like Selenium or Watir will remain a key part of any healthy suite of tests. However, any browser based testing framework will involve more moving parts than may be necessary to test part of a web application. As a result these tests will always be slower and have more potential points of failure than alternatives without them. Magellan will let you focus your higher level acceptance testing on the key parts of the business and the integration of your JavaScript with the browser.

Interested in giving it a go?

Because Magellan leverages the agreed semantics of the web to crawl your site, getting started with it could not be easier. You can find install instructions and examples at: github

Your feedback is welcome at: rubyforge