
Going Async

Yoan Blanc, Sat 24 April 2010

I’ll talk about some back-end stuff for a change, showing what can kill your server, especially when it comes to external data (a.k.a. I/O), and, hopefully, how solutions are being found.

Let’s take this simple piece of PHP code:

<?php
// Fetch the pipe's output; file_get_contents() blocks until
// Yahoo! Pipes has answered.
$req = "http://pipes.yahoo.com/pipes/pipe.run?_id=…&_render=php";
$data = unserialize(file_get_contents($req));

It grabs some external content and might take some time to do so, depending on Yahoo! Pipes. This is where the problem starts to itch: when the server handles that page, it will do mostly nothing but wait for the external data to arrive.

Using Apache, a process or a thread is spawned that waits on the external data and, more importantly, consumes memory and CPU just by waiting.

Going asynchronous means being able to do other work, to answer other requests, while waiting.

Unfortunately, I don’t know of any solution to that problem in PHP. PHP is designed to be executed as fast as possible, not for doing heavy operations. Big websites that use PHP generally use it as a pure front-end language (for example, the pre-Google YouTube).

There are a couple of ways to go async. Some come from a language that is asynchronous by design, like JavaScript (with node.js) or Erlang (to name a few). Ruby can achieve it with EventMachine, and Python (which I explored more) has several options (Twisted, Tornado (coming from FriendFeed, bought by Facebook), Eventlet, Gevent, …). Take this dummy WSGI code:

from time import sleep

def application(environ, start_response):
    start_response("200 OK",
                   [("Content-Type", "text/plain")])
    sleep(1)  # stands in for waiting on external data
    return ["Hello, world!"]

The external resource you are waiting on is simulated by a sleep, which keeps the example easy to follow; fetching external data that takes a fixed time to arrive would behave exactly the same.

When you run this code, a process or thread is dedicated to that request for a whole second, blocking other incoming ones. You would need as many workers/threads/processes as concurrent requests to handle them all within one second. That is suboptimal, since those processes aren’t doing anything but waiting.
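
To make that blocking behaviour concrete, here is a minimal sketch that serves the application above with the standard library’s single-threaded wsgiref server (the port number is an arbitrary choice):

# A minimal sketch: the blocking application served by the standard
# library's wsgiref server, which handles one request at a time.
# Ten clients hitting it at once are answered one after the other,
# so they take roughly ten seconds in total.
from time import sleep
from wsgiref.simple_server import make_server

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    sleep(1)  # stands in for waiting on external I/O
    return ["Hello, world!"]

make_server("", 8000, application).serve_forever()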

Take a look at some tries I made.

Eventlet and Gevent, the most convenient to use IMHO, run each incoming request in a coroutine (think of a lightweight sub-process), enabling the main process to do something else whenever a request is waiting. The only thing to change in the code above is where the sleep function comes from.

from eventlet import sleep  # with Eventlet
from gevent import sleep    # or, with Gevent

And, of course, you have to run it with an appropriate server: the one bundled with the library, Gunicorn, or Spawning (Eventlet only). Those libraries can also monkey-patch the standard library, so no code has to be modified to become asynchronous; using urllib.urlopen to fetch external data, for instance, will not block the whole process.
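
For instance, a minimal sketch of that monkey-patching approach with Gevent could look like this (the Yahoo! Pipes URL is only a placeholder):

# Minimal sketch of the monkey-patching approach with Gevent.
# patch_all() swaps blocking primitives (sockets, time.sleep, ...) for
# cooperative ones, so the plain urllib call below lets other requests
# run while this one waits for the remote server.
from gevent import monkey
monkey.patch_all()

import urllib

def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    data = urllib.urlopen("http://pipes.yahoo.com/pipes/pipe.run").read()
    return [data]

Served by one of the servers above, the process keeps answering other requests while this one waits.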

Twisted and Tornado, for example, use a reactor model, which forces you to handle asynchronous code with callbacks. It’s closer to the metal, but the learning curve might give you some headaches. I do love Twisted, but it’s sometimes just too much, really.
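
To give an idea of that callback style, here is a minimal sketch using Twisted’s getPage (again, the URL is just a placeholder):

# Minimal sketch of the reactor/callback style with Twisted.
# getPage() returns a Deferred right away; the reactor keeps running
# and one of our callbacks fires once the page (or an error) arrives.
from twisted.internet import reactor
from twisted.web.client import getPage

def on_success(body):
    print "got %d bytes" % len(body)
    reactor.stop()

def on_error(failure):
    print failure.getErrorMessage()
    reactor.stop()

d = getPage("http://pipes.yahoo.com/pipes/pipe.run")
d.addCallbacks(on_success, on_error)
reactor.run()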

Python has another project, running on top of Stackless Python, called Syncless, dedicated to async code. It looks quite promising.

Ruby 1.9 will get coroutines under the name of Fibers; until then, EventMachine seems to be the way to go. It looks like a super clean Twisted to me; I hope I’m not hurting too many feelings by saying that. (BTW, there is a great article from the SuperFeedr guys: Ruby Fibers may confuse.)

One very interesting, and very young, project is node.js: basically server-side JavaScript. It is asynchronous by design, and totally awesome if you love JavaScript or straight out of hell if you don’t. I do like it.

With the web evolving the way it does, where data comes from multiple sources and heavy or specialized tasks are delegated to dedicated units (SOA), I clearly see this asynchronous idea as an ongoing paradigm shift. Today’s bottleneck is I/O, most often the database.

Don’t forget that back-end performance only aims at serving more people, faster, while 80-90% of the time is spent on the client side. I’m a front-end guy after all.

For a change, I am going to talk about the back end: what causes trouble and puts a server under strain, and the solutions being explored.

Take a fairly simple piece of code that fetches external data, for example a Yahoo! pipe or an RSS feed. In most cases the time spent waiting for that external data is lost and cannot be reused to do something else, as in PHP where, to my knowledge, no solution exists.

The basic idea is to set that piece of code aside during the waiting time and pick up another one, so that while Alice’s request is in flight, Bob’s can be handled. That is what asynchronicity means: a so-called non-blocking system where certain operations (tied to the external environment) do not block the whole process.

Various solutions exist, in different programming languages. Some are, by design, already able to work this way, such as Erlang or JavaScript. Python and Ruby offer ways to get there as well. Just take a look at the code snippets above.

They are all applications that take one second to run but which, within a single process, can work in a non-blocking, parallel fashion (without threads). One request, like ten or twenty in parallel, will take one second, whereas the naive approach would require as many processes, threads or the like as there are requests. The savings in memory and CPU are drastic.

As you can see, some solutions look more elegant, simpler. Gevent and Eventlet, for example, use coroutines and replace the existing, blocking components with versions that take advantage of coroutines and are therefore non-blocking.

In my view there is a real shift here in how web development is approached today. Applications are becoming more and more distributed, delegating certain features to more specialized entities (SOA), or even to other services. For now it is often the interactions with the database that take the most time.

As a front-end developer, I cannot help repeating that while there are interesting avenues here to serve more visitors, faster, with the same server, 80 to 90% of the time is still spent on the client side.
