And there I was, happy as a 4-year-old who had just found a whole box of strawberries.
Then all of a sudden, along came a post about how Amir Salihefendic had made a little daemon in C, using libevent and memcached, that would allow him to cache an HTTP request in memcached. I don’t quite get why anyone would want to reinvent a reverse HTTP proxy like that, and I stated my arguments for why this seemed like a really bad idea.
As you can tell from the comments, he did not agree. It ended up with him deleting my last comment. Good thing I kept a copy in my clipboard. So, here are the comments.
Mads Sülau Jørgensen 25. Mar
This could just be me, but why would you do this over a proxy setup with Varnish, Squid or Apache with mpm_event and mod_cache? Seems a little like you’ve reinvented the wheel.
Also, as a previous commenter has suggested, why did you not just use the memcached support in nginx, and if it was because of the lack of multiple memcache servers, why not patch it and send this patch back to the community?
amix 25. Mar
Mads Sülau Jørgensen:
There are two reasons why I didn’t patch nginx with this support:
Separation of concerns: Running a separate polling server architecture, I can monitor it and scale it independently.
Patching nginx would not have been that easy.
I prefer using memcached over Varnish or other caching options, and memcached gives me a lot of options (deletion of multiple keys at once, consistent hashing, and easy and proven scaling to terabytes of cache, just to name a few).
Mads Sülau Jørgensen 25. Mar
Ah, but you could, easily, monitor a running nginx instance. The same goes for a running varnish instance. I’m not necessarily talking about running the nginx memcached module in the same process or on the same server as your main proxy. So you’d achieve the separation by running another process or running the process on another box. Easy.
And the memcached module of nginx does not seem /that/ difficult to work with. Sure, you’d have to do a lot of handling of dead servers, and of timeouts that could block the whole server. The current one probably handles this already. I don’t use it, so I’m not one to say.
With your keys structured correctly, you can flush (purge) multiple keys easily in varnish. You can provide your own hashing function as well. You could also use nginx’s hash balancer to send the request to the “correct” backend server.
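The hashing idea that keeps coming up in this thread (memcached's consistent hashing, or a hash balancer picking the "correct" backend) can be sketched roughly as below. This is a minimal illustration only, not how nginx, Varnish, or any memcached client actually implements it; the `HashRing` class, the `replicas` parameter, and the server names are all made up for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: maps a key (e.g. a URL) to one server."""

    def __init__(self, servers, replicas=100):
        # Each server gets `replicas` points on the ring so that keys
        # spread out reasonably evenly (an assumption of this sketch).
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        # MD5 is used only as a well-distributed hash, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    # Hypothetical memcached servers.
    ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
    print(ring.server_for("/poll/user/mads/status"))
```

The point of the ring is that when one server disappears, only the keys that lived on that server move; every other key keeps mapping to the same backend.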
The thing about hacking some server together with libevent, in a language that might not be one’s main language, is that you’ll end up missing a lot of things, and it may end up exploding in your face. Off the top of my head: what happens if a memcached server dies? What about memory leaks? What happens if the process segfaults (see memory leaks)?
It might be super-duper fast, but is it stable :)
amix 25. Mar
Mads Sülau Jørgensen:
plurk@web04:~$ ps aux | grep poll_server
plurk 29835 2.3 0.0 2376 1048 ? S Mar24 27:23 /home/plurk/poll_server/poll_server 0.0.0.0 16000
It’s pretty stable (has not crashed since I started it), uses very little CPU and very little memory. If I stop the server, the nginx server will just route everything to Python servers. I doubt Varnish or patching Nginx would have been faster or more stable, and they would have added a big conceptual complexity.
Personally, I would much rather fix errors in a 100-line program than in a thousand-line program. It’s separation of concerns and separation of complexity. So sure, I could have extended nginx+memcached or hacked Varnish, but I chose to keep it simple, learn some new things and solve the problem in a very specialized way.
Mads Sülau Jørgensen 25. Mar
There is no hacking in the varnish solution whatsoever. Varnish is an HTTP proxy cache; it would be doing what it was designed to do. And my guess is that it would be doing it well. My guess is also that you’d end up using it a lot, and for more than this one goal.
With a specialized solution, you have to handle a new process, you have to handle the errors that process can give, and you have to keep it running.
Using the tools already available makes more sense to me, and that’s what I do when I run into a problem. Other, more proficient programmers have usually had the same problem before me.
amix 25. Mar
Mads Sülau Jørgensen:
Varnish is a good HTTP proxy and cache, no doubt about that, but it lacks a lot of things to be used in my setup. For example, how can I invalidate 1000 keys at once and do it in a split second? Their purging capabilities seem to be pretty primitive…
Plus, who said that I won’t iterate over this and go in a new direction – for example, benchmark this against a long-polling “comet” solution. Going in this direction would mean dropping Varnish – while I only need to change and add some lines of code to create a long polling solution out of this.
Mads Sülau Jørgensen 25. Mar
Well, with a properly designed URL schema, you’d do something like this:
PURGE /poll/user/(?:mads|john|abc*)+/status
And then you’d have the VCL to handle that purge via purge_url(). The above will then purge the urls matching the regular expression, (so mads or john or starting with abc).
It actually works both fast, and well. It does not purge the cache when you tell it, but when you request an object. That makes invalidation fast.
See http://varnish.projects.linpro.no/ for more information.
You can more or less configure varnish to do what you want it to do via its configuration file.
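The lazy purge described above (record the pattern now, check it when an object is requested) can be sketched roughly as follows. This is a toy model under stated assumptions, not Varnish's actual code: the `LazyPurgeCache` class and its `store`, `purge` and `lookup` methods are hypothetical names invented for this example. One caveat on the pattern: as written in the comment, `abc*` matches "ab" followed by any number of "c" characters, so the sketch uses `abc[^/]*` to get the intended "starting with abc" behaviour.

```python
import itertools
import re


class LazyPurgeCache:
    """Toy cache where a purge only records a pattern; nothing is scanned."""

    def __init__(self):
        self._objects = {}  # url -> (body, sequence number when stored)
        self._bans = []     # list of (sequence number, compiled regex)
        self._clock = itertools.count()

    def store(self, url, body):
        self._objects[url] = (body, next(self._clock))

    def purge(self, pattern):
        # Cheap no matter how many objects match: just remember the pattern.
        self._bans.append((next(self._clock), re.compile(pattern)))

    def lookup(self, url):
        entry = self._objects.get(url)
        if entry is None:
            return None
        body, stored_at = entry
        # Only purges issued after the object was stored can invalidate it.
        for banned_at, regex in self._bans:
            if banned_at > stored_at and regex.search(url):
                del self._objects[url]
                return None
        return body


if __name__ == "__main__":
    cache = LazyPurgeCache()
    for user in ("mads", "john", "abcdef", "peter"):
        cache.store(f"/poll/user/{user}/status", f"status of {user}")
    cache.purge(r"^/poll/user/(?:mads|john|abc[^/]*)/status$")
    for user in ("mads", "john", "abcdef", "peter"):
        print(user, cache.lookup(f"/poll/user/{user}/status"))
```

The trade-off in this sketch is that every lookup has to walk the list of recorded patterns, so a real implementation needs some way of retiring old patterns; the toy version above never does.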
As for the comet solution, your post more or less said that. Right about where it said something about push being hard to scale, which strikes me as a very odd comment. With push, you just have to be able to handle a horde of mostly idle connections, for which http://code.google.com/p/erlycomet/ or something like that, would probably be the way to go.
amix 25. Mar
Mads Sülau Jørgensen:
I am pretty tired of keep repeating myself. So this will be the last comment I answer you.
Varnish is a good cache, but it’s not really suited where I am headed. An example, how would you change the Varnish solution to support long polling? I can explore this option pretty easily with a basic understanding of how to build a custom server that can perform really good and that can be customized for my needs. I personally want to learn how to build things myself (also things that are complex and that require some thinking and engineering). I know I’ll need this skill longer down the road and practice makes a master – so the more practice I have, the better I’ll be at doing this.
This is the reply he deleted:
> I am pretty tired of keep repeating myself. So this will be the last comment I answer you.
Then consider this answer a bonus for your collection of entropy.
> An example, how would you change the Varnish solution to support long polling?
I’m quite sure that I never implied that varnish would be suitable for long polling (or other non-cacheable, long-lived requests). If you think I did, please tell me where.
> I can explore this option pretty easily with a basic understanding of how to build a custom server that can perform really good and that can be customized for my needs.
Yes, and that’s very fine and good, but that’s not really what your post says. It basically says “Using libmemcached and a C program, I can make a lot of http requests. Fast. And I don’t expect to use comet. Push is not what I want.” but from the looks of it, you may have wanted it to say “Hi, I made this little C program using libmemcached that I’m planning on developing into a comet server at some point. I need to extend the server with push in the future.”
> I personally want to learn how to build things myself (also things that are complex and that require some thinking and engineering).
Yes, throwing something together is very fun, throwing something together in C is even more fun. But throwing something together in C for a large website, because, it’s just fun, well, for a lot of the web developers in the world, that’s a really bad idea. It may work today, tomorrow but will it work in a week, a month or a year?
> I know I’ll need this skill longer down the road and practice makes a master – so the more practice I have, the better I’ll be at doing this.
Then I’d really recommend you looking at this:
http://aleccolocco.blogspot.com/2008/10/gazillion-user-comet-server-with.html
A comet server in C, with all the boiler plate completed. Just insert logic.
You might also get a kick out of reading up on memory management, input validation, how to avoid buffer overflows and just plain old K&R C.
I don’t really think you added anything new to the discussion. And I don’t really think you understood much of what I had written back to you. The blog link you link to is a joke (have you at all looked at the code? it’s pretty far from a comet solution, it basically just opens connections…)
I.e. your last comment didn’t add anything new to the discussion.
I understood plenty of what you wrote. I can read, you know. Perhaps you should read my comments and your replies again, or ask an unbiased party for an opinion. Either way, you don’t win an argument by silencing the other party. Ever.
It took me some time to write those comments to you, in the hope that I’d actually educate you for the better. The least you can do is not delete them.
The thing is, that you keep telling me, varnish can’t do X and Y, and I tell you that, yes, it can in fact do that. It was built to do that. Then you tell me, but it can’t do Z, then I tell you, yes, it was built to do that.
Then you go on to say, that varnish is not a space shuttle, which I never said it was.
Now to get back to your original solution. What is the difference between that and a reverse caching http proxy – like varnish or squid?
Nginx+Memcached could also have solved this problem partially as well – or any other caching option. The purpose of this hacking is to do a custom made polling server and re-iterate on it – to scale it to thousands/millions of online users. And to learn something in between. I did not say it was a new way of doing things or that there aren’t other options of doing caching/polling whatever.
This will be my last comment as I am tired of wasting time with a troll.
PS: I am already well educated and I don’t need unknown developers to educate me or prove me wrong. Go bash on reddit with your “insightful” comments, while I do my stuff.
Adios.
I just don’t understand why anyone would try to re-invent just that, when there are so many options that would do the job way better.
I’m not really the one trolling here, you are. You keep failing to reply to the facts I tell you. And now you are name calling. Not a very good argument.
If you don’t want other people’s opinions, why on earth did you post it on the internet?
You might be better educated than I am, but that does not make you right.