Facebook@NUS: Dealing with the "Happy" Problem

Tuesday, March 17, 2009

Today's class was on the issues associated with scaling applications for huge workloads (million-eyeball range).

This is not a random class. This was a class that I added in the middle of the semester last year after I watched Caleb blow up on FarmWars and realized how difficult it was to debug load problems for Facebook applications.

The problem with Facebook applications crashing under high loads is that you cannot predict when it will happen, and it's extremely difficult to reproduce the problem, let alone know how to begin to fix it.

While the session today might seem very technical and some folks might be a little lost, I would like to emphasize that people are not expected to become experts at dealing with these load issues. Minimally, I would like everyone to have an appreciation for what might cause load problems and, basically, for the fact that many things can go wrong.

The interesting thing is that most times, it's not a hardware issue, i.e. it's not that the hardware isn't running fast enough, but rather, it's due to a host of factors.

Minimally, I would like the programmer types and those of you who are setting up and running your Apache webservers to have an appreciation for the complexities in tuning the parameters for the webserver and to understand some simple concepts about swapping to disk and the use of processes.
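
To make the swapping point a little more concrete, here is a rough back-of-the-envelope sketch in Python. It is not something from the class: the function name and all the numbers are hypothetical, and it assumes every webserver process uses roughly the same amount of memory. The idea is simply that if the webserver is allowed to start more processes than fit in RAM (for example through a limit such as Apache's MaxClients), the machine starts swapping to disk and everything slows to a crawl.

```python
def max_worker_processes(total_ram_mb, reserved_mb, per_process_mb):
    """Rough upper bound on webserver processes that fit in RAM without swapping.

    All three inputs are assumptions you would have to measure on your own
    server: total RAM, memory reserved for the OS and database, and the
    typical memory footprint of one webserver process.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        # Nothing left over for the webserver at all.
        return 0
    # Round down: a fraction of a process does not help anyone.
    return int(usable_mb // per_process_mb)


if __name__ == "__main__":
    # Hypothetical server: 2 GB RAM, 512 MB kept aside, 30 MB per process.
    print(max_worker_processes(total_ram_mb=2048, reserved_mb=512, per_process_mb=30))
```

In practice the per-process figure varies with what your app loads, so treat the result as a starting point for tuning rather than the answer.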

Perhaps some of you might not have understood some of what was said, but perhaps some seeds of understanding would have been planted and if you end up creating a million-eyeball app that blows up royally, you will have some inkling as to what to do.

Chris Henry will be holding a workshop on Web Performance this Saturday. After that workshop, you should not only have a high-level appreciation for web performance, but also understand how you can try to improve the performance of your webpages.

Chris is likely to also try to peddle his Honours Year Project on you. He is currently developing a measurement system to passively monitor Facebook app performance. If your app does have the misfortune of blowing up (or some would say the fortune of being unexpectedly popular), his measurement system is supposed to be able to help pinpoint when things blow up and allow you to see the interactions between the client and your webserver so as to help you pinpoint the reason for the problem.

Chris needs real apps with problems to prove that his system works. :-) If it works, it will be win-win for everyone. :-P

I have put up with latecoming for many weeks already and I am quite resigned to students coming late. Next week is, however, a very special lesson. I have managed to persuade a whole bunch of local entrepreneurs to come down to share their experiences with you.

From last year's experience, this is a very interesting session - and it will be held in LT19. Please come on time for next week can? :-)

Note about Final Project Prototype due this Friday:

I am really not being evil by making you guys finish your prototypes early. You can ask Kok Wee about the consequences of leaving your CS3216 Final Project to the last two weeks.

But more seriously, there are two reasons why the deadline is this Friday:

So that you dun have a lot of time to put in too many features. It doesn't take a whole lot of features to make a successful Facebook application. A small limited set of features can succeed if executed well. By not giving you guys too much time, I force you to decide which one or two features of your app are most important.

So that you have the chance to get user feedback. One key attribute that differentiates social networking apps from regular webapps is that you can get feedback from and interact with your users. You need time to let your users use your app so that you can experience interacting with them and refining your app based on their feedback.

All in all, there's lotsa sense in the madness lah. Let's all chiong the next couple of days and let's see some action on Friday! :-)