Sunday, June 2, 2013

10 Lessons from a year of development in Ruby on Rails (Hint: performance is important and achievable)

I've been working on a little thing called MetaBright for the last year, with a production site live for the last 5 months. Accepted wisdom these days is that performance and scalability don't matter for young startups, something I initially subscribed to. But my experience has been the opposite - I've found they do matter. They matter a lot. I didn't know this at the start (a claim some of our users will eagerly back up, especially if they're in Germany). But I know it now, and I want to spread the gospel.

Below is my collected wisdom (read: opinions) about how/why you need to get your page loads below 1 second. It originally started with a focus on the How, but the Who and Why sort of took over. I get going on these rants, you know? The details are Rails specific, but really all of it is universally applicable. Hope you enjoy. Please don't flame me.

Note: I know performance and scalability aren't the same thing, but when you're relatively small they are. You have to provide low response times but you also need to keep your overhead low, which means building something that is both performant and cheaply scalable. Scalable in the sense that you can throw more servers at the problem and keep the same response time is a very expensive definition of the word scalable. So I'll use them interchangeably for the rest of this post.

Edit: As some have argued, my explanation for why I use scalability and performance interchangeably is still not technically valid - a better way to put it is that an architecture that can expand horizontally with traffic growth may be scalable, but it may not be sustainable financially. I've adjusted the post title and language to reflect that.

"Yeah, well, you know, that's just, like, your opinion, man."


TL;DR

Who should care more about performance: 
First time developers, entrepreneurs, and every Product Manager on the planet

Why should we care:
Damnit, the internet should load faster. You're not exempt just because you're a startup.

How do we build things that go faster, cheaper:
Watch Railscasts, put static assets anywhere else but your own server, use the asset pipeline, turbolinks, eager loading associations, be careful with often used helper methods, don't rely on ajax, make your front-end friends heroes not foes, memcached and dalli, and performance profiling


And so we begin...

Who should care about performance?


This is primarily for folks fresh out of college or just doing your first startup. That Advanced Data Structures and Algorithms class you took will serve you well at Google/Microsoft where you are part of a greater machine with a huge, well stocked support system where you can really focus on something. 

But when you are on your own or on a small team, you have to know every piece end to end. The piece you aren't aware of is the one that will burst into flames when someone actually does manage to stumble onto your site. Don't be that site!

Also, tacked on at the end is a little rant about why Product Managers should be deeply involved in this kind of stuff (from the perspective of a former PM). Read if you're feeling angsty.



Why? We get it already, performance is important blah blah blah


In the old world where site traffic would (hopefully) mimic a hockey stick, not worrying about performance probably made more sense. Now that social sharing is the origin point of virtually all traffic, hockey sticks are a joke. A cliff is a more apt description. Don't believe me? Wait till you have one good post on Reddit and you spend the night crying yourself to sleep as load times go into double digit hell no matter how many servers you throw at it. Pat yourself on the back buddy, all those early adopters that could have been excited about your new service and been great advocates on your behalf are now frustrated that you wasted precious minutes of their lives. What's that about scalability not being important? I can't hear you over the sound of all those timeouts.

Second, page load time is a feature as much as anything is a feature. If you want to maximize the utility a user gets out of your service, what you're really saying is, "I want to maximize the number of times the user does certain actions." Your site does not begin to approach the most important thing your user has to do today, so they have a limited amount of time to spend with you. The longer it takes to do each action you want them to do, the fewer actions they will take. Screw that other feature you were going to build to coax them to do more, get your page load down. That's fucking science, dude.

How to get to page load heaven


Disclaimer: Let me start off by saying that I, and MetaBright (I built it, I'm gonna pimp it), have a long way to go. For too long we built for today, thinking tomorrow may not come. There's no guarantee that it will, but at least we'll be moving in the right direction when it does.

This should be a fun process. If it's not fun, you're probably over thinking things or concentrating in the wrong areas. Not everything needs to be tweaked to perfection, especially if it makes the code hard to maintain.

So, in order of easiest/simplest to most important/complex:

1. Get a Railscasts Pro account. Watch everything. All of it. Why Pro, and not just the free stuff? The revised episodes are the best (and available only to Pro viewers). Ten bucks a month is an incredibly cheap price to pay to be a better developer. Ryan Bates deserves to be in some sort of hall of fame.

2. Get your static assets off your site. Serving them yourself is an unforced error. Every millisecond your server spends serving up JPEGs of cats is one more it could have spent doing something else. We use S3. Make this as simple or as complex as you want to, but just do it.

But remember that getting them off your own server is only half of it - you also have to set Cache-Control and Expires headers so that your images are cached properly. Don't skip this step - it's easy and does wonders for the speed at which your user actually sees the page finish loading. There's no point getting your response time under a second if everything takes 3 seconds to render. We're still working on this, as we have an old configuration of paper_clip that needs tuning, but even the small progress we have made has had a huge impact.
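For a rough idea of what those two headers look like, here is a minimal sketch in plain Ruby. The `cache_headers` helper and the one-year lifetime are my own assumptions for illustration, not something from our setup; how you attach the headers depends on whatever tool you use to upload to S3.

```ruby
require 'time' # adds Time#httpdate

# Hypothetical helper: builds long-lived caching headers for a static asset.
# The default lifetime of one year (in seconds) is an assumption.
def cache_headers(now = Time.now, max_age = 365 * 24 * 60 * 60)
  {
    'Cache-Control' => "public, max-age=#{max_age}",
    'Expires'       => (now + max_age).httpdate
  }
end

headers = cache_headers(Time.utc(2013, 6, 2))
puts headers['Cache-Control'] # public, max-age=31536000
puts headers['Expires']       # Mon, 02 Jun 2014 00:00:00 GMT
```

A long lifetime like this only works if the file name changes whenever the file does, otherwise browsers will keep showing the old version.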

3. Use the asset pipeline. This is obvious to most, but given the number of people still using Rails 2, I suspect the gospel still has some converts left to find. I've written about this before if you want to see some before / after action.

4. Turbolinks. I know, not everyone likes it. I like it because the lack of a real page load makes things instantly "app-ier", and because it makes site slowness extremely obvious and awkward. Things feel broken if there's a long response time, so it will force you to get your shit together. And not just on a few key pages - everywhere. I'll admit that I tried it, thought it was awesome and the all-solution, but then when the awkward "is it loading or broken?" started happening too much I pulled it back until we were on better footing elsewhere. So if you're unhappy with it, ask yourself: is Turbolinks janky or does my code need some work?

5. Eager load associations. This is big, maybe the biggest aside from the asset pipeline. Rails doesn't know what you intend to do with an ActiveRecord relation. YOU DO! Load the stuff you need the first time, and don't load what you don't need. It's that simple. It's had such a positive impact that I'm working on a little gem that will force you to add a select statement to all queries. That may be going too far, but I think it deserves a shot.
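To make the cost concrete, here is a small plain-Ruby sketch that counts queries. `FakeDB` is a made-up stand-in for the database, not ActiveRecord. In a real Rails app the lazy version is a loop that calls an association on each record, and the eager version uses `includes` (for example `Post.includes(:comments)`, with `Post` and `comments` as hypothetical names).

```ruby
# A fake "database" that counts queries, standing in for ActiveRecord.
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
    @comments = { 1 => ['first!', 'nice post'], 2 => ['+1'], 3 => [] }
  end

  # One query per post: what lazy loading inside a loop amounts to.
  def comments_for(post_id)
    @query_count += 1
    @comments.fetch(post_id, [])
  end

  # One query for every post at once: roughly what eager loading does.
  def comments_for_all(post_ids)
    @query_count += 1
    post_ids.each_with_object({}) do |id, result|
      result[id] = @comments.fetch(id, [])
    end
  end
end

post_ids = [1, 2, 3]

lazy = FakeDB.new
post_ids.each { |id| lazy.comments_for(id) }

eager = FakeDB.new
comments_by_post = eager.comments_for_all(post_ids)
post_ids.each { |id| comments_by_post[id] } # no further queries

puts "lazy:  #{lazy.query_count} queries" # lazy:  3 queries
puts "eager: #{eager.query_count} query"  # eager: 1 query
```

With three posts the difference is trivial. With a newsfeed of a few hundred items it is a few hundred round trips to the database versus one or two.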

6. Helper methods are not always good. Rails is beautiful to read, fun to write. But! Doing things "the rails way" can be too seductive sometimes. A good example is "link_to". You get so into Rails that writing anything that even looks like HTML feels dirty. You know how much time writing link_to saves you? Virtually zero. And if you have hundreds of links piling up, say in a newsfeed, your pretty code will cost you. Not huge, but it adds up, and it's something you can cut out without any real effort.

7. Post page-load loading of hidden views (like dropdowns, modals, etc) by ajax is not a real solution. I'm honestly not sure how many others have fallen into this trap, but I did pretty hard. It can feel like such an ingenious solution at the time, and in some cases it is. But if you're doing three or four, remember that when things get hot, instead of making one request/page/user, you now have 4, with a lot of extra complexity to boot. It can also make your logs incredibly annoying to read, as each full page request spawns much more log output than normal. Get it right the first time - and if it's really tempting to load something after the standard page load, I would question whether you can just load it on request by the user, or if you really need that little feature at all.

8. You can't write everything, and oftentimes the most harmful code comes from a well meaning heart. Use tools/frameworks that your non-backend team can understand and teach them how to use those tools. If they are confused, it's your fault. Use things that fail fast, are easy to learn and force good habits. Coffeescript, HAML, SASS are great for this, and what's even better is that they can be integrated slowly into your projects.

It's amazing how simply converting your code from ERB/CSS to HAML/SASS can clean things up without even trying. You can go one view at a time, maybe spend the first 30 minutes of every day converting one view. The cleansing effect is incredible. You'll feel like you quit smoking, lost ten pounds and drank some nasty shake made of spinach or something. Good stuff. Just make sure your UI folks are in the loop and understand the benefits. 

9. Use Memcached and Dalli - I'm personally not a fan of page and action caching, and apparently neither are the folks behind Rails 4. But I love model caching. It integrates so smoothly into your app with so little effort that you forget it's even there. In terms of pure response time improvement, this will without a doubt have the biggest impact. But perhaps even more importantly for bootstrappers is that it's cheap. Like dirt cheap - a gig of memcache on Heroku will only run you $70 a month.

All that being said, I'm almost hesitant to recommend this. It's just so easy to do that it's easy to abuse, and it can be a bandaid for really bad stuff. One day you'll wake up in a metaphorical (also possibly literal) ditch with code that is so hacked together that you can't work on it any more because you simply don't want to. That is the worst feeling in the world. Don't be that developer stuck in a metaphorical ditch of your own making!
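The pattern behind model caching is small enough to sketch in plain Ruby. `TinyCache` below is a made-up in-memory stand-in, not Dalli or Memcached, and `load_user_from_db` is a hypothetical expensive lookup. In a Rails app, `Rails.cache.fetch(key) { ... }` plays the role of `fetch` here, with Memcached behind it.

```ruby
# Minimal in-memory stand-in for a cache store with a fetch-or-compute API.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the cached value; on a miss, runs the block and stores the result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  # Call this whenever the underlying record changes.
  def delete(key)
    @store.delete(key)
  end
end

CACHE = TinyCache.new
DB_HITS = Hash.new(0)

# Hypothetical expensive lookup, standing in for a database query.
def load_user_from_db(id)
  DB_HITS[:users] += 1
  { id: id, name: "user-#{id}" }
end

def cached_user(id)
  CACHE.fetch("user/#{id}") { load_user_from_db(id) }
end

3.times { cached_user(42) }
puts DB_HITS[:users] # 1

CACHE.delete('user/42') # the record changed, so drop the stale copy
cached_user(42)
puts DB_HITS[:users] # 2
```

The `delete` call is the part that gets forgotten. Caching a slow query is easy; remembering to expire it every place the data changes is where the ditch begins.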

10. Most importantly, before making any changes, profile a page to see how much each piece is costing you in response time. You don't need to build full out performance testing automation, which is not always a great use of time if you're constantly iterating. All you need to do is replicate production conditions to the greatest extent possible and use good gems. My personal favorites are rack-mini-profiler and sql-logging.

Everyone says you should profile before and after, so maybe you've already shrugged this off. "I'm too busy, shit's gotta get pushed now. I'll just fix it and know it got better" you say.



Dinosaurs!

Look, you should measure your baseline response times not because you may actually slow things down with your changes, which is almost never the case, but because you will get to see exactly how much better things are afterwards. That is addicting, and makes you realize just how important (AND EASY!) all this stuff really is. 
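If you want a feel for the before/after habit without installing anything, here is a tiny sketch using Ruby's standard `Benchmark` module. This is not rack-mini-profiler or sql-logging, just a stopwatch. The `median_ms` helper and the two blocks being timed are made up for illustration.

```ruby
require 'benchmark'

# Runs the block several times and returns the median wall-clock time in ms.
# Taking the median keeps one unlucky run from skewing the number.
def median_ms(runs = 5)
  times = Array.new(runs) { Benchmark.realtime { yield } * 1000.0 }
  times.sort[runs / 2]
end

# Hypothetical "before" and "after" versions of the same work:
# both produce the numbers 1000 down to 1.
before = median_ms { 200.times { (1..1_000).to_a.sort_by { |n| -n } } }
after  = median_ms { 200.times { (1..1_000).to_a.reverse } }

puts format('before: %.2f ms, after: %.2f ms', before, after)
```

The exact numbers depend on your machine, so write down the baseline on the same box you test the fix on.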

Find your own path to response time heaven, but grant yourself the satisfaction of having done it. That's a healthy feeling, and it will keep you honest later on when you really just want to ship something but know you should take a peek at how much it will hurt you. Being a developer isn't always fun - don't feel guilty about enjoying it when you can.

Rock on, and make the internet a faster place to watch cat GIFs.


JC
Co-founder, MetaBright.com

PS: Why Product Managers should care more about scalability

Before venturing into startup lala land I was a product manager at Salesforce, so I've worked on the 'dark side' as well. On most teams I observed or worked with (including my own), the PM felt they were under constant pressure to deliver features - scalability and bugs had to fight an uphill battle. Feature X, or the performance improvement your lead engineer has been saying is critically important for 6 months? Well, "performance improvement and bug fixes" can't be captured as a screenshot in your monthly review deck so we're doing feature X, Y, and Z! Let the next PM worry about it.

Not good. I think the solution is to tie employee success to key metrics, not feature delivery. But we do! you say. Metrics have their own slide on the deck and everything! If your OKR or V2MOM (or whatever inexplicable name you have for quarterly goals) has anything but metrics on it, there's always the risk that flashy features will distract from what's important. I think you should give a PM and a team a metric to attack, and let them figure out how to get there. But also make sure they know you're happy with a monthly review deck that looks like an econ grad student paper.

That's just my opinion, but you read this far down so clearly you're in a generous mood to listen to someone else's bullshit.

Take away? Your engineers know what they're talking about: performance (and quality as a whole) is a feature. 

PPS: Fads in tech advice, or really why are you still reading this?

There are a lot of well regarded talking heads in tech that say a lot of things that become canon pretty quickly. Are we all about TDD today, or are we moving fast and breaking things? Hiring slow and firing fast or are we all getting into mobile-phone-based-food-truck-same-day-delivery? It's enough to leave the jaw agape. Agape I say!

I think due to the nature of the internet (the whole thing about facilitating the exchange of data at the speed of light), the tech community tends to generate pithy, one liner wisdom at an unbelievable rate and volume. The majority of it comes from the demi-god class of those with a successful exit.

One exit does not make you Buddha. I'm not saying every word from a VC/successful entrepreneur is total shit, just that you should do your own research and remember that every startup has its own circumstances. Twitter is an especially great place to find good sounding, bad advice. Hold the finger for an extra second or two before retweeting the next algorithmically generated piece of gospel. If it goes to the tune of "X is not Y, Y is X", try flipping it back. Sounds just as mystical, right?