Building Walk Score's Apartment Search

In 2011, I found out through my network that a local startup I knew of and admired, Walk Score, was looking for engineers. Walk Score answered the question “How walkable is this area?” for any location in the US. It arose from a sustainable urbanist perspective that sees walking, bicycling, transit use and a higher-quality public realm as civic goods worth enabling and promoting, a view I share.

The vision was that you could post an apartment listing and brag about its high Walk Score as shorthand for its walking proximity to shops, restaurants, schools, and other destinations. The insight was to take something as complex and multidimensional as walkability and turn it into a 0-100 scalar that readily maps to human experience. A Walk Score in the 90s is a walker’s paradise. If it’s in the 20s, you probably need a car.

They had already built the part of the system that determines the score. What they wanted to do was demonstrate the vision, raise the bar and spur partnerships with existing players by creating an apartment search tool on the web with Walk Score data integrated into the search process. To get this bootstrapped, they needed to source data for apartments.

Time to learn Python

The day before interviewing with the whole team, I had coffee with the CTO, who told me they were using JavaScript (with Node.js) and Python. As for Python, I knew how to spell it, and that was it. The startup I had co-founded years earlier, Aristotle Software, had issued a snake game called Python before it was a language.

Python had a reputation for being easy to learn, which I figured I would put to the test. So I biked over to the UW bookstore, purchased a book on Python, read the whole thing in one go, wrote a bunch of sample code and ran it on my Mac. I came in the next morning feeling pretty bold, and decided on the spot to commit to doing the entire interview in Python. I got the job.

Wow, JavaScript is a real language

When the CTO of Walk Score first told me they were using Node.js to run JavaScript on the server-side, my initial response was, “Cool, but why do you want to?” I look back, and I think, I knew so much, and yet so little! I was quickly schooled by others on their small but exemplary dev team.

Prior to 2011, like many others at the time, I had carelessly absorbed an irrational bias about JavaScript: that it was slightly unseemly, slow and inelegant, to be used in small, ugly doses to solve some difficult, targeted client-side problem, and only when necessary, like a topical cream for an infection. You would never choose to write something in JavaScript if you had a better option.

Then I read Douglas Crockford’s JavaScript: The Good Parts. I started taking JavaScript seriously, went through the entire reference manual, and realized that, despite a few idiosyncrasies (since largely addressed through ES6+), JS is actually a very expressive language with some very powerful constructs, and it’s tremendous fun to work in. The mental model felt natural to me. The immediacy of the JS-browser interaction reminded me of the early days, coding closer to the metal, now with way more computing power available.

As it turns out, by leveraging the execution model of JS in the Node.js context, you could run a super scalable, high-performance API on low-cost hardware, like I always knew you should be able to. And furthermore, with a single language on the server and client, you could start creating an app-like experience on the web in a simple, elegant way, like I always wanted to. As soon as I saw how simple this all could be, my interest in the whole complex IIS / ASP.NET stack dropped to zero.

Within a few short months, I started thinking of JavaScript (now, TypeScript) as my new favorite language, with Python a close second. In the decade since, I’ve used many languages, but TypeScript and Python remain among the favorites.

Scraping Craigslist

It so happened that the co-founder of Walk Score personally knew Craig, of Craigslist fame. That may have had something to do with the fact that my primary mission turned out to be scraping Craigslist: get all the apartment listings from Craigslist into a database, and then onto the web. It turns out there are actually a lot of apartment listings on Craigslist, like millions of them nationwide, and a lot of daily flux.

Web scraping sounded like fun and required mastering what I figured would be a broadly useful set of skills. That, plus some indexing and stuff, and you could build Google!

I perused the Craigslist rental listings, came up with a data model that covered all the bases, and figured out the pattern of URLs I would have to fetch. I wrote Python code to scrape an individual rental listing, to enumerate the listings, and to fetch all the images and store them in Amazon S3 buckets. I wrote code to determine whether a listing was actually a new version of an old listing, which had to be a fuzzy match. I geocoded and indexed all this data. I wrote scripts to automate and monitor the system and made sure the data pipeline kept flowing swiftly. The system also needed to avoid overburdening the Craigslist service itself, which could get us noticed in a bad way.
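To give a flavor of the fuzzy-match step, here is a minimal sketch, not the code I actually wrote back then. It assumes a hypothetical Listing record with a title, body, price and coordinates, and it treats two listings as the same apartment when they sit at nearly the same location and their normalized text is highly similar. The thresholds are illustrative guesses.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Listing:
    # Hypothetical fields for illustration; the real data model was richer.
    title: str
    body: str
    price: int
    lat: float
    lon: float


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace so cosmetic edits don't matter."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def similarity(a: Listing, b: Listing) -> float:
    """Return a text similarity in [0, 1] computed over title plus body."""
    text_a = normalize(a.title + " " + a.body)
    text_b = normalize(b.title + " " + b.body)
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()


def is_repost(a: Listing, b: Listing,
              text_threshold: float = 0.85,
              max_degrees: float = 0.001) -> bool:
    """Guess whether b is a new version of a (thresholds are assumptions)."""
    same_place = (abs(a.lat - b.lat) <= max_degrees
                  and abs(a.lon - b.lon) <= max_degrees)
    return same_place and similarity(a, b) >= text_threshold
```

Price is deliberately left out of the comparison here, since a repost often exists precisely because the price changed. At the scale of millions of listings, a pairwise comparison like this would only be run against a small set of candidates, for example listings already bucketed by location.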

Building the back end on AWS

I ended up allocating some 80 Ubuntu Linux servers via Amazon Web Services to keep up with the constant changes. At that time there were on the order of 4M listings in circulation nationwide, with a lot of turnover. The site we built pioneered the commute shed feature, where you can set a destination (e.g. work, school) and a travel mode (e.g. bicycling) and see the blob on a map where you could live and have that commute, with a list of available apartments to go with it.
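The commute shed idea can be illustrated with a deliberately crude sketch. This is not how the site computed it: a real commute shed follows the street and transit network, which is why it shows up as an irregular blob. The version below assumes straight-line (haversine) distance and made-up average speeds per travel mode, and simply filters a list of geocoded apartments to those within a time budget of the destination. All names and values are hypothetical.

```python
import math

# Assumed average speeds in km/h; illustrative values only.
SPEED_KMH = {"walking": 5.0, "bicycling": 15.0, "driving": 40.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def in_commute_shed(apartments, destination, mode, max_minutes):
    """Keep apartments whose straight-line travel time is within max_minutes.

    apartments: iterable of dicts with "lat" and "lon" keys (assumed shape).
    destination: (lat, lon) tuple.
    """
    max_km = SPEED_KMH[mode] * max_minutes / 60.0
    dest_lat, dest_lon = destination
    return [apt for apt in apartments
            if haversine_km(apt["lat"], apt["lon"], dest_lat, dest_lon) <= max_km]
```

For example, calling in_commute_shed(apartments, (47.6062, -122.3321), "bicycling", 20) keeps everything within about 5 km of downtown Seattle under these assumptions.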

My role was to provide geocoded apartment data for the map and to create a browsing experience for the apartments themselves. I generalized the scraping code to work with multiple source providers beyond Craigslist, which was just the initial stepping stone. In addition to Python and GitHub, this role introduced me to modern backend engineering with Ubuntu Linux and Node.js, as well as modern frontend engineering with an ambitious, performant single-page web app written in JavaScript.
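Generalizing to multiple providers mostly comes down to putting each source behind a common interface, so the rest of the pipeline doesn't care where a listing came from. Here is one way that could look, as a sketch with hypothetical names rather than the original design. The in-memory source stands in for a real scraper so the example runs without touching the network.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator


class ListingSource(ABC):
    """Hypothetical interface that every provider-specific scraper implements."""

    name: str

    @abstractmethod
    def listing_ids(self) -> Iterable[str]:
        """Enumerate the identifiers of currently available listings."""

    @abstractmethod
    def fetch(self, listing_id: str) -> dict:
        """Return one listing as a dict in the shared data model."""


class InMemorySource(ListingSource):
    """Stand-in provider backed by a dict, used here instead of real scraping."""

    def __init__(self, name: str, records: Dict[str, dict]):
        self.name = name
        self._records = records

    def listing_ids(self) -> Iterable[str]:
        return list(self._records)

    def fetch(self, listing_id: str) -> dict:
        # Copy so callers can annotate the record without mutating the source.
        return dict(self._records[listing_id])


def collect(sources: Iterable[ListingSource]) -> Iterator[dict]:
    """Pull every listing from every source, tagging each with its origin."""
    for source in sources:
        for listing_id in source.listing_ids():
            record = source.fetch(listing_id)
            record["source"] = source.name
            record["source_id"] = listing_id
            yield record
```

Adding a provider then means writing one new ListingSource subclass. Deduplication, geocoding and indexing can all operate on the tagged records that collect yields.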

This apartment search feature was intended to help sell the service, and ultimately the company itself. That worked out pretty well: Walk Score was purchased by Redfin in 2014, and that nifty apartment search site is still live all these years later, looking pretty similar to what we built. Although I’d left to join another startup in 2012, my stint at Walk Score was profoundly formative, and I was proud of what I accomplished. I learned a ton of new skills in a short time with a fantastic, high-functioning team, and helped catalyze something with lasting impact.