This is not an easy app to develop…
Not that any app is particularly easy to develop, but we ran into so many problems in the first version of Sparks, and set out to solve so many of them in the second, that I felt it would be worth sharing the experience here for the curious and for anyone looking to build an app of their own that deals with third-party services.
Threads, Queues, Blocking and Blocks
One of the most difficult things to manage in Sparks is the crazy amount of network activity required to do even the simplest of tasks. 37Signals’ API is certainly well documented but, IMHO, a bit too RESTful for its own good. Architecturally, that makes it quite simple to know where to get data but, unfortunately for us, most refreshes to rooms require several network requests and the initial loading of messages can take some time if you’ve got multiple sites and rooms. I tried several approaches to the chattiness of the networking code, each with its advantages and disadvantages, and in the end settled on a combination of a few of them.
Danger! It’s about to get geeky in here…
If anyone’s interested I’m more than happy to delve deep into the details of our final choices, but for now I’ll leave you with an indication of what’s happening today, using my own account as a basis for your own comparisons.
When you first load the application, there is nothing stored about you or your habits as a user. So, we send you to 37Signals to authorize Sparks which allows us access to all your campsites. This information is very basic and doesn’t give us quite enough to show you anything yet, so we then cache some of the small stuff (API, OAuth tokens, site urls, etc) and then start the network requests.
I’m a member of 4 Campsites so, for me, that’s 4 requests that I need to make just to get a list of rooms. Depending on network traffic, latency, and the device I’m using, this is a small start. From these 4 requests I have enough information to see the names of the rooms that belong to each site and that’s about it. So, I hang on to as many connections to the site subdomains as possible on background threads (via GCD) and start asking each room for its metadata, today’s transcripts and recent uploads. I have between 10-20 rooms for each site, for an average of 15 rooms and 3 requests per room for the first load. If you’re not keeping score…
4 sites, 60 rooms, and 3 requests per room for the initial load, for a total of 184 URL requests (4 room lists plus 180 room requests) in the first few moments of the app’s launch. Once we have the messages, most of them are either white noise (I don’t really care if someone entered the room 12 hours ago…) or missing information.
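For anyone who wants to check the math, here is a minimal sketch of that cold start. It is an illustration in Python, not the Objective-C/GCD code that Sparks runs; the `fetch` function and the URL paths are hypothetical stand-ins, and for simplicity it assumes the room lists are already known rather than fetching them first.

```python
from concurrent.futures import ThreadPoolExecutor

ROOM_RESOURCES = ("metadata", "transcript", "uploads")  # the 3 per-room requests

def fetch(path):
    # Hypothetical stand-in for a network call; a real client would issue an HTTP request.
    return f"fetched {path}"

def initial_load_paths(rooms_by_site):
    """List every request for a cold start: one room list per site, then 3 per room."""
    paths = [f"{site}/rooms" for site in rooms_by_site]
    for site, rooms in rooms_by_site.items():
        for room in rooms:
            paths.extend(f"{site}/room/{room}/{res}" for res in ROOM_RESOURCES)
    return paths

def initial_load(rooms_by_site, max_workers=8):
    """Run all requests on a bounded pool of background threads."""
    paths = initial_load_paths(rooms_by_site)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in request order even though the calls overlap in time
        return dict(zip(paths, pool.map(fetch, paths)))

# 4 sites averaging 15 rooms each, as in the account described above
sites = {f"site{s}": range(15) for s in range(4)}
print(len(initial_load_paths(sites)))  # 184
```

The bounded pool is the point of the sketch: the work is spread across a handful of background workers instead of opening 184 connections at once.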
Missing, you say? Well, not missing, per se, but RESTful, so each message has a reference to a user-id but no information that may be useful for displaying the message, like the user name. Since some users are not actual members but guests, I can’t guarantee that a user is going to have certain data points, so I have to make a request for every new user-id I find. On average I have 50 users across my sites, so I’ve got to get them as soon as possible or risk delaying what I can show in the room if you happen to hop in before I’ve had a chance to cache the users. So, while I’m parsing the messages (did I mention that the Objective-C regex engine leaves a bit to be desired…) I’m firing off between 30-50 new requests for user information. Other messages, like document upload messages, don’t have enough information for me to share, preview, load the resources, or even link to them reliably, so I have to call another endpoint to get that information. This has averaged one extra request per room’s initial load, with a separate request for any new messages that happen to arrive during the use of the app.
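To make the user lookup concrete, here is a small Python sketch of the bookkeeping involved. It is only an illustration: the `user_id` field name and the shape of the message dictionaries are assumptions, not Campfire’s actual payload.

```python
def missing_user_ids(messages, known_users):
    """Return each user id that appears in messages but is not cached yet.

    Every id is reported once, in first-seen order, so a busy transcript
    triggers one lookup per new user rather than one per message.
    """
    seen = set(known_users)
    missing = []
    for message in messages:
        uid = message.get("user_id")  # assumed field name; system messages may lack it
        if uid is not None and uid not in seen:
            seen.add(uid)
            missing.append(uid)
    return missing

messages = [
    {"user_id": 7, "body": "morning"},
    {"user_id": 9, "body": "hi"},
    {"user_id": 7, "body": "anyone seen the build?"},
    {"body": "room topic changed"},
]
print(missing_user_ids(messages, known_users={9: "Jerry"}))  # [7]
```

Deduplicating before hitting the network is what keeps the count at 30-50 lookups instead of one per message.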
If you’re losing count we’re near 250-300 network requests and you haven’t even joined a room yet. So you do. You “Enter” a room.
Well, this isn’t a typical UDP socket connection to stream a chat room or IM session. No, instead, to get more information and to enable HTTPS streaming in your room you MUST already be ‘in’ the room. So, one request to a DIFFERENT subdomain than the site subdomain to enter the room. Which makes reusing the network connection that is already in use on one domain difficult, if not outright impossible. So we don’t. This means that if you want to get into a room and start streaming, the existing connections cannot be reused, so we have to fire up a new one and wait until we’ve got confirmation that you’re inside the room before we can stream the messages. This is not a socket connection to a remote server. It’s a TCP connection to a standard HTTP server with the keep-alive headers on so that you can hold the connection. On a desktop app, this is usually fine; I’ve done it a million times. But on a resource-constrained device with questionable network reliability you’re going to be counting bytes in packet bursts and performing health checks as often as possible without revealing the inner workings of all this data coordination to the end user. If I showed a message every time a connection was dropped or forced to reload, or every time your credentials needed a re-challenge, you’d never stop seeing notifications.
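The ordering constraint is easier to see in code. What follows is a rough Python sketch, not the app’s implementation: `enter_room` and `open_stream` are hypothetical callables standing in for the two HTTP calls, and the class name is made up for illustration.

```python
class RoomSession:
    """Tracks the two-step handshake: enter the room, then open the stream."""

    def __init__(self, enter_room, open_stream):
        # Both callables are hypothetical stand-ins for HTTP calls.
        self._enter_room = enter_room    # returns True once the server confirms entry
        self._open_stream = open_stream  # returns a stream object for the room
        self.entered = False
        self.stream = None

    def start(self):
        """Open the stream only after entry is confirmed; report success as a bool."""
        if not self.entered:
            self.entered = bool(self._enter_room())
        if not self.entered:
            return False  # never stream before entry is confirmed
        self.stream = self._open_stream()
        return True

session = RoomSession(enter_room=lambda: True, open_stream=lambda: "stream")
print(session.start())  # True
```

Returning a plain boolean rather than raising lets the caller retry quietly, which matches the goal of not surfacing every dropped connection to the user.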
That’s not to even mention the polling of rooms that you’re actively in and the possibility that a user may decide to tap the refresh button at any time… which is actually valid given that, if you have 60 requests to make to refresh your rooms, it’s possible that 10 could be 2 minutes behind while 30 are up to date and 20 are in the queue to be refreshed, including the only one you even care about at that moment in time.
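One way to keep the room you care about from waiting behind 20 others is a priority queue. The Python sketch below shows the idea under stated assumptions; it is not a description of the scheduling Sparks actually uses, and `RefreshQueue` and its priority numbers are invented for illustration.

```python
import heapq
import itertools

class RefreshQueue:
    """Rooms waiting for a refresh; a lower priority number is served first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority
        self._best = {}                  # room_id -> best priority currently queued

    def push(self, room_id, priority=1):
        current = self._best.get(room_id)
        if current is not None and current <= priority:
            return  # already queued at this priority or better
        self._best[room_id] = priority
        heapq.heappush(self._heap, (priority, next(self._order), room_id))

    def pop(self):
        while self._heap:
            priority, _, room_id = heapq.heappop(self._heap)
            if self._best.get(room_id) == priority:
                del self._best[room_id]
                return room_id
            # otherwise this is a stale entry for a room that was already served
        return None

queue = RefreshQueue()
for room in ["lobby", "design", "ops"]:
    queue.push(room)
queue.push("ops", priority=0)  # the user taps refresh while looking at "ops"
print([queue.pop() for _ in range(4)])  # ['ops', 'lobby', 'design', None]
```

Stale entries are skipped lazily on `pop` instead of being removed from the middle of the heap, which keeps both operations cheap.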
In 10 minutes, I can reach hundreds of requests, in an hour thousands, and it goes up from there. This is a problem that we solve constantly to ensure the bare minimum amount of data is being used while keeping the inner workings of what’s going on as transparent as possible. I don’t want to go into too many details but, in case you’re wondering, we’re constantly tweaking the loading/refreshing algorithms to find the best way to refresh your data based on your usage patterns, network reliability, 3G vs WiFi connection, and so on.
ALL of this is performed away from the main thread (the UI thread). All of it. In fact, it’s on multiple threads depending on your needs and what the iPad will give me (for me that’s between 10-20 on a given full day of use), mostly moved over to GCD from NSThreads and RunLoops (which v1 relied on), which means that most of the codebase had to be rebuilt to take advantage of the block-based async patterns that are favored by GCD (and myself!). Even though the heavy lifting is done off the main thread, the sheer volume of callbacks that can be invoked can flood the main thread with updates that then must be throttled and queued again to be performed when they won’t destroy the user experience. Can you imagine if I refreshed the lobby for every network request callback? When we first entered beta, one of the feedback points was that I should never load data on the “main thread”. Yes, that’s true and I don’t. I never did. But I did have to figure out just how long to pause between actions before reloading/refreshing, so that not only would the refresh happen quickly but indexes of arrays weren’t re-sorted, lost or augmented in the meantime, and by the time a callback did return, the target objects still existed and existed where the callback’s lexical context expected them to be.
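The throttling can be sketched in a few lines. This is a simplified Python illustration rather than the GCD code in the app: `UpdateCoalescer`, the `apply` callback and the 0.25 second interval are all assumptions, and the sketch is not thread-safe on its own (in practice every call would be funnelled through one serial queue).

```python
import time

class UpdateCoalescer:
    """Collects rooms that need a redraw and flushes them in batches."""

    def __init__(self, apply, min_interval=0.25, clock=time.monotonic):
        self._apply = apply              # hypothetical callable that redraws the given room ids
        self._min_interval = min_interval
        self._clock = clock
        self._dirty = set()              # keyed by room id, never by array index
        self._last_flush = None

    def mark_dirty(self, room_id):
        """Called from each network callback; usually just records the id."""
        self._dirty.add(room_id)
        self.flush_if_due()

    def flush_if_due(self):
        """Redraw at most once per interval; also meant to be called from a timer tick."""
        now = self._clock()
        due = self._last_flush is None or now - self._last_flush >= self._min_interval
        if due and self._dirty:
            batch, self._dirty = self._dirty, set()
            self._last_flush = now
            self._apply(sorted(batch))
            return True
        return False
```

Tracking rooms by id instead of by position is the part that addresses the stale-index problem: a late callback names the room it belongs to, so it does not matter if the list was re-sorted while the request was in flight.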
iPad 2 is your friend… sorry.
All this really comes down to the fact that iPad 2’s dual-core setup is the best way to get the most out of Sparks. It can handle the load, concurrently distribute tasks and hang on to network connections more reliably than its predecessor ever did. Testing the app on my iPad 2 vs Jerry’s iPad 1 reminds me every day that I seriously need to buy him a new one. (I’m waiting for iPad 3. Jerry, you didn’t read that!) Most of the reports of hangs or long load times are due to the fact that the system is optimized for multi-core architectures, and when one is available, performance and latency are dramatically better. There’s not much I can do or say about it. I made a choice to support the latest hardware and, given Apple’s direction (or what I see as its direction), it’s a forward-thinking choice that positions the application for long-term flexibility, which is good for us and for Sparks and ultimately good for you.
All of this for… (TL;DR;)
Chat. On the surface, it’s simple. In fact, most of our users (at least the ones that email us…) imply that they think of chat in one of 3 ways:
- IRC style - one room, one context, a dedicated socket connection, UDP data that is consumed and lost after you leave.
- Instant Message style - UDP/RDP/TCP messages between one user or a few users. Simple. Even text messages follow a simplistic pattern: User B pings User A, repeat.
- Web chat (Campfire, HipChat, Convoke), which is a new standard, but most of the low-level networking is taking place on your behalf in the browser: a unit of code that has been tested and optimized OVER AND OVER AND OVER again and it still gets it wrong. (WebSockets are nice though!)
Our view of chat is considerably more interesting, much more complex, and it lives on a device that has limited resources. We don’t own the backing service, which means that we have to play by someone else’s view of the way chat should work for them. Which it does: Campfire is a great service and its popularity is a testament to its usefulness. Unfortunately, that’s sometimes at odds with what would make a smooth device chat experience.
I have spent many late nights working hard to solve these problems in a way that hides this complexity from you, so that you can work, chat and communicate on the go in a simple and reliable fashion. Sparks for iPad 2.0 is a reflection of all the thought that Jerry and I have poured into this particular domain. We believe that we’re really starting to uncover the best parts of what Campfire has to offer, and we are looking forward to contributing to the experience and providing something that’s not just “the website on the iPad” but a valuable extension to an already fantastic collaboration system. These problems (if you can call them that) are problems that make me excited to sit down at my computer and write some code. They are challenging yet rewarding, and I can only hope that the end result and the path that we’ve been on is something that others will also enjoy sharing with us.