Asterank Discover, crowdsourced asteroid discovery, reviews its 100,000th image

In late October, someone reviewed the 100,000th image on Asterank Discover, the crowdsourced asteroid discovery app. This is a significant milestone and I want to give a huge thanks to the thousands of people who've contributed.

Why this matters

Sky surveys have been collecting images of the night sky for decades in order to search for dangerous asteroids, but the resulting data is largely underscrutinized. Most images have been gathering dust for years, forgotten in archives after being scanned by computers once but never reviewed by human eyes.

Chelyabinsk, Russia, February 15, 2013

I've discussed in the past why human review is an important part of this process. Our asteroid-hunting algorithms are outdated, and all results must be reviewed by humans anyway because false positives are so common. There's also a big false negative problem: I've heard some astronomers estimate that algorithms miss over 50% of the asteroids in the imagery they search. A crowdsourced dataset will ultimately lead to better computer detection with fewer false positives.

Methodology

My approach in Asterank Discover was straightforward, with the intention of saving harder computation for later. Show users a few control images, then show unknowns and images that we don't have enough data on. User history is recorded so we can decide how much to trust each individual's ability to actually spot asteroids.
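The control-image idea can be made concrete with a small sketch. It rests on a few assumptions. Each control image has a known answer. A user's trust is the fraction of controls they got right, smoothed toward 0.5 so that a single lucky guess doesn't earn full trust. Unknown images are then scored by a trust-weighted vote. Every name here (`UserHistory`, `trust_score`, `weighted_vote`) is hypothetical and is not Asterank Discover's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class UserHistory:
    # Answers on control images, keyed by image id:
    # True if the user reported an asteroid, False if they reported none.
    control_answers: dict[str, bool] = field(default_factory=dict)


def trust_score(history: UserHistory, truth: dict[str, bool],
                prior_correct: float = 1.0, prior_total: float = 2.0) -> float:
    """Fraction of control images answered correctly, smoothed toward 0.5.

    The prior (1 correct out of 2) gives a brand-new user a score of 0.5.
    """
    graded = [img for img in history.control_answers if img in truth]
    correct = sum(history.control_answers[img] == truth[img] for img in graded)
    return (correct + prior_correct) / (len(graded) + prior_total)


def weighted_vote(votes: dict[str, bool], trust: dict[str, float]) -> float:
    """Trust-weighted fraction of users who flagged an unknown image."""
    total = sum(trust[user] for user in votes)
    if total == 0:
        return 0.0
    return sum(trust[user] for user, saw in votes.items() if saw) / total
```

A user who answered both of two controls correctly scores (2 + 1) / (2 + 2) = 0.75. The smoothing constants are arbitrary and would need tuning against real data.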

Next step: analyze the hundreds of potential asteroids found so far by computing and checking their orbital solutions.

Also, everything in Asterank Discover is open source.

Partnership with Planetary Resources and NASA

Now that we have a successful prototype, Asterank Discover is folding into a larger project called Asteroid Zoo, which belongs to Planetary Resources (PR acquired Asterank in May 2013).

Last week, we announced an agreement with NASA and Zooniverse to crowdsource review of high-quality images from the Catalina Sky Survey.

"Asteroid Zoo" will be a much smarter and more engaging app that uses the proven methodology of Zooniverse (they did Galaxy Zoo, Ice Hunters, and other successful crowdsourcing projects). Asterank Discover was great validation for this approach, and sets us up nicely with some preliminary data to test.

A typical sky survey image.

I'm very excited to see where this will lead. This approach will discover new asteroids and improve our model of the solar system. It also opens opportunities for interesting algorithmic challenges, and the chance for a normal person to discover an asteroid.

As a programmer, I'm particularly interested in how we'll improve algorithms to spot asteroids. I am sure that there are conventional ML techniques and even simpler image processing approaches that are not being sufficiently exploited.

Onwards!

What I learned from getting my side project acquired

I started Asterank in May 2012. Earlier that week, Planetary Resources announced its intent to mine water and valuable materials from asteroids. Like many others, I was intrigued. It was an inspiring, impossible long-term vision.

My project began as a thought experiment: how much are asteroids really worth? The media published wild estimates without scientific basis. No one took a principled approach toward cataloging asteroid content and value. So, on a weekend afternoon with nothing better to do, I wrote the first version at a cafe in downtown Mountain View.

Thirteen months later, when Planetary Resources acquired Asterank, it was much more than an asteroid value calculator. It was a full astronomical toolkit that included web scrapers, a data pipeline, powerful visualizations, and the ability to discover new asteroids.

I had no idea what I was doing, but here are things I learned along the way:

Lesson 1: Bug people

Relentlessly contact people who can criticize or help materially.

The key is being patient and not coming off as desperate. Follow up every two weeks if you've had near-term contact, one month if you haven't.

Who to email

Cast a wide net. Contact anyone who can provide expertise, advice, or publicity. For me, this included:

  • my contact at Planetary Resources.
  • contacts at many other space companies or organizations.
  • scientists at research institutions.
  • scientists at NASA.
  • space bloggers.
  • the professor from my 100-person Intro to Astronomy course.
  • techies that could find my visualizations interesting.

Cold email guidelines

Emails should always be short and simple:

  • Briefly describe what I made and the success I've already had (visitors, news coverage, etc.).
  • Tell them what my goal is.
  • Ask them for something that will help me accomplish this goal.

This shouldn't be more than 2 or 3 short paragraphs. Follow-up emails should be even shorter:

  • Update on project - latest successes, features, etc. (skip this if you're following up on a promise they've already made).
  • Ask them for something.

Don't get emotional

Most of your emails won't get read. This can be insulting and stressful. Hang in there and take nothing personally.

Boomerang for Gmail is a great tool for email reminders. I used the free version and followed up once a month with people I was interested in.

Over time, people started initiating contact instead of the other way around, and my network grew. Persistent emails were the single largest contributing factor in the success of Asterank.

Lesson 2: Viral/social content is a boon and a timesink

When I launched, the only self-promotion I did was a Hacker News post, which gained a total of 2 points (I deserved this for the awful linkbait title).

Fortunately, someone picked it up and Asterank was featured on Universe Today, a popular space blog. A couple of people, including the Planetary Resources leadership, contacted me afterwards. From then on, traffic was steady, with major spikes from social aggregators, news coverage, etc.

This brought my site down on Christmas.

If no one notices your project but it is genuinely interesting, just blog about it until they notice. I posted Asterank Discover on HN and it got 5 points. Then I wrote a blog post about it that made the front page. Go figure.

The caveat is that social traffic fades quickly and is mostly people who aren't interested in your actual product. It helped me get started, but the results were not permanent and the marginal benefit decreases quickly. Don't get caught up in it.

Lesson 3: A basic feedback form is essential

Some of my best contacts came through Asterank's About page. I provided my email address and added a contact form. I recommend having both (the form is low-friction, but some dislike the indirectness).

A basic form only takes a few minutes to add.

I also added a way to "subscribe to updates," but it actually just sent me their email address. I used this to gauge interest; there was no point in setting up a mailing list before I knew people would use it.

Regardless of how you do it, easy-to-find contact info is essential. It led to several job opportunities, conference invitations, and media interviews.

Lesson 4: There might be leads in your analytics

Scan your analytics every now and then, especially referrers. I made a valuable contact just by noticing a referral from a company's email system. When I reached out, I didn't mention that I'd been watching the logs, but it's much easier to "cold" contact someone you know is already interested.
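Referrer scanning does not require an analytics product. The sketch below counts referring hostnames in a web server access log. It assumes nginx's default "combined" log format and skips empty referrers and links from your own site. The function name and the `own_host` default are illustrative.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def referrer_hosts(lines, own_host="asterank.com"):
    """Count referring hostnames, ignoring empty referrers and self-links."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m or m.group("referer") in ("", "-"):
            continue
        host = urlparse(m.group("referer")).hostname
        if host and host != own_host and not host.endswith("." + own_host):
            counts[host] += 1
    return counts
```

Calling `referrer_hosts(open(path)).most_common(20)` on your access log, with `path` pointing to wherever it lives on your server, gives a quick list of who is linking in. Unusual hosts, such as a company's mail domain, are the ones worth a closer look.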

When I sent emails, I sometimes tracked clicks by linking http://asterank.com/?f=n, where n is a unique string. This way, even if they didn't respond, I could tell who was interested enough to click.
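The `?f=n` trick can be sketched in a few lines. Each recipient gets a random, unguessable token. Any logged request carrying that `f` value is then mapped back to the recipient. Only the `f` parameter comes from the text above; the token length and function names are assumptions.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "http://asterank.com/"


def tracking_links(recipients):
    """Assign each recipient a random token; returns {token: recipient}."""
    links = {}
    for person in recipients:
        token = secrets.token_urlsafe(6)
        while token in links:  # regenerate on the (unlikely) collision
            token = secrets.token_urlsafe(6)
        links[token] = person
    return links


def link_for(token):
    """Build the URL to paste into the email, e.g. http://asterank.com/?f=<token>."""
    return BASE_URL + "?" + urlencode({"f": token})


def who_clicked(requested_urls, links):
    """Map request URLs (e.g. from access logs) back to recipients via ?f=."""
    clicked = set()
    for url in requested_urls:
        for token in parse_qs(urlparse(url).query).get("f", []):
            if token in links:
                clicked.add(links[token])
    return clicked
```

Random tokens are used instead of sequential ones so that a recipient can't stumble onto someone else's link by editing the number.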

Lesson 5: LinkedIn can sometimes be useful

LinkedIn can be frustrating for software engineers, but it's especially important if you're tackling an industry outside tech. It provided an easy way for the Planetary Resources guys to find me.

The downside is that LinkedIn causes a lot of emails.

I know many software engineers who question the value of LinkedIn. It may cost you some sanity, but maintaining a basic, up-to-date profile was worth it.

Lesson 6: Open source everything you can

Most people are surprised when I tell them Asterank is almost entirely open source. It lends an air of transparency and invites collaborators. The project gains valuable feedback and perspective as a result.

Blogging about technical issues is a good way to get exposure in the open-source community. Techies are often interested in how specific technologies are used in particular applications. Asterank capitalized on this, with its visualizations making the rounds in the WebGL community. You can get the attention of smart and well-connected people by showcasing interesting technology applications.

Lesson 7: You need to stick with it

I have 5+ side projects. I'd like to make businesses out of them, but I often lose interest after a couple weeks. Asterank was the only project that I've stuck with for over a year, and it paid off even though there wasn't a clear path to monetization.

I should get out more.

It's hard to predict what will be valuable as a side project. For hobbies, working on what you're most passionate about is the best way to get a return. Otherwise you lack the discipline to follow through.

Lesson 8: Be grateful

I've had to make some hard decisions and turn down some great opportunities to wind up here. I'm really lucky to have options and support from friends, family, and coworkers.

I've accepted a Software Engineer position at Planetary Resources starting in November, and am very excited to learn a lot and see where it takes me.

Good luck!

Follow me: @iwebst

A Basic Kepler Exoplanet Visualization

Scientists have discovered over 3,000 potential "exoplanets," planets that orbit stars outside our solar system. Given my affinity for interactive renderings of cool space stuff, I've built a simple WebGL viewer and API for exoplanet data.

Building the API

The best data source for exoplanets that I found is the NASA Exoplanet Archive. They have an API, but it is not queryable and essentially a data dump.

Because there are only 3,000 candidate exoplanets (or fewer, depending on your dataset), the usefulness of a full API is questionable. But there's something to be said for making scientific space-related data more open. And because Asterank makes it easy to pipeline and organize datasets such as these, it was a simple extension that required little work.

The visualization

This project was inspired by a handful of videos floating around, mostly of Jer Thorp's visualization (source code). I thought it was great; the only thing it lacked was interactivity.

Creating an interactive visualization was straightforward. I tweaked inputs to the Asterank engine, setting colors to reflect planets' equilibrium temperatures and marking the planets that support temperatures livable by humans.
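The color and "livable" marking step might look something like the sketch below. It assumes each planet has an equilibrium temperature in kelvin and uses a simple blue-to-red ramp packed as a 0xRRGGBB integer. That packing is a common convention for WebGL libraries such as three.js, not necessarily what Asterank's engine expects. The temperature range and the livable bounds are hypothetical values chosen for illustration.

```python
# Hypothetical bounds, chosen for illustration only.
T_MIN, T_MAX = 200.0, 1500.0   # kelvin range stretched across the color ramp
LIVABLE_K = (273.0, 320.0)     # assumed range of "livable" equilibrium temperatures


def temperature_color(t_eq: float) -> int:
    """Map equilibrium temperature (K) to 0xRRGGBB: blue when cold, red when hot."""
    frac = min(max((t_eq - T_MIN) / (T_MAX - T_MIN), 0.0), 1.0)
    red = round(255 * frac)
    blue = 255 - red
    return (red << 16) | blue


def is_livable(t_eq: float) -> bool:
    """True if the temperature falls inside the assumed livable range."""
    lo, hi = LIVABLE_K
    return lo <= t_eq <= hi
```

Temperatures outside the range are clamped, so very hot planets all render pure red rather than overflowing the color channels.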

Next steps

The resulting Asterank Exoplanet Visualization is interesting but not too informative. I could have added a data view similar to how I did the main site, but I opted for a cleaner, more visual experience.

The ability to add or adjust other dimensions would add a lot. For example, people may be interested in the size and characteristics of the central star.

I'd also like to create an alternate visualization that shows our sun at the center and the relative positions of all the exoplanet host stars. It would be like a basic galaxy map from the future.
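A sun-centered map mostly comes down to converting each host star's sky position and distance into Cartesian coordinates. The sketch assumes the dataset provides right ascension and declination in degrees and a distance in parsecs for each star. It uses the standard spherical-to-Cartesian conversion with x toward RA 0h, Dec 0° and z toward the north celestial pole.

```python
import math


def star_position(ra_deg: float, dec_deg: float, distance_pc: float):
    """Convert equatorial coordinates to sun-centered (x, y, z) in parsecs."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (
        distance_pc * math.cos(dec) * math.cos(ra),
        distance_pc * math.cos(dec) * math.sin(ra),
        distance_pc * math.sin(dec),
    )
```

Most known Kepler host stars sit in a single patch of sky, so such a map would show them clustered in one direction from the sun.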

The economics of exoplanets

In a similar vein to Asterank, astrophysicist Greg Laughlin posted an unusual ballpark equation that estimates the economic value of an exoplanet.

While it obviously does not measure a true dollar value we can realize in our lifetimes, perhaps this equation is a proxy for how interesting a planet is to future settlers. This hasn't made it into my simulation, but it's another thing to think about.

Visualizing Asteroid 2012 DA14's Upcoming Near Miss

2012 DA14 is a near-Earth asteroid that will pass extremely close to the Earth by astronomical standards around February 15, 2013. At its closest, 2012 DA14 will pass nearer than the moon and likely within the orbits of some geosynchronous satellites. It was first discovered and observed less than a year ago, in February 2012.

Orbital Visualization

2012 DA14 is an Apollo asteroid, a class of asteroids with orbits very similar to Earth's. This characteristic makes some Apollo asteroids dangerous to our planet.

The asteroid's semimajor axis is nearly the same length as Earth's, and it completes a trip around the sun every 366 days. To give you a sense of how similar the orbits are, below is a 2D representation in which the white circle represents 2012 DA14's orbit (via Asterank):
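The link between the 366-day period and a nearly Earth-sized semimajor axis is Kepler's third law. For a small body orbiting the sun, P² = a³ with P in years and a in AU. The sketch uses a 365.25-day year, which is a slight approximation of Earth's sidereal year.

```python
DAYS_PER_YEAR = 365.25


def period_days(a_au: float) -> float:
    """Orbital period from semimajor axis via Kepler's third law: P[yr] = a[AU]**1.5."""
    return DAYS_PER_YEAR * a_au ** 1.5


def semimajor_axis_au(p_days: float) -> float:
    """Inverse of period_days: a[AU] = P[yr]**(2/3)."""
    return (p_days / DAYS_PER_YEAR) ** (2.0 / 3.0)
```

`semimajor_axis_au(366)` comes out to about 1.0014 AU. A period one day longer than Earth's therefore implies an orbit only about 0.14% larger.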

This graphic gives a good sense of the orbits from above, but doesn't show the difference between the orbital planes of the asteroid and Earth. In the rendering below, the disk of Earth's orbit is gray; the yellow indicates the disk of DA14's orbit while it is above Earth's disk, and the blue indicates the orbit while it is below (courtesy of the Minor Planet Center).

Click for the Minor Planet Center's animated version (55 MB) or a close-up animated version (1.2 MB).

Interactive Visualization in Context

Asterank is an asteroid database with a 3D rendering engine that accurately displays thousands of objects in our solar system and their orbits in a realtime simulation.

In preparation for the upcoming pass, I designated 2012 DA14 as a 'significant object' in the simulation, meaning you can follow its orbit around and see the near miss of Earth. Take a look at the 2012 DA14 lock-on view (requires WebGL).

Earth (blue) and 2012 DA14 (red) passing on Asterank.

Clicking above will take you to Earth (green/blue) on November 1st 2012, where the upcoming close pass of 2012 DA14 (red) is already apparent. Around February 15th, you can see the orbits nearly intersect.

Asteroid-POV

This rendering of the pass is done in Cosmographia from the point of view of the asteroid itself:

No Potential Impact

Regardless of the uncertainty reported by the media (and how close things look in the simulation), 2012 DA14 will not hit Earth in 2013. In fact, 2012 DA14 is not even classified as a Potentially Hazardous Asteroid (PHA) by the Minor Planet Center, the authority on minor planetary bodies in our solar system.

These visualizations exaggerate the scale of objects in space. Even at its closest, 2012 DA14 will require a powerful telescope to see in our night sky.

Scientists estimate the actual number of asteroids out there, accounting for undiscovered asteroids, could be more than 10 times the number currently known. In other words, our inner solar system is almost 1,000 times more crowded than depicted by the cloud of asteroids on Asterank.

Further Information and Reading

As a disclaimer, the orbit of 2012 DA14 after its close pass in February is not yet predictable. The calculations used in the graphics and simulations above will no longer be accurate after the pass, because Earth's gravity will alter the asteroid's orbit. However, observations by the Minor Planet Center and other astronomers will bring this information up to date.

For more, please see J.L. Galache's excellent write-up on the subject on the Minor Planet Center blog, Clearing Up The FUD on 2012 DA14.

80,000 visitors on Christmas morning: a post-mortem

I woke up on Christmas morning to many emails and tweets about Asterank, a 3D space visualization. Intrigued, I checked Mixpanel and was shocked: in less than 8 hours overnight, I had received over 75,000 unique visitors thanks to a reddit submission.

Then, to my horror, I realized the site had been down for about 3 hours.

Quick background: Asterank runs on nginx, node, and mongo on EC2 with Cloudflare for a CDN. The 3D visualization loads a static page and grabs data via AJAX.

Mongo crashes

Server logs immediately told me mongo wasn't running. I had mongo crashes like this before, caused by large, unique queries with limited RAM.

When mongo crashes due to some combination of insufficient CPU/RAM, it leaves a .lock file in /var/lib/mongo. If this file exists, mongo refuses to start and tells you to repair the database. Unfortunately, repairing wouldn't work in my crashed state, so the only solution was to manually delete the lock.
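A slightly safer version of "manually delete the lock" only removes it when no live process owns it. This sketch rests on several assumptions worth checking against your MongoDB version and install. The lock is named `mongod.lock` in the data directory. A running mongod writes its PID into the file, and an empty file means a clean shutdown. The path and function names are hypothetical. Skipping the repair step mongo asks for can leave damaged data files, so treat this as a last resort.

```python
import os
from pathlib import Path

# Hypothetical path: the lock lives in mongod's dbpath, which varies by install.
LOCK_PATH = Path("/var/lib/mongo/mongod.lock")


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_lock(lock: Path = LOCK_PATH) -> bool:
    """Delete the lock file only if no running process owns it.

    Returns True if a stale lock was removed.
    """
    if not lock.exists():
        return False
    text = lock.read_text().strip()
    if not text:
        return False  # empty lock: assumed clean shutdown, nothing to remove
    if text.isdigit() and pid_alive(int(text)):
        return False  # mongod is still running; never touch a live lock
    lock.unlink()
    return True
```

After a successful removal you would restart mongod with your init system.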

Recovering

Although mongo was up, the site's AJAX endpoint was still not returning results. I spent about 15 minutes trying to figure out the issue and restarting node in frustration. I also purged the Cloudflare cache, which was ineffective because Cloudflare doesn't treat AJAX results as static data.

Finally, I remembered that I'd configured nginx to cache things. Clearing the cache manually and restarting nginx did the trick, and the site was finally online after a little more than 3 hours of downtime.

The Aftermath

The site was back up, but the traffic and the reddit post had understandably stagnated because of the 3 hours of downtime. About 20,000 unique visitors had come to the site and seen an error message. Although I continue to get traffic from reddit and other sources, it is nowhere near the 2+ users per second I was getting overnight.

Things I did right

nginx caching

My nginx cache setup spared mongo multiple queries per second until it was overwhelmed by too many unique queries, which the main site generates. This was essential in handling a massive increase in traffic for as long as I did.
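Unique queries defeat a cache because of how caches are keyed. This in-process sketch is an illustration of the idea, not nginx's implementation or configuration. Responses are cached by the full query string for a fixed TTL. Repeated queries are served from memory, but every distinct query string is a miss that falls through to the database.

```python
import time


class TTLCache:
    """Tiny response cache keyed by the full query string."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)
        self.misses = 0    # each miss is one trip to the database

    def get(self, key, compute):
        now = self.clock()
        hit = self.entries.get(key)
        if hit and hit[0] > now:
            return hit[1]
        self.misses += 1
        value = compute()
        self.entries[key] = (now + self.ttl, value)
        return value
```

With a handful of fixed queries, the miss count stays near one per query per TTL no matter how much traffic arrives. With user-generated queries, it grows with the number of distinct requests.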

CDN

I started using Cloudflare on a whim a couple of weeks ago, when I moved off a tiny 512MB Linode. This was fortuitous, as it saved me a ton of bandwidth and vastly improved load times under heavy traffic.

Things I did wrong

Accurate HTTP status for AJAX endpoints

If I had programmed my AJAX endpoint to give a 500 on mongo failure, it's possible that this downtime would have been avoided altogether. Cloudflare has an always-up feature which caches pages in case things go down. Also, nginx would have cached the failed results for a much shorter period of time.
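The fix amounts to making the failure path look like a failure. The framework-agnostic sketch below returns a 500 with a `Cache-Control: no-store` header when the database call raises. It returns a 200 with a cacheable header otherwise. `run_query` is a hypothetical stand-in for the mongo call, and the five-minute `max-age` is an arbitrary example value.

```python
import json


def handle_query(run_query):
    """Return (status, headers, body) for an AJAX endpoint."""
    try:
        results = run_query()
    except Exception as exc:  # e.g. mongo unreachable
        # A 5xx plus no-store signals that this response should not be cached
        # as if it were good data.
        headers = {"Content-Type": "application/json", "Cache-Control": "no-store"}
        return 500, headers, json.dumps({"error": str(exc)})
    headers = {"Content-Type": "application/json", "Cache-Control": "public, max-age=300"}
    return 200, headers, json.dumps(results)
```

The key design choice is that an error body never goes out with a 200 status. That way, no layer between the database and the browser can mistake it for a valid, cacheable result.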

Unnecessary dependency on mongo

The 3D portion of the site could be made completely static. The interface constrains users to 3 possible queries, each for a different scoring function. I should have cron'ed mongo results to a file every 24 hours or so. This is an obvious optimization and would have protected the visualization from a mongo failure.
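A cron-driven export could look roughly like this. It assumes a callable that runs the query for a given scoring function. The three score names are hypothetical placeholders for the site's actual scoring functions. Each file is written to a temporary file and then renamed into place, so a visitor never receives a half-written JSON file.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical names for the three scoring functions the 3D view offers.
SCORES = ("value", "profit", "accessibility")


def export_static(run_query, out_dir: Path, scores=SCORES) -> list:
    """Write each score's results to <out_dir>/<score>.json and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for score in scores:
        data = run_query(score)
        fd, tmp = tempfile.mkstemp(dir=str(out_dir), suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; the web server must read it
        target = out_dir / f"{score}.json"
        os.replace(tmp, target)  # atomic rename within one filesystem on POSIX
        written.append(target)
    return written
```

A daily crontab entry (for example `0 4 * * *`) running a small wrapper around `export_static` would keep the files fresh. nginx would then serve them directly, with no mongo in the request path.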

Monitoring

Since I knew that mongo could die like this, I had written a script to recover automatically. But I wasn't running it, because I'd recently upgraded servers and the crash had stopped happening.

For additional peace of mind, I could have set up an endpoint that checks the health of the mongo server. This could have been done with Pingdom or even manually with a cron + my free sms api (shameless plug).
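The do-it-yourself version can be small. This sketch checks whether anything is accepting TCP connections on mongod's default port, 27017, and calls an alert hook if not. The `alert` callable is hypothetical; it could wrap an SMS or email API. An open port is only a weak signal, and a stronger check would also run a trivial query.

```python
import socket


def port_open(host: str = "127.0.0.1", port: int = 27017, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_alert(alert, host: str = "127.0.0.1", port: int = 27017) -> bool:
    """Meant to run from cron every minute or so; calls alert(msg) when mongo looks down."""
    ok = port_open(host, port)
    if not ok:
        alert(f"mongo not accepting connections on {host}:{port}")
    return ok
```

Pairing this with the recovery script gives a simple loop: detect the failure, try to recover, and notify a human either way.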

Should I have been prepared?

In the tech community, emphasis tends to be on moving fast, iterating, getting eyeballs and feedback. People apply this advice to side projects, but in my case it would've been good to prepare more.

I can kick myself for disappointing more than 20,000 people, but Asterank is a science project that doesn't generate any revenue. Making reddit's front page was unthinkable until it actually happened.

Should I have shown the world right away, or should I have spent a couple of days optimizing for a single, improbable event? Who knows.