Asterank Discover, crowdsourced asteroid discovery, reviews its 100,000th image

In late October, someone reviewed the 100,000th image on Asterank Discover, the crowdsourced asteroid discovery app. This is a significant milestone and I want to give a huge thanks to the thousands of people who've contributed.

Why this matters

Sky surveys have been collecting images of the night sky for decades in order to search for dangerous asteroids, but the resulting data is largely under-scrutinized. Most images have been gathering dust for years, forgotten in archives after being scanned by computers once but never reviewed by human eyes.

Chelyabinsk, Russia, February 15, 2013

I've discussed in the past why human review is an important part of this process. Our asteroid-hunting algorithms are outdated, and all results must be reviewed by humans anyway due to the prevalence of false positives. There's also a big false negative problem: I've heard some astronomers estimate that algorithms miss over 50% of the asteroids in the imagery they search. A crowdsourced dataset will ultimately lead to better computer detection with fewer false positives.

Methodology

My approach in Asterank Discover was straightforward, saving harder computation for later: display a few control images, then display unknowns and images we don't have enough data on. User history is recorded so we can decide how much to trust each individual's ability to actually spot asteroids.
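The trust idea can be sketched in a few lines. This is a minimal illustration, not Asterank Discover's actual code. It assumes each control image has a known yes/no answer, scores a user by Laplace-smoothed accuracy on the controls, and combines votes on an unknown image using those scores as weights. The function names and the 0.5 default for users with no control history are assumptions.

```python
def user_trust(control_answers):
    """Laplace-smoothed accuracy on control images.

    control_answers: list of (user_said_asteroid, truly_asteroid) booleans.
    Returns (correct + 1) / (total + 2), so a user with no history scores 0.5.
    """
    correct = sum(1 for given, truth in control_answers if given == truth)
    return (correct + 1) / (len(control_answers) + 2)


def weighted_consensus(votes, trust, default=0.5):
    """Trust-weighted share of users who marked an asteroid on one unknown image.

    votes: dict of user -> bool (marked an asteroid or not).
    trust: dict of user -> score from user_trust(); unknown users get `default`.
    """
    total = sum(trust.get(user, default) for user in votes)
    if total == 0:
        return 0.0
    yes = sum(trust.get(user, default) for user, marked in votes.items() if marked)
    return yes / total


trust = {
    "alice": user_trust([(True, True), (False, False), (True, True)]),  # 0.8
    "bob": user_trust([(True, False), (True, False)]),                  # 0.25
}
print(weighted_consensus({"alice": True, "bob": False}, trust))  # about 0.76
```

Smoothing keeps a user who got a single control right from counting as perfectly reliable. A user with no control answers scores 0.5, which matches the neutral default used in the vote.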

Next step: analyzing the hundreds of potential asteroids found so far to compute and check their orbital solutions.

Also, everything in Asterank Discover is open source.

Partnership with Planetary Resources and NASA

Now that we have a successful prototype, Asterank Discover is folding into a larger project called Asteroid Zoo, which belongs to Planetary Resources (PR acquired Asterank in May 2013).

Last week, we announced an agreement with NASA and Zooniverse to crowdsource review of high-quality images from the Catalina Sky Survey.

"Asteroid Zoo" will be a much smarter and more engaging app that uses the proven methodology of Zooniverse (they did Galaxy Zoo, Ice Hunters, and other successful crowdsourcing projects). Asterank Discover was great validation for this approach, and sets us up nicely with some preliminary data to test.

A typical sky survey image.

I'm very excited to see where this will lead. This approach will discover new asteroids and improve our model of the solar system. It also opens opportunities for interesting algorithmic challenges, and the chance for a normal person to discover an asteroid.

As a programmer, I'm particularly interested in how we'll improve algorithms to spot asteroids. I am sure that there are conventional ML techniques and even simpler image processing approaches that are not being sufficiently exploited.

Onwards!

What I learned from getting my side project acquired

I started Asterank in May 2012. Earlier that week, Planetary Resources announced its intent to mine water and valuable materials from asteroids. Like many others, I was intrigued. It was an inspiring, impossible long-term vision.

My project began as a thought experiment: how much are asteroids really worth? The media published wild estimates without scientific basis. No one took a principled approach toward cataloging asteroid content and value. So, on a weekend afternoon with nothing better to do, I wrote the first version at a cafe in downtown Mountain View.

Thirteen months later, when Asterank was acquired by Planetary Resources, it was much more than an asteroid value calculator. It was a full astronomical toolkit that included web scrapers, a data pipeline, powerful visualizations, and the ability to discover new asteroids.

I had no idea what I was doing, but here are things I learned along the way:

Lesson 1: Bug people

Relentlessly contact people who can criticize or help materially.

The key is being patient and not coming off as desperate. Follow up every two weeks if you've had near-term contact, one month if you haven't.

Who to email

Cast a wide net. Contact anyone who can provide expertise, advice, or publicity. For me, this included:

  • my contact at Planetary Resources.
  • contacts at many other space companies or organizations.
  • scientists at research institutions.
  • scientists at NASA.
  • space bloggers.
  • the professor from my 100-person Intro to Astronomy course.
  • techies that could find my visualizations interesting.

Cold email guidelines

Emails should always be short and simple:

  • Briefly describe what I made and the success I've already had (visitors, news coverage, etc.).
  • Tell them what my goal is.
  • Ask them for something that will help me accomplish this goal.

This shouldn't be more than 2 or 3 short paragraphs. Follow-up emails should be even shorter:

  • Update on project - latest successes, features, etc. (skip this if you're following up on a promise they've already made).
  • Ask them for something.

Don't get emotional

Most of your emails won't get read. This can be insulting and stressful. Hang in there and take nothing personally.

Boomerang for Gmail is a great tool for email reminders. I used the free version and followed up once a month with people I was interested in.

Over time, people started initiating contact instead of the other way around, and my network grew. Persistent emails were the single largest contributing factor in the success of Asterank.

Lesson 2: Viral/social content is a boon and a timesink

When I launched, the only self-promotion I did was a Hacker News post, which gained a total of 2 points (I deserved this for the awful linkbait title).

Fortunately, someone picked it up and Asterank was featured on Universe Today, a popular space blog. A couple of people, including the Planetary Resources leadership, contacted me afterwards. From then on, traffic was steady but with major spikes from social aggregators, news coverage, etc.

This brought my site down on Christmas.

If no one notices your project but it is genuinely interesting, just blog about it until they notice. I posted Asterank Discover on HN and it got 5 points. Then I wrote a blog post about it that made the front page. Go figure.

The caveat is that social traffic fades quickly and is mostly people who aren't interested in your actual product. It helped me get started, but the results were not permanent and the marginal benefit decreases quickly. Don't get caught up in it.

Lesson 3: A basic feedback form is essential

Some of my best contacts came through Asterank's About page. I provided my email address and added a contact form. I recommend having both (the form is low-friction, but some dislike the indirectness).

A basic form only takes a few minutes to add.

I also added a way to "subscribe to updates," but it actually just sent me their email address. I used this to gauge interest; there was no point in setting up a mailing list before I knew people would use it.

Regardless of how you do it, easy-to-find contact info is essential. It facilitated several job opportunities, conference invitations, and media interviews.

Lesson 4: There might be leads in your analytics

Scan your analytics every now and then, especially referrers. I made a valuable contact just by noticing a referral link from a company's email. I reached out without mentioning that I was watching the logs, but it's much easier to "cold" contact someone you know is already interested.
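Scanning referrers is easy to script once you can export them. This is a sketch, and it assumes you have a plain list of referrer URLs from your analytics or server logs. The export format, the function name, and the `own_host` filter are all assumptions.

```python
from collections import Counter
from urllib.parse import urlparse


def top_referrer_hosts(referrers, own_host="asterank.com", n=10):
    """Count referring hostnames, ignoring blanks and your own site."""
    hosts = Counter()
    for ref in referrers:
        host = urlparse(ref).hostname  # None for "", "-", or values with no host part
        if not host or host == own_host or host.endswith("." + own_host):
            continue
        hosts[host] += 1
    return hosts.most_common(n)


refs = [
    "https://news.ycombinator.com/item?id=1",
    "https://news.ycombinator.com/",
    "http://www.asterank.com/discover",
    "-",
    "https://mail.example.com/inbox",
]
print(top_referrer_hosts(refs))  # [('news.ycombinator.com', 2), ('mail.example.com', 1)]
```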

When I sent emails, I sometimes tracked clicks by linking http://asterank.com/?f=n, where n is a unique string. This way, even if they didn't respond, I could tell who was interested enough to click.
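Here is a sketch of the tracking-link trick. It assumes you keep a private mapping from each token to its recipient. The token format (`secrets.token_urlsafe`) and the helper names are illustrative choices. Only the `?f=` parameter comes from the approach described above.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def make_tracking_link(recipient, registry, base="http://asterank.com/"):
    """Return a link with a unique ?f= token and remember who it was sent to."""
    token = secrets.token_urlsafe(6)
    while token in registry:  # regenerate on the (unlikely) collision
        token = secrets.token_urlsafe(6)
    registry[token] = recipient
    return base + "?" + urlencode({"f": token})


def who_clicked(url, registry):
    """Map a visited URL back to its recipient, or None if it has no known token."""
    token = parse_qs(urlparse(url).query).get("f", [None])[0]
    return registry.get(token)


registry = {}
link = make_tracking_link("professor@example.edu", registry)
print(link)                         # e.g. http://asterank.com/?f=<random token>
print(who_clicked(link, registry))  # professor@example.edu
```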

Lesson 5: LinkedIn can sometimes be useful

LinkedIn can be frustrating for software engineers, but it's especially important if you're tackling an industry outside tech. It provided an easy way for the Planetary Resources guys to find me.

The downside is that LinkedIn generates a lot of email.

I know many software engineers who question the value of LinkedIn. It may cost you some sanity, but maintaining a basic, up-to-date profile was worth it.

Lesson 6: Open source everything you can

Most people are surprised when I tell them Asterank is almost entirely open source. It lends an air of transparency and invites collaborators. The project gains valuable feedback and perspective as a result.

Blogging about technical issues is a good way to get exposure in the open-source community. Techies are often interested in how specific technologies are used in certain applications. Asterank capitalized on this, with its visualizations making the rounds in the WebGL community. You can get the attention of smart and well-connected people by showcasing interesting technology applications.

Lesson 7: You need to stick with it

I have 5+ side projects. I'd like to make businesses out of them, but I often lose interest after a couple of weeks. Asterank was the only project that I've stuck with for over a year, and it paid off even though there wasn't a clear path to monetization.

I should get out more.

It's hard to predict what will be valuable as a side project. For hobbies, working on what you're most passionate about is the best way to get a return. Otherwise you lack the discipline to follow through.

Lesson 8: Be grateful

I've had to make some hard decisions and turn down some great opportunities to wind up here. I'm really lucky to have options and support from friends, family, and coworkers.

I've accepted a Software Engineer position at Planetary Resources starting in November, and am very excited to learn a lot and see where things take me.

Good luck!

Follow me: @iwebst

How a programmer can discover an asteroid

I'm a computer scientist. I have incredible opportunities to work on fun and interesting problems.

For some reason, that wasn't always enough. I've never felt like I had the math, physics, and embedded hardware background to pursue space, one of my biggest interests since I was a kid. And like any recent grad, I've been soul-searching for what I want to do in the long term.

A year ago, I set aside my doubts and started to innovate in the space industry as a complete outsider. This post details one of these projects that was reasonably successful.

Space algorithms

Asteroid detection is a significant challenge for scientists and astronomers. Thanks to Russian dashcams, we all remember the Chelyabinsk meteor in February 2013 -- space rocks can still take us by surprise and cause significant damage.

Every night, large telescopes search the sky for objects like the Chelyabinsk meteor, taking pictures that are reviewed by computer programs looking for moving dots. If an image seems interesting, it is queued for review by a human operator.
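The "moving dots" search can be illustrated with a toy tracklet finder. This is a simplified sketch, not any survey's real pipeline. It assumes sources have already been extracted from three equally spaced exposures as (x, y) pixel positions. It skips pairs that barely move, since those are likely stars, and keeps triplets consistent with constant-velocity motion. The tolerance values are arbitrary assumptions.

```python
import math


def find_movers(frames, tol=1.5, min_motion=2.0):
    """frames: three lists of (x, y) source positions from equally spaced exposures.

    Returns tracklets (p1, p2, p3) where p3 lands within `tol` pixels of the
    position predicted by extrapolating the p1 -> p2 motion.
    """
    f1, f2, f3 = frames
    tracklets = []
    for p1 in f1:
        for p2 in f2:
            dx, dy = p2[0] - p1[0], p2[1] - p1[1]
            if math.hypot(dx, dy) < min_motion:
                continue  # (nearly) stationary: a star, not a mover
            predicted = (p2[0] + dx, p2[1] + dy)
            for p3 in f3:
                if math.dist(predicted, p3) <= tol:
                    tracklets.append((p1, p2, p3))
    return tracklets


stars = [(10, 10), (50, 50)]
frames = [stars + [(20, 20)], stars + [(25, 22)], stars + [(30, 24)]]
print(find_movers(frames))  # [((20, 20), (25, 22), (30, 24))]
```

The brute-force triple loop is fine for a handful of sources. Real pipelines index detections spatially, and they must also cope with the streaks, smudges, and hot pixels that produce false positives.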

Small rocks like the one that exploded over Russia are much harder to find via the automated methods used in today's sky surveys. In fact, it's estimated there are millions of undiscovered small asteroids, some of which are hidden in imagery collected over the past few decades.

A human touch

Astronomers I've spoken with tend to agree that automated approaches can miss small, dangerous asteroids. The varying quality of imagery and the prevalence of false positive streaks, smudges, hot pixels, etc. encourage conservative evaluation. My own impression is that there is not much in the way of modern, open source asteroid identification. There's also not enough public data for a machine learning approach.

My prototype solution to this problem is a web app called Asterank Discover, which aggregates upwards of half a million images of the night sky and presents them in an intelligent manner for crowdsourced review.

Users are shown short animations of the night sky that make it easy to spot asteroids and are asked to mark items of interest and flag poor quality images. The app occasionally serves control images to get a sense of whose responses are trustworthy.

As an incentive, the first user to spot an undiscovered asteroid will get naming rights (within constraints of IAU naming guidelines).

The end result? In about 2 months, over 35,000 images have been reviewed, with hundreds of potential asteroids marked. Not a small feat for a modest science project by some software engineer.

The browser is underutilized

I've always been pessimistic about my ability to contribute to space from a pure computer science perspective, but this project was well within reach. Its success is a testament to creative thought toward applying simple web technologies to new areas.

It's no surprise that technology in non-tech sectors tends to lag significantly. This is a HUGE advantage for engineers looking for side projects (or startup ideas). For example, my work on space has made use of canvas (via the KineticJS library), WebGL (via three.js), web workers, and other newer web technologies.

Software engineers, therefore, are in a unique position. Unfortunately, most still think of browsers no differently from how they did in 2005. They're unaware that taking advantage of new browser standards and other emerging tech can significantly transform outside industries.

Here are some newer browser standards I've experimented with over the course of the entire Asterank project (Asterank was about 1 year old when it was acquired by Planetary Resources last May):

SVG and Canvas

These are well known in the tech industry, but probably underused elsewhere. There are huge data visualization opportunities out there, which Asterank capitalized on.

Web Workers

JavaScript isn't just a way to manipulate the DOM in a single thread. The Web Workers API, supported by most modern browsers, lets you do meaningful work off the main UI thread. Keeping all computation on the main thread, the traditional approach, limits the performance of many web apps.

WebGL/WebCL and GPGPU

I've already written a bit about this, but WebGL is a great way to make stunning, performant visualizations that harness the power of graphics hardware. This unlocks a whole realm of possibility. Related technologies like WebCL, once widely adopted, will make it even easier to use the GPU on the web.

WebRTC

The standard is still under development, but WebRTC APIs are already available in two major browsers. WebRTC will unlock an entirely new class of web apps that will change video, file transfer, etc. significantly.

There's more...

I haven't mentioned WebSockets, localStorage, and so on. HTML5 Rocks is a great resource for this stuff.

Eat faster

Marc Andreessen says software is eating the world, but its spread beyond Silicon Valley can be slow. Engineers should actively seek to take advantage of newer technologies that haven't percolated beyond the tech world yet. As an engineer with no funding and a hobby interest, you can innovate in deep-rooted industries. I encourage people to think about how new web technologies can solve old problems.

And by the way, Asterank and Asterank Discover are open source.

An asteroid in sky survey imagery on Asterank Discover.

A billion light years, in 3D

The universe is a big place. There are an estimated 170 billion galaxies, averaging hundreds of billions of stars each. The largest structures in the universe are giant "sheets" and "filaments" of matter, composed of galaxies and shaped by mutual gravitational forces.

In 2005, the largest n-body computer simulation ever, dubbed the "Millennium Run," simulated only about 0.01% of the total.

Using three.js and some of the visualization techniques behind Asterank, I've created a visualization of part of the Millennium Run, spanning about 5 million galaxies and a billion light-years.

Getting the data

I prototyped with the milli-Millennium database, a tiny version of the Millennium dataset made available to the general public. I requested full access once I was satisfied that I could make a decent visualization. Gerard Lemson, one of the people in charge of the Millennium Run, was very kind and made the full database available for my use.

The amount of data is huge. Due to query constraints, I wound up slowly scraping most of the database and storing it locally in flat files. I chose to scrape/visualize a cube within the larger simulation spanning a billion light years.
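One way to scrape under per-query limits is to split the cube into smaller boxes and fetch them one at a time into flat files. This sketch assumes "query constraints" means a cap on rows per query. `fetch_rows` is a hypothetical, caller-supplied function, because the Millennium database's schema and query interface aren't described here.

```python
import csv
import os


def sub_boxes(origin, side, splits):
    """Split a cube (corner `origin`, edge length `side`) into splits**3 boxes.

    Yields (lo, hi) corner tuples; pick `splits` so each box stays under the row limit.
    """
    step = side / splits
    ox, oy, oz = origin
    for i in range(splits):
        for j in range(splits):
            for k in range(splits):
                lo = (ox + i * step, oy + j * step, oz + k * step)
                yield lo, (lo[0] + step, lo[1] + step, lo[2] + step)


def scrape(fetch_rows, boxes, out_dir):
    """Fetch each box into its own CSV file, skipping boxes already on disk.

    fetch_rows(lo, hi) is a hypothetical, caller-supplied function that runs one
    bounded query against the remote database and returns a list of row tuples.
    """
    os.makedirs(out_dir, exist_ok=True)
    for n, (lo, hi) in enumerate(boxes):
        path = os.path.join(out_dir, f"box_{n:05d}.csv")
        if os.path.exists(path):
            continue  # fetched on an earlier run, so restarts resume where they stopped
        rows = fetch_rows(lo, hi)
        tmp = path + ".tmp"
        with open(tmp, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        os.replace(tmp, path)  # only complete files get the final name
```

Writing to a temporary file and then renaming it means an interrupted run never leaves a half-written box behind, so a slow scrape can simply be restarted. Queries should treat each box as half-open (lo <= x < hi) so particles on shared faces aren't fetched twice.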

Performance tricks

This is by far the most GPU-intensive simulation I've done. It won't run well without a decent graphics card. And it definitely won't run on your phone.

Reducing points with spatial compression

It's not necessary to actually render millions of points for typical screen resolutions. I created a preprocessing algorithm for combining particles that overlapped visually with one another. Particles representing masses of dark matter are combined based on their size and luminosity. Using an R-tree, the preprocessor finds nearby neighbors and combines them in the visualization coordinate system to reduce extra rendering.

This reduced the number of particles rendered by an order of magnitude without significantly affecting the overall appearance of the visualization. As a result, decent laptops (e.g., my 11" MacBook Air) can run the simulation.
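The merging step can be approximated with a simpler structure than an R-tree. This sketch swaps in a uniform grid. Particles in the same cell (in visualization coordinates) are merged into one particle placed at their luminosity-weighted centroid, with summed luminosity and the largest size. The cell size and the merge rules are assumptions, not the original preprocessing.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    z: float
    size: float
    lum: float  # luminosity


def merge_particles(particles, cell):
    """Merge all particles that fall in the same grid cell of side `cell`."""
    buckets = defaultdict(list)
    for p in particles:
        key = (math.floor(p.x / cell), math.floor(p.y / cell), math.floor(p.z / cell))
        buckets[key].append(p)

    merged = []
    for group in buckets.values():
        total = sum(p.lum for p in group)
        # Luminosity-weighted centroid; fall back to a plain mean if all are dark.
        weights = [p.lum / total for p in group] if total > 0 else [1 / len(group)] * len(group)
        merged.append(Particle(
            x=sum(w * p.x for w, p in zip(weights, group)),
            y=sum(w * p.y for w, p in zip(weights, group)),
            z=sum(w * p.z for w, p in zip(weights, group)),
            size=max(p.size for p in group),
            lum=total,
        ))
    return merged


ps = [Particle(0.1, 0.1, 0.1, 1, 1), Particle(0.3, 0.1, 0.1, 2, 3), Particle(5, 5, 5, 1, 1)]
print(len(merge_particles(ps, 1.0)))  # 2
```

A grid is coarser than an R-tree neighbor search, because two particles on opposite sides of a cell boundary won't merge. In exchange, it runs in a single pass.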

Adjusting simulation based on fps

The simulation also adapts to the user's frame rate. Someone whose computer and graphics card are more powerful will see a tilt-shift effect and better rotation. Despite the small change, this enabled the simulation to run on a class of laptops that otherwise could not have handled it.

The code uses a simple FPS counter, available on GitHub.
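The frame-rate logic can be sketched independently of the rendering code. This is not the author's counter. It is a hypothetical version that averages over the last `window` frame timestamps and turns on the extra effects above an assumed 40 fps threshold.

```python
from collections import deque


class FpsMonitor:
    """Rolling frame-rate estimate over the most recent `window` frame timestamps."""

    def __init__(self, window=60):
        self.stamps = deque(maxlen=window)

    def tick(self, now):
        """Record a frame; `now` is a timestamp in seconds."""
        self.stamps.append(now)

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


def choose_effects(fps, threshold=40.0):
    """Hypothetical quality switch: extras only when the machine keeps up."""
    fast = fps >= threshold
    return {"tilt_shift": fast, "smooth_rotation": fast}


monitor = FpsMonitor()
for frame in range(120):
    monitor.tick(frame / 60)  # simulate a steady 60 fps
print(round(monitor.fps()))   # 60
print(choose_effects(monitor.fps()))
```

In a browser, the timestamps would come from `requestAnimationFrame` (in milliseconds, so divide by 1000). Passing them in explicitly keeps the logic testable. A small hysteresis band around the threshold would stop the effects from flickering on and off near the cutoff.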

Next steps

Unfortunately I haven't come up with a way to visualize the passage of time without drastically cutting down on the scope and overall accuracy of the visualization. Frankly, this is why the visualization is merely interesting instead of really amazing. It's about 40,000 data points per timestep, with about 50 timesteps. I'll be looking at ways to improve this.
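For a sense of scale, assume each point stores only three 32-bit float coordinates. This is an assumption; real particles carry more attributes. All timesteps together would then come to roughly 24 MB, compared with about 0.48 MB for a single timestep:

```latex
40{,}000 \times 50 = 2{,}000{,}000 \ \text{points}, \qquad
2{,}000{,}000 \times (3 \times 4\ \text{bytes}) = 24{,}000{,}000\ \text{bytes} \approx 24\ \text{MB}
```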

Until then, it's not very useful, but it looks nice.

An analysis of my dreams over the past year

About a year ago I created KeepDream on a whim. It emails me every morning and asks what I dreamt last night.

I have no idea if dreams are special, but they are sometimes fun and I like to see trends over time. Here is an example dream I recorded on March 31st:

Given the option to turn into elephant for 2 weeks. At a dirt pen with olivia and we are about to turn into elephants. But then I start to have doubts, and I'm embarrassed to say it, but I definitely start feeling like I don't want to turn into an elephant. I realize that I value my humanity and that as an elephant I also wouldn't be able to speak or express things clearly for 2 weeks. olivia has similar doubts and we don't wind up becoming elephants.

Most of my dreams were recorded in this single-paragraph description format. Many are more serious and deal with relationships and so on.

For context, I began recording these in the spring of 2012, a year after I graduated college and started work as a software engineer.

Text analysis

I tokenized and lemmatized my dreams, then extracted n-grams and parts of speech. They are provided here for your enjoyment (all names have been anonymized):

Top unigrams

top 100 unigrams

Top bigrams

top 100 bigrams

Dream topics

Most of my dreams just involve doing things with friends. I picked out some recurring topics:

college: 25+  
high school: 15+  
work: 12  
plane crashes: 11  
train rides: 8  
in which someone dies: 7  
space/spaceships: 4  
zombies: 3  
being able to fly: 1  

More text analysis

Top trigrams

a bunch of: 11  
with mom and: 10  
the old house: 10  
a lot of: 9  
sophia and i: 8  
some sort of: 8  
with a bunch: 7  
i realize that: 7  
hanging out with: 7  
in a big: 5  

top 100 trigrams
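The n-gram counts above can be approximated with the standard library. This sketch lowercases the text, tokenizes it with a simple regex, and counts n-grams within each dream so that phrases never span two entries. It skips the lemmatization step; NLTK's `WordNetLemmatizer` is one option for that. The tokenizer pattern and function names are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (apostrophes kept, punctuation dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Consecutive n-token tuples, e.g. n=2 over [a, b, c] -> (a, b), (b, c)."""
    return zip(*(tokens[i:] for i in range(n)))


def top_ngrams(dreams, n, k=10):
    """Most common n-grams across dreams, counted within each dream separately."""
    counts = Counter()
    for dream in dreams:
        counts.update(ngrams(tokenize(dream), n))
    return counts.most_common(k)


dreams = [
    "Hanging out with a bunch of friends.",
    "A bunch of zombies, with a bunch of kids.",
]
print(top_ngrams(dreams, 3, k=2))  # [(('a', 'bunch', 'of'), 3), (('with', 'a', 'bunch'), 2)]
```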

Top verbs

present:

going  
trying  
being  
getting  
looking  
doing  
having  
driving  
working  
running  

past:

worried  
related  
caught  
annoyed  
caused  
assigned  
inspired  

Top nouns

people  
things  
friends  
kids  
lots  
others  
hs  
guys  
guards  

Comparative adjectives

more  
bigger  
smaller  
better  
younger  
less  
older  
narrower  

My conclusion

Large-scale analysis over a text corpus of dreams does not reveal a whole lot. I also tried finding interesting collocated phrases, but the results were not good.

I'd like to build a classifier for dreams. It's clear that some dreams are lighthearted, some are stress-related, some represent fears, and so on. Categorizing these things will make future analysis easier and more interesting.
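A multinomial naive Bayes model is one simple starting point for such a classifier. The sketch below is hypothetical. The labels and the four training sentences are invented for illustration and are not taken from the dream log. A real classifier would need many hand-labeled dreams.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training dreams
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; real labels would come from reviewing actual dreams.
model = NaiveBayes()
model.train("we fly over the city and laugh", "lighthearted")
model.train("party with friends, we laugh a lot", "lighthearted")
model.train("late for the exam and worried", "stress")
model.train("worried about work deadline", "stress")
print(model.predict("worried about the exam"))  # stress
```

Working in log space avoids underflow on long dreams. Add-one smoothing keeps a word never seen with a label from zeroing out that label entirely.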

When I was in school I did research that relied heavily on semantic analysis using Freebase. Doing something similar here could be quite interesting and lend more context to my analysis.

Is it worth it?

It seems like dreams reflect my state of mind over a period of time. This is something that I notice only in retrospect, but it is interesting to go back, see, and remember these things.

For example, an abortive business venture with a friend showed up in my dreams in ways that I wasn't aware of at the time. My hopes and dreams, family/friend troubles, etc. also show up. It's a twisted autobiography.

P.S. KeepDream has a data export feature that encourages analysis like this. Eventually I may make this part of the web service, but with only about 40 users it's not a priority.