How would you build a visual search engine? Well, visual search has been around for a while as part of Google Images or Pinterest Lens.
However, visual search is becoming increasingly popular in eCommerce, where it helps businesses boost their sales by letting customers upload a photo of what they are looking for.
So, how could one build such a visual search engine from scratch?
Build A Visual Search Engine: The Architecture Overview
Consider what we’ll need to provide the most basic form of a visual search service. The service exposes an HTTP API that clients can query.
Behind the scenes, we’ll need a model and an index. The model turns our images into vector representations, which can be stored in the index and queried later.
For the indexing, we could take a naive approach: store the vectors as they are and run a brute-force search on every query. This is very simple but inefficient and slow. A better approach is to use an index structure that partitions the vector space and eliminates the need to compare a query against the whole index.
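To make the naive approach concrete, here is a minimal brute-force search in NumPy. The 512-dimensional vectors and the cosine metric are illustrative assumptions, not requirements:

```python
import numpy as np

def brute_force_search(query: np.ndarray, index: np.ndarray, k: int = 5):
    """Return the indices of the k nearest vectors by cosine similarity.

    Compares the query against *every* stored vector -- O(n) per query,
    which is why this only works for small collections.
    """
    # Normalize so that a dot product equals cosine similarity.
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    similarities = index_norm @ query_norm
    return np.argsort(-similarities)[:k]

vectors = np.random.default_rng(0).normal(size=(1000, 512))
neighbors = brute_force_search(vectors[42], vectors, k=3)
```

Since the query here is itself a stored vector, it comes back as its own nearest neighbor.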
The Model
We need some way to turn images into vector representations. These representations should have the following properties:
- Visually similar images should have vector representations that are close under some distance metric, and
- vice versa, visually dissimilar images should have representations that are far apart under the same metric.
How do we get such representations? By using the features from a hidden layer of a pre-trained image classification model. You can then fine-tune these representations on your own data, either with triplet losses or by training a model from scratch in an autoencoder setup.
The Index
The brute-force approach will not scale beyond a dataset of a few thousand items, which is why you need an index. There are many indices you can choose from. For example, you can use Annoy, since it has a nice Python API. Annoy was developed by Erik Bernhardsson and is used at Spotify for music recommendations.
The HTTP API
The API is the least exciting part, but it’s still important for our visual search service. We can use Flask to expose a search endpoint that accepts images as queries and returns JSON-encoded result sets.
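A minimal Flask sketch of such an endpoint might look like this. The `embed` and `nearest_neighbors` functions are placeholders for the real model and index:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def embed(image_bytes: bytes) -> list:
    # Placeholder: a real service would run the model here.
    return [0.0] * 512

def nearest_neighbors(vector: list, k: int) -> list:
    # Placeholder: a real service would query the index here.
    return list(range(k))

@app.route("/search", methods=["POST"])
def search():
    image = request.files["image"].read()     # the uploaded query image
    vector = embed(image)                     # image -> latent vector
    results = nearest_neighbors(vector, k=5)  # ids of the nearest images
    return jsonify({"results": results})
```

A client would then query it with something like `curl -F image=@query.jpg http://localhost:5000/search`.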
Putting It All Together
First, we build the index from our images. To do that, we load every image, get its vector representation, and store it in the index. After adding all the images, we build the index and save it to disk.
The second part is the search HTTP API itself. It exposes an endpoint that accepts an image as a query, calls the model to get the image’s vector representation, and then queries the index we built and saved in the first part to find its nearest neighbors in the latent space.