I am talking specifically about the Google Photos app search, for my own personal photos.
Since around last year, I have noticed that the "quality" of the Google Photos app search bar has gone down.
The most annoying part is when I have the exact photo in mind, yet, no matter how hard I try, I cannot figure out how to word my query to retrieve it.
I have no idea what happened to the search. I don't know anyone who works at Google. But if I had to guess, I imagine that photos are now automatically assigned "tags" from a fixed list (probably the classes returned by the Google Vision API, which are maybe a superset of the Open Images labels?).
So I decided to build my own very simple image search.
If you have machine learning knowledge, you will understand that the idea is very simple: use a contrastive vision-language model to embed my photos. At search time, embed the query with the same model and rank the photos by cosine similarity.
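Concretely, once everything is embedded, "find the best photos" is just a dot product and a top-k. A minimal sketch with placeholder tensors:

```python
import torch
import torch.nn.functional as F

def top_k(query_emb: torch.Tensor, photo_embs: torch.Tensor, k: int = 10) -> torch.Tensor:
    """query_emb: (d,), photo_embs: (n, d). Returns indices of the k most similar photos."""
    # Cosine similarity is the dot product of L2-normalized vectors.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(photo_embs, dim=-1)
    return torch.topk(p @ q, k).indices
```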
For people who don't have ML knowledge, I want to emphasize that this idea is incredibly simple, and there is nothing novel in my approach. Google themselves have released state-of-the-art open-weights vision-language encoders.
I started by downloading my personal photos using Google Takeout. Using this service, you can download all your photos and their metadata (given as a JSON file for each photo).
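If you want to do the same, pairing each image with its JSON sidecar is a small loop. A sketch, assuming the common layout where the sidecar sits next to the photo with a `.json` suffix (the exact naming varies between Takeout exports):

```python
import json
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".heic"}

def iter_photos(takeout_dir: str):
    """Yield (image_path, metadata_dict_or_None) pairs from a Takeout dump."""
    for path in sorted(Path(takeout_dir).rglob("*")):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        sidecar = path.with_name(path.name + ".json")  # e.g. IMG_0001.jpg.json
        meta = json.loads(sidecar.read_text()) if sidecar.exists() else None
        yield path, meta
```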
To embed my photos, I used SigLIP 2 Giant (patch size 16, 384 px input, approx. 1.9B parameters), for no particular reason other than 1. it is a really good model and 2. I have no time constraints. With the MPS backend on my Apple M2 (24 GB), I was able to embed around 35k photos in a handful of days.
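The embedding loop itself is short. A sketch using the Hugging Face transformers API (the checkpoint name is the fixed-resolution giant variant; adjust it if you pick another size):

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoModel, AutoProcessor

device = "mps" if torch.backends.mps.is_available() else "cpu"
ckpt = "google/siglip2-giant-opt-patch16-384"
model = AutoModel.from_pretrained(ckpt).to(device).eval()
processor = AutoProcessor.from_pretrained(ckpt)

@torch.no_grad()
def embed_images(paths: list, batch_size: int = 8) -> torch.Tensor:
    chunks = []
    for i in range(0, len(paths), batch_size):
        batch = [Image.open(p).convert("RGB") for p in paths[i : i + batch_size]]
        inputs = processor(images=batch, return_tensors="pt").to(device)
        feats = model.get_image_features(**inputs)
        # Normalize once here so search is a plain matrix product later.
        chunks.append(F.normalize(feats, dim=-1).cpu())
    return torch.cat(chunks)  # (n_photos, d)
```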
I also used this project as an excuse to try to learn a tiny bit more about front-end. So I built the UI using Svelte.
In the end, I can type a query in natural language and I am pretty much guaranteed to find results, as long as the thing I'm describing exists in my photos. SigLIP 2 really does all the heavy lifting here; it's an impressive model.
I'm sure that I could get the Google Photos search to return useful images if I tweaked my queries more. But the thing is... I don't want to do that. I just want to describe what I have in mind, and have it appear, just like that.
Embedding the search query and scoring all the images is really, really fast on my Apple M2 using the MPS backend. The search runs locally, and when I hit enter, the results appear instantly.
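For the curious, the query path is just one text-encoder call followed by the dot product from earlier. A sketch reusing `model`, `processor`, `device`, and the normalized `photo_embs` matrix from the embedding step:

```python
@torch.no_grad()
def search(query: str, photo_embs: torch.Tensor, k: int = 20) -> torch.Tensor:
    # SigLIP's text tower expects max_length padding (that's how it was trained).
    inputs = processor(text=[query], padding="max_length", return_tensors="pt").to(device)
    q = model.get_text_features(**inputs).squeeze(0).cpu()
    scores = photo_embs @ F.normalize(q, dim=-1)  # cosine similarity against all photos
    return torch.topk(scores, k).indices          # indices of the best matches
```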