A blog post Vincent and I wrote is up on the Algolia blog.
From a Slack discussion to PR and Merge. All in one month.
Search on the Yarn website started with the documentation. We wanted people to easily find information on how to use Yarn. As with 300 other programming community websites, we went for Algolia’s DocSearch and this was merged in yarnpkg/website#105. Then another Yarn contributor (@thejameskyle) asked in yarnpkg/website#194 if there should be package searching abilities, much like npm had.
This is where Algolia came into play. We are a search engine, Yarn wants search and we are heavy users of Yarn, so we figured: let’s do this!
This is how it started on December 5th in our #2016-community-gift Slack channel:
- “I like Yarn a lot…”
- “Let’s build Yarn search!”
In early 2017, we met with the Yarn team in London for a one-day, in-person brainstorming session. The goal was to think about how the search experience could evolve and to define a package details page. Algolia proposed designs of what search could look like, and from those we drafted a master plan.
Features behind a great package search
It shows results instantly ⚡️
Instead of showing a dropdown of results, we chose to replace the page completely with the search results. This requires more data to be available immediately, but gives more context on the decisions you make while searching for a fitting package. Having the search page be the main entry point will make sure that you don’t need to know exactly what you want before “committing” to a search.
It displays a lot of metadata
After using npm search many times, we knew what was missing and what was superfluous from the search results and package detail pages. We brainstormed a bit and iteratively added a lot of useful metadata.
Here’s a comparison between the two search results pages (npm on the left, Yarn on the right):
npm search results on the left, Yarn search results on the right
In the Yarn search results we decided to directly display, for example, the number of downloads for every package, the license, direct links to GitHub, and the owning organization.
This metadata saves the user from opening many package detail pages to get the information they want.
It has completely rethought package detail pages
For the package detail page, we took a similar approach. We started with the same base metadata as npm shows, but also took the opportunity to add a lot more. We decided to show changelogs (when available), GitHub stargazers, commit activity, deprecation messages, dependencies and file browsing.
Here’s what it looks like:
“TIL yarn has a responsive package details page https://t.co/w2QkQoDP9P pic.twitter.com/wBnQ9biD85” (John-David Dalton, @jdalton, March 30, 2017)
This is an iterative process, and suggestions and feedback are always welcome.
Technical implementation and challenges
The code that replicates the npm registry into our Algolia index is open source and available at algolia/npm-search. The most important API it uses is the npm replication API.
The npm registry is exposed as a CouchDB database, which has a replication protocol that can be used to either set up your own npm registry, or in our case a service (the Algolia index) that has the same data as the npm registry.
Replicating the npm registry
Replication in CouchDB is a very simple but powerful system that assigns an “update sequence” (a number) to any changes made on a database. Then, to replicate a database and stay in sync, you only need to go from the update sequence 0 (zero) to the last update sequence, while also saving the last update sequence you replicated on your end. For example, right now, the last update sequence known on the npm registry is 5647452 (more than five million changes).
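To make the idea of update sequences concrete, here is a minimal sketch in Python. It is a toy model, not the code from algolia/npm-search: the `replicate` function, the in-memory `feed` and the sample versions are all made up for illustration. It only shows the principle of applying every change after a saved checkpoint and remembering the last sequence seen.

```python
from typing import Callable, Dict, List, Tuple

# A change is (update_sequence, document_id, document).
# This list is a toy stand-in for a real changes feed.
Change = Tuple[int, str, dict]


def replicate(changes: List[Change], since: int,
              apply: Callable[[str, dict], None]) -> int:
    """Apply every change with a sequence greater than `since`, in order.

    Returns the new checkpoint (the last update sequence applied),
    or `since` unchanged if there was nothing new.
    """
    checkpoint = since
    for seq, doc_id, doc in sorted(changes, key=lambda change: change[0]):
        if seq <= since:
            continue  # already replicated on a previous run
        apply(doc_id, doc)
        checkpoint = seq
    return checkpoint


# Made-up sample data: three changes, two of them to the same package.
feed: List[Change] = [
    (1, "express", {"version": "1.0.0"}),
    (2, "react", {"version": "1.0.0"}),
    (3, "express", {"version": "1.0.1"}),
]

index: Dict[str, dict] = {}

# First run: start from update sequence 0 and replay everything.
checkpoint = replicate(feed, 0, index.__setitem__)

# Later, a new change arrives; we only replay what came after the checkpoint.
feed.append((4, "yarn", {"version": "1.0.0"}))
checkpoint = replicate(feed, checkpoint, index.__setitem__)
```

Saving `checkpoint` somewhere durable is what lets a replica stop and resume without starting again from zero.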
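Early on we saw that going from 0 to 5647452 was very slow (multiple hours) and we wanted it to be faster. So, we made a replication system consisting of three phases:
- The bootstrap. Instead of replaying every update sequence from 0, we save the current last update sequence, then list all the documents in the registry and replicate them in bulk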
- The catch-up. Starting from our last known update sequence, we catch up to the new last update sequence of npm (maybe 5000 changes since bootstrap start, which is fast)
- The watch. Once we are “up to date”, we just watch the npm registry for any new changes and replicate them
For all of those phases, we use the PouchDB module which we recommend because it has an awesome API to interact with CouchDB databases.
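The phases can be sketched roughly as follows. This is an illustration in Python rather than the PouchDB-based implementation, and it rests on assumptions: `ToyRegistry`, `bootstrap` and `catch_up` are hypothetical names, and the bootstrap is assumed to record the current last update sequence before copying all documents in bulk.

```python
from typing import Dict, List, Tuple


class ToyRegistry:
    """In-memory stand-in for the npm registry (hypothetical, not a real client)."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.changes: List[Tuple[int, str]] = []  # (update sequence, package name)

    def publish(self, name: str, doc: dict) -> None:
        self.docs[name] = doc
        self.changes.append((len(self.changes) + 1, name))

    def last_seq(self) -> int:
        return self.changes[-1][0] if self.changes else 0

    def changes_since(self, seq: int) -> List[Tuple[int, str]]:
        return [change for change in self.changes if change[0] > seq]


def bootstrap(registry: ToyRegistry, index: Dict[str, dict]) -> int:
    """Record the current sequence first, then copy every document in bulk."""
    seq = registry.last_seq()
    index.update(registry.docs)
    return seq


def catch_up(registry: ToyRegistry, index: Dict[str, dict], since: int) -> int:
    """Replay only the changes published after `since`; return the new checkpoint."""
    for seq, name in registry.changes_since(since):
        index[name] = registry.docs[name]
        since = seq
    return since


registry = ToyRegistry()
registry.publish("express", {"description": "sample"})
registry.publish("react", {"description": "sample"})

index: Dict[str, dict] = {}
seq_after_bootstrap = bootstrap(registry, index)

# A package is published while (or after) the bootstrap runs.
registry.publish("yarn", {"description": "sample"})
seq_after_catch_up = catch_up(registry, index, seq_after_bootstrap)
```

In this toy model the watch phase would simply be `catch_up` called again each time the registry reports new changes. Recording the sequence before the bulk copy matters: anything published during the copy is replayed by the catch-up instead of being missed.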
Getting the necessary metadata
All the phases go through the same steps to get the metadata we need to display. Some of the metadata is retrieved directly on the frontend instead, like the GitHub data (stargazers, commit activity).
Here are all our sources:
- npm registry, example for express: http://registry.npmjs.org/express
- npm download counts: the npm downloads endpoint
- Packages’ dependents: the dependents endpoint of npm (there’s no official documentation on that)
- Changelogs: a clever “first resolved, first served” list of calls to various ChAnGeloG files, like the History.md file that express uses as its changelog
- GitHub Stargazers⭐️, commit activity: we get them on the frontend directly from GitHub using the browser of the person doing a search. This way we benefit from a larger rate limit on GitHub, shared amongst all users. This is also what npm search does for open issues on their detail pages.
- Browse view: we get this from the unpkg API, which gives us the files, folders and sizes for all published packages
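The “first resolved, first served” approach to changelogs mentioned in the list above can be sketched like this. It is a sketch under assumptions, not the actual implementation: the candidate file names in `CANDIDATES` are a guess at typical spellings, and `fetch` is a hypothetical callable that returns the file content or `None` when the file does not exist.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional

# Hypothetical candidate file names; the real list may differ.
CANDIDATES = ["CHANGELOG.md", "ChangeLog.md", "changelog.md", "History.md", "HISTORY.md"]


def first_changelog(fetch: Callable[[str], Optional[str]],
                    candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Try all candidate names concurrently; return the first name that resolves.

    `fetch(name)` must return the file content, or None if the file is missing.
    Returns None when no candidate resolves.
    """
    names = list(candidates)
    if not names:
        return None
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {pool.submit(fetch, name): name for name in names}
        # as_completed yields futures in the order they finish, not submit order.
        for future in as_completed(futures):
            try:
                content = future.result()
            except Exception:
                continue  # a failed request just means "not this one"
            if content is not None:
                # Leaving the `with` block still waits for the other requests.
                return futures[future]
    return None


def fake_fetch(name: str) -> Optional[str]:
    """Stand-in for an HTTP request: only History.md exists in this example."""
    files = {"History.md": "sample changelog text"}
    return files.get(name)


found = first_changelog(fake_fetch)
```

With real network calls, several spellings might exist and whichever answers first wins, which is good enough when any changelog is better than none.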
Query Rules to the rescue
One Algolia feature that really helped us is Query Rules. When you are searching for a package, there are two results we need to surface: the package that you typed exactly, and the package that you probably meant. We found that other searches often don’t show the exact match for what you typed early in the results, even though it exists.
Our solution is a query rule that applies when the user types the name of a package exactly or without special characters (to give people some leeway in how they type dashes).
This allows a query like reactnative to have as first result react-native, which is very popular, and as second result reactnative, which is deprecated and not supposed to be used, but still exactly what the user typed and may be looking for.
For a package search, we can’t make any assumptions like “Maybe the user was looking for this package instead of what they typed”. Instead we want to always present them both the exact package match if any and then the popular ones.
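The effect of that rule can be imitated with a small ranking function. To be clear, this is not the Algolia Query Rules API, just a plain Python stand-in: `normalize`, `search` and the download counts are invented for the example.

```python
import re
from typing import Dict, List


def normalize(name: str) -> str:
    """Lowercase the name and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def search(query: str, downloads: Dict[str, int]) -> List[str]:
    """Rank packages for `query`.

    Packages whose name equals the query once special characters are ignored
    are promoted to the top, most downloaded first. Other packages that
    contain the query follow, also by downloads.
    """
    key = normalize(query)
    by_popularity = sorted(downloads, key=lambda name: downloads[name], reverse=True)
    promoted = [name for name in by_popularity if normalize(name) == key]
    others = [name for name in by_popularity
              if normalize(name) != key and key in normalize(name)]
    return promoted + others


# Made-up download counts, for illustration only.
downloads = {
    "react-native": 1000,
    "reactnative": 10,
    "react-native-web": 500,
    "lodash": 5000,
}

results = search("reactnative", downloads)
```

Here `results` starts with react-native, followed by reactnative and then react-native-web: the popular spelling comes first, while the exact but rarely used match is never buried below unrelated results.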
The future of Yarn search
The search index is currently used by:
- Yarn (65% of searches)
- jsDelivr, the free Open Source CDN (10% of searches) serving one billion downloads per month of our libraries
- Atom autocomplete module import
Building on that we want to add new features like:
- Bundle size information like siddharthkp/bundlesize
- Advanced filtering with tags
- Anything YOU would like to see, let us know
We did not stop at the search feature. I am proud to be a frequent contributor to the Yarn website, helping add translations, reviewing and merging PRs, and updating the Yarn CLI documentation.
This project wouldn’t have been feasible without the help of everyone from the Yarn and Algolia teams. Since our first contact with the Yarn team, communication has always been great and allowed everyone to feel confident about shipping new features to the Yarn website.
We also want to thank the npm team very much for being responsive and advising us while we were building our replication mechanism.
We hope you enjoyed this article. See you soon for this year’s community gift 🎁!
Thanks to Vincent and Ivana for their help while writing this article.