Have you heard of Google’s SMITH algorithm?
Google’s new SMITH algorithm (Siamese Multi-depth Transformer-based Hierarchical encoder) is similar to BERT (Bidirectional Encoder Representations from Transformers) in many ways, but is built to handle much longer text.
The SMITH model is trained to understand entire documents, while BERT is trained to understand words within the context of sentences. Today’s search algorithms (like BERT) rely on semantic matching techniques to understand the nuances and context of words. Google, for example, utilises BERT to organise Top Stories and Featured Snippets. However, BERT is far from perfect. Cue the SMITH algorithm.
Recently, Google published a research paper on the SMITH algorithm claiming that the new model outperforms BERT at understanding long queries and long documents. According to Google, “to better capture sentence-level semantic relations within a document, we pre-train the model with a novel masked sentence block language modelling task in addition to the masked word language modelling task used by BERT.”
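To make the distinction concrete, here is a minimal Python sketch of the two masking ideas. It is purely illustrative, not Google’s implementation: the masking probabilities, the `[MASK]` placeholder, and the representation of a document as a list of sentence blocks are all simplifying assumptions.

```python
import random

MASK = "[MASK]"  # illustrative placeholder token, as in BERT-style pre-training


def mask_words(tokens, p=0.15, rng=None):
    # BERT-style masked word task (sketch): hide a fraction of individual
    # tokens so the model must predict each one from its sentence context.
    rng = rng or random.Random(0)
    return [MASK if rng.random() < p else t for t in tokens]


def mask_sentence_blocks(blocks, p=0.2, rng=None):
    # SMITH-style masked sentence block task (sketch): hide entire sentence
    # blocks, so the model must predict a whole block from the rest of the
    # document rather than a single word from its sentence.
    rng = rng or random.Random(0)
    return [[MASK] * len(b) if rng.random() < p else list(b) for b in blocks]
```

The key difference is the unit being hidden: a single token in the first case, a whole sentence block in the second, which is what pushes the model to learn document-level rather than sentence-level relations.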
Furthermore, Google states that, compared to BERT-based baselines, SMITH increases the maximum input text length from 512 to 2,048 tokens.
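The longer limit comes from SMITH’s hierarchical layout: rather than one flat token window, the document is split into sentence blocks that are encoded separately and then combined by a document-level Transformer. The sketch below shows only the splitting step; the block length of 32 and the cap of 64 blocks (64 × 32 = 2,048 tokens) are illustrative numbers chosen to match the stated limit, not the paper’s exact configuration.

```python
def to_sentence_blocks(tokens, block_len=32, max_blocks=64):
    # Illustrative hierarchical input layout: cut a long token sequence into
    # fixed-size sentence blocks. Each block would be encoded on its own
    # before a second, document-level Transformer combines the block
    # representations -- which is how a 512-token-per-block budget can cover
    # a 2,048-token document.
    blocks = [tokens[i:i + block_len] for i in range(0, len(tokens), block_len)]
    return blocks[:max_blocks]
```

For example, a 100-token document becomes three full blocks plus one short tail block, each small enough for a sentence-level encoder.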
Google has not confirmed whether SMITH is being used in live search. Even so, it’s quite unlikely that the new algorithm would simply replace the old one. Instead, Google might use SMITH alongside BERT for optimal effectiveness in understanding both long and short queries and documents.