Suggester elasticsearch

11/20/2023

Thanks for opening the Lucene issue! It looks good to me. Nevertheless, a suggester that is document based is not always the best idea. Your patch on the Lucene issue already mentions it: if you have lots of duplicates, lookups slow down and the idea behind a suggester is broken (as it's slow, possibly horribly slow). Let me explain the two different types of suggester "use cases".

The new suggester is document based. That is fine if you really want to suggest documents (and also want to filter deleted documents). Basically this suggester just executes a query and returns TopDocs (not really TopDocs, because it uses another scoring, but basically it is the same). This type of suggester works fine if what you index for suggestions is unique, e.g., the document title (which should be almost unique), and you suggest those in the drop-down. If users click on those items, they are taken directly to the document. One example of this is the search engine on Elastic's home page.

The old suggester was just "dictionary based" (a variant of the term dictionary that has payloads and some weights). This suggester does not suggest documents, it just suggests terms/phrases you could enter into the search field and execute. This is the same as what Google does, and it is what I would need (as would the user who opened the issue). Generally you can do the same with the new suggester, but you must take care to use a separate, deduplicated index and index the suggester phrases from there. But this makes maintenance hard! If you have structured data and you know that some fields in your documents are useful as auto-suggestions (e.g., names of authors, journal names, anything that could also work as a facet), then you are done already. This helps users to enter such terms (which are real suggestions; documents are not suggestions, they are already results of a search).

The problem with stuff like author names is that they may appear in thousands of documents. If you execute the search afterwards you get thousands of documents. This is how you would use a suggester like the one you see on Google: just present phrases to search for, each of which may return many documents. For this use case a document-based suggester like the new one here is not scalable. To make it work as it should, the user would need to create a second index, then execute a terms aggregation on the primary index on a field used for suggestions and migrate the buckets of that aggregation as suggestions. The downside: this makes adoption hard, as you have to run this at regular intervals and reindex the suggester index. For users with a high frequency of documents coming and going this is not going to work.

The old completion suggester had the deduplication "automatically", because it just worked on the in-memory terms dictionary and never iterated over documents. The downside: deleted stuff does not disappear. But for this type of suggestion that does not necessarily have to happen in a timely manner. If you get a suggestion which is no longer in the index as the "whole phrase", it is still likely to return results (for multi-word suggestions). I never had any problems with deletions not showing! So this was never a problem for the old suggester.

My suggestion would be to use the new suggester's dictionary, but allow it to be run without returning document suggestions: just find the terms/phrases from the dictionary that match and return them as suggestions. The alternative would be to keep the old suggester alive (as said before). To me this would work like the still missing Solr "terms component" in ES.

What you describe here would better fit in the phrase suggester, which works with an n-gram model and scores suggestions based on frequency and co-occurrence. The completion suggester is agnostic to frequency and ranks suggestions solely by the provided weight.

Not providing a weight and relying on duplication to find the best matches is not in the scope of this suggester. The de-duplication is helpful not only to please the suggester but also to generate the weight associated with your suggestion.
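To make the dictionary-based idea discussed above more concrete, here is a minimal Python sketch. It is not Elasticsearch or Lucene code; it only illustrates the principle. Duplicate field values are collapsed into one entry whose weight is the number of documents that carried the value (the same number a terms aggregation reports as doc_count), and a prefix lookup returns terms rather than documents. The function names, the lowercasing, and the field name "suggest" are assumptions made for illustration; the {"input": ..., "weight": ...} shape is the one a completion field accepts.

```python
from bisect import bisect_left
from collections import Counter


def build_dictionary(field_values):
    """Collapse duplicate field values into sorted (term, weight) pairs.

    The weight is the number of occurrences, i.e. what a terms
    aggregation on the primary index would report as doc_count.
    Lowercasing and trimming are illustrative normalisation choices.
    """
    counts = Counter(v.strip().lower() for v in field_values if v.strip())
    return sorted(counts.items())  # sorted by term so prefixes are contiguous


def suggest(dictionary, prefix, size=5):
    """Return up to `size` terms that start with `prefix`, heaviest first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    # (prefix,) sorts before any (term, weight) whose term >= prefix.
    start = bisect_left(dictionary, (prefix,))
    matches = []
    for term, weight in dictionary[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append((term, weight))
    matches.sort(key=lambda m: (-m[1], m[0]))
    return [term for term, _ in matches[:size]]


def to_completion_docs(dictionary, field="suggest"):
    """Turn (term, weight) pairs into documents for a separate,
    deduplicated suggester index. `field` is a hypothetical name."""
    return [{field: {"input": term, "weight": weight}}
            for term, weight in dictionary]


if __name__ == "__main__":
    authors = ["Jane Doe", "John Smith", "Jane Doe", "Joan Miro",
               "jane doe ", "John Smith"]
    dictionary = build_dictionary(authors)
    print(suggest(dictionary, "jo"))
    print(to_completion_docs(dictionary)[0])
```

Each author appears once in the dictionary no matter how many documents mention them, so lookup cost depends on the number of distinct terms rather than the number of documents. The price is the one described above: the dictionary has to be rebuilt periodically, and deleted values linger until the next rebuild.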
