About our Technology
Clueray’s patent-pending IntentMatch technology was created so that search and search-based applications can focus results on a specified information need. For example, an article meets a very different information need than a directory or homepage, and a form meets a very different need than a video player. Current keyword-based approaches to search cannot distinguish between these document intents, leaving users to sort manually through mixed collections of results to find those that meet their information need. Clueray creates an “Intent Signature” for each web document based on its visual composition and other key features, allowing users and search-based applications to focus only on results that match a specific intent or meet a specified information need.

Clueray’s IntentMatch Technology
IntentMatch is built on four core processes: segmentation, feature extraction, categorization, and quality assessment.

Segmentation
The segmentation engine breaks a document into regions of interest based on size, shape, content, and location on the page. Like many “hard” artificial intelligence problems, document segmentation is easy for a human to do, but automating it for an unstructured environment like the web is particularly challenging. Clueray’s approach to this problem draws from multiple disciplines and can be tailored to address specific segmentation needs.
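To make the idea of regions of interest concrete, here is a minimal sketch that labels page blocks using position and shape alone. It is not Clueray's segmentation engine: the `Block` type, the region names, and every threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A rectangular page element; coordinates are fractions of the page (0..1)."""
    x: float
    y: float
    width: float
    height: float
    text: str


def label_region(block: Block) -> str:
    """Assign a coarse region label (hypothetical rules, illustrative thresholds)."""
    if block.y < 0.1 and block.width > 0.8:
        return "header"   # wide block at the very top of the page
    if block.y + block.height > 0.9 and block.width > 0.8:
        return "footer"   # wide block reaching the bottom of the page
    if block.width < 0.3 and block.height > 0.4:
        return "sidebar"  # tall, narrow block
    return "main"


def segment(blocks):
    """Group blocks into regions keyed by their label."""
    regions = {}
    for block in blocks:
        regions.setdefault(label_region(block), []).append(block)
    return regions


page = [
    Block(0.0, 0.0, 1.0, 0.08, "site navigation"),
    Block(0.0, 0.1, 0.2, 0.7, "related links"),
    Block(0.25, 0.1, 0.7, 0.7, "article body"),
    Block(0.0, 0.92, 1.0, 0.08, "copyright notice"),
]
regions = segment(page)
```

A real segmenter would also have to weigh content and handle irregular layouts; fixed rules like these only show what the output of a segmentation step might look like.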



Feature Extraction
Feature extraction analyzes the attributes and content of the regions of interest identified by the segmentation engine and computes a multi-dimensional numerical vector that characterizes those regions. A post-processing stage transforms the feature vector into the document’s Intent Signature.
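The two stages described above can be sketched as follows. The specific features (word count, link density, form inputs, share of the largest region) and the use of unit-length normalization as the post-processing step are assumptions made for illustration; the actual features and transformation are not described here.

```python
import math


def extract_features(regions):
    """Compute a small feature vector from segmented regions.

    Each region is a dict with hypothetical keys: 'area', 'words',
    'links', and 'inputs' (form fields).
    """
    total_area = sum(r["area"] for r in regions) or 1.0
    total_words = sum(r["words"] for r in regions)
    total_links = sum(r["links"] for r in regions)
    total_inputs = sum(r["inputs"] for r in regions)
    largest_area = max((r["area"] for r in regions), default=0.0)
    return [
        float(total_words),                 # amount of text
        total_links / max(total_words, 1),  # link density
        float(total_inputs),                # number of form fields
        largest_area / total_area,          # dominance of the largest region
    ]


def to_signature(features):
    """Post-process the feature vector; here, scale it to unit length."""
    norm = math.sqrt(sum(v * v for v in features))
    if norm == 0:
        return [0.0] * len(features)
    return [v / norm for v in features]


regions = [
    {"area": 3.0, "words": 30, "links": 3, "inputs": 0},
    {"area": 1.0, "words": 10, "links": 7, "inputs": 0},
]
features = extract_features(regions)
signature = to_signature(features)
```

Normalizing makes signatures comparable across documents of different sizes, which is one plausible reason to separate raw extraction from post-processing.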



Categorization
Sophisticated statistical pattern recognition and complex heuristics use the Intent Signature to determine the match between the document and a pre-defined set of document categories. A proprietary fuzzy matching strategy is used, allowing a document to be matched to multiple categories. For Clueray’s current beta, the categories have been selected to map to a taxonomy of query intent identified in the literature. Other categories or taxonomies of categories can be defined manually or through various automated techniques. The technology also allows users to create custom categories, though this functionality is not enabled in the current version of the beta.
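As an illustration of matching one document to several categories, the sketch below scores a signature against per-category prototype vectors and keeps every category above a threshold. Cosine similarity stands in for the proprietary fuzzy matching strategy; the category names, prototype values, and threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical prototypes: one illustrative vector per category.
PROTOTYPES = {
    "article": [0.9, 0.1, 0.0],
    "directory": [0.2, 0.9, 0.0],
    "form": [0.1, 0.1, 0.9],
}


def match_categories(signature, prototypes=PROTOTYPES, threshold=0.5):
    """Return every category whose similarity meets the threshold.

    A document can therefore match several categories, or none.
    """
    scores = {name: cosine(signature, proto) for name, proto in prototypes.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


matches = match_categories([0.7, 0.6, 0.0])
```

In this example the signature sits between the article and directory prototypes, so both categories are returned while the form category is rejected.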



Quality Assessment
For each document, a category-specific quality score is computed based on the manner in which information is presented in the document. The segmentation results, feature values, and Intent Signature all play a role in the computation of the quality score.
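One way to picture a category-specific score is a weighted combination of presentation features, scaled by how strongly the document matched the category. This is only a sketch: the feature names, the weights, and the combination rule are assumptions, not the actual quality computation.

```python
# Hypothetical per-category weights over presentation features.
QUALITY_WEIGHTS = {
    "article": {"main_text_share": 0.7, "link_density": -0.3},
    "directory": {"main_text_share": 0.1, "link_density": 0.6},
}


def quality_score(category, features, match_strength):
    """Combine weighted features with the category match strength.

    The result is clamped to the range 0..1.
    """
    weights = QUALITY_WEIGHTS[category]
    raw = sum(weight * features.get(name, 0.0) for name, weight in weights.items())
    return min(1.0, max(0.0, raw * match_strength))


features = {"main_text_share": 0.8, "link_density": 0.2}
article_quality = quality_score("article", features, match_strength=1.0)
directory_quality = quality_score("directory", features, match_strength=1.0)
```

Because the weights differ by category, the same text-heavy page scores higher as an article than as a directory, which is the sense in which the score is category-specific.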



An Example
An example of all of the components at work simultaneously is described here.

IntentMatch Architecture
Clueray’s IntentMatch technology has been architected to maximize its deployment options and scalability. IntentMatch can be deployed as a web service (inside or outside the firewall), or as object code integrated with your app.