Superset NLP Engine

DeepR NLP Engine Overview

Superset makes use of an NLP (natural language processing) engine written over the course of eight years by Dan Clark. On this webpage, we describe, from a 30,000-foot view, how the system works. Because the system is proprietary and has taken years to develop, we won't delve too far into the internals, but we will give a broad overview so that the gist of how it works can be understood. The NLP system is named DeepR, and we will refer to it as such in this document.

First, input enters the system as either voice or text. Once the text reaches DeepR, it is parsed by a number of engines, including the spaCy engine (spacy.io). The text is tokenized, split, and reassembled into a format that the next stage of our system accepts. If you look at the image below, you'll see a dashed box that contains DeepR's neural networks and over a dozen algorithms that work with these neural networks. The custom neural networks are recurrent neural networks (RNNs) that have been heavily modified to work with a number of algorithms (created by Dan Clark) that run as external processes. Data is shuttled between the algorithms and the neural networks until all the chunks of the text input have been processed into a generic language model that can be passed to the next stage. This generic language model holds a rich set of attribute, entity, and semantic data to be used by child processes in the DeepR system.
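Because DeepR's internals are not public, the following is only a rough Python sketch of the general shape described above: text is tokenized, split into chunks, and each chunk is processed until everything has been folded into one shared structure. Every name here (`GenericLanguageModel`, `tokenize`, `chunk`, `process_chunk`, `build_model`) is hypothetical. The regex tokenizer stands in for a real parser, and the capitalization check stands in for the neural networks and algorithms, which it does not resemble in any meaningful way.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GenericLanguageModel:
    """Hypothetical container for the data handed to later stages."""
    tokens: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)


def tokenize(text):
    # Stand-in for a real tokenizer: words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def chunk(tokens, size=4):
    # Split the token stream into fixed-size chunks (the size is arbitrary).
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def process_chunk(tokens, model):
    # Placeholder for the network/algorithm round trips: record the tokens
    # and naively treat capitalized words as entities.
    model.tokens.extend(tokens)
    model.entities.extend(t for t in tokens if t[:1].isupper())


def build_model(text):
    # Process every chunk, then attach a summary attribute.
    model = GenericLanguageModel()
    for piece in chunk(tokenize(text)):
        process_chunk(piece, model)
    model.attributes["token_count"] = len(model.tokens)
    return model


model = build_model("Remind Dan about the Boston trip on Friday.")
print(model.entities)
print(model.attributes)
```

The point of the sketch is the data flow, not the heuristics: each chunk updates one accumulating structure, and only the finished structure is handed to the next stage.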

This generic language model is then passed to at least two other systems. One tries to build a temporal model from whatever it is given, and the other tries to generate a final semantic representation of the data. If you're wondering about the number of steps in the system or why these models are needed, the most basic answer is that this is how the system has evolved over time, after much trial and error. Breaking the system into multiple parts like this lets us host many instances of it in the cloud (Amazon AWS) and on our private machines in a hybrid virtual network. This has made the system more extensible and robust over time.
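As a loose illustration of this fan-out, the Python sketch below hands one shared structure to two independent stages and collects their results. It is not DeepR's code: `temporal_stage`, `semantic_stage`, and `fan_out` are hypothetical names, the dictionary stands in for the generic language model, and a local thread pool stands in for separately hosted instances.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the generic language model.
language_model = {
    "tokens": ["meet", "Dan", "on", "Friday"],
    "entities": ["Dan"],
}

WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"}


def temporal_stage(model):
    # Placeholder temporal model: pick out weekday names.
    return [t for t in model["tokens"] if t in WEEKDAYS]


def semantic_stage(model):
    # Placeholder semantic representation: first token as the action,
    # entities as its arguments.
    return {"action": model["tokens"][0], "arguments": list(model["entities"])}


def fan_out(model, stages):
    # Each stage only reads the model, so the stages can run independently.
    with ThreadPoolExecutor(max_workers=len(stages)) as pool:
        futures = {name: pool.submit(stage, model)
                   for name, stage in stages.items()}
        return {name: future.result() for name, future in futures.items()}


results = fan_out(language_model,
                  {"temporal": temporal_stage, "semantic": semantic_stage})
print(results)
```

Because neither stage depends on the other's output, more consumers could be added to the `stages` mapping without changing the existing ones, which is one plausible reason a split design is easier to scale across machines.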

We hope this webpage gives some understanding of DeepR, within the constraint that we cannot reveal our secret sauce. We welcome any questions, comments, or criticisms of this document. Please check back often, as we will continually update it. Email us at: help@superset.ai