Text Mining Software
The Rule Induction Kit for Text (RIKTEXT) is a complete software package for learning decision rules from document collections. Unlike complex numerical models, these rules are simple logical combinations of words and phrases that are often highly predictive. The rules are induced automatically from data, and they can readily be edited and applied to new data. RIKTEXT runs under Linux or Windows.
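To make the idea of a word-based decision rule concrete, here is a minimal sketch in Python. It is not RIKTEXT's rule format or API: the rule representation (a set of required words paired with a label), the example rules, and the names `RULES`, `tokenize`, and `classify` are all assumptions made for illustration. The rules are written by hand here, whereas RIKTEXT induces them from data.

```python
import re

# Hypothetical rule format (not RIKTEXT's): a rule fires when ALL of its
# required words occur in the document. Rules are tried in order.
RULES = [
    ({"interest", "rate"}, "finance"),
    ({"goal"}, "sports"),
]
DEFAULT_LABEL = "other"


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, rules=RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose words all appear in the text."""
    words = tokenize(text)
    for required_words, label in rules:
        if required_words <= words:  # subset test: every required word is present
            return label
    return default


print(classify("The central bank raised the interest rate again."))
print(classify("A late goal decided the match."))
print(classify("Rain is expected tomorrow."))
```

Because each rule is just a short list of words, a person can read it, edit it by hand, and reapply it to new documents, which is the property the description above emphasizes.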
The Text-Miner Software Kit (TMSK) is a comprehensive software package for predictive text mining. It includes routines for preprocessing XML-based text documents and provides implementations of all the key tasks described in the book Fundamentals of Predictive Text Mining, published by Springer (2010). These tasks include: (a) data preparation, including tokenization, stemming, vectorization, and dictionary compilation; (b) prediction by methods such as naive Bayes and advanced linear models; (c) information retrieval by k-nearest neighbors and document matching; (d) document clustering; and (e) information extraction of named entities. The routines run on any machine with Java.
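The data preparation steps named above (tokenization, dictionary compilation, and vectorization) can be sketched in a few lines of Python. This is an illustrative sketch only and does not reproduce TMSK's routines, which are Java programs: the function names, the `max_size` parameter, and the choice of raw word counts as vector entries are assumptions. Stemming is omitted for brevity.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dictionary(docs, max_size=1000):
    """Compile a dictionary of the most frequent tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:max_size]]


def vectorize(doc, dictionary):
    """Represent a document as a list of counts, one per dictionary word."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in dictionary]


docs = ["Stocks rose as rates fell.", "Rates rose again.", "The team won."]
dictionary = build_dictionary(docs, max_size=4)
vectors = [vectorize(doc, dictionary) for doc in docs]

print(dictionary)
for vector in vectors:
    print(vector)
```

Vectors of this kind are the usual input to the prediction, retrieval, and clustering methods listed in items (b) through (d).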
Data Mining Software
The Rule Induction Kit (RIK) is a complete software package for discovering highly compact decision rules from data. Unlike complex numerical models, these rules are simple logical rules that are often highly predictive. For example, in a medical application, a typical rule might state that high blood pressure or being overweight suggests an increased risk of heart attack. The objective is to determine the best set of rules for prediction and classification, where the best set is the one with the smallest number of rules at near-minimum error. The rules can be manually edited if necessary and applied to new data. RIK also includes options to generate new features as linear combinations of the original features. It runs under Linux or Windows.
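The selection criterion described above, the smallest rule set whose error is close to the minimum, can be illustrated with a short Python sketch. How RIK defines "near-minimum" is not stated here, so the fixed `tolerance` parameter, the function name `pick_rule_set`, and the candidate figures are all hypothetical.

```python
def pick_rule_set(candidates, tolerance=0.01):
    """Pick the smallest rule set whose error is within `tolerance` of the best.

    `candidates` is a list of (number_of_rules, error_rate) pairs.
    The tolerance is an assumed stand-in for "near-minimum error".
    """
    min_error = min(error for _, error in candidates)
    near_best = [c for c in candidates if c[1] <= min_error + tolerance]
    # Prefer fewer rules; break ties by lower error.
    return min(near_best, key=lambda c: (c[0], c[1]))


# Made-up candidates for illustration: (number of rules, estimated error rate).
candidates = [(25, 0.080), (12, 0.082), (6, 0.088), (3, 0.120)]
best = pick_rule_set(candidates)
print(best)
```

In this made-up example the 25-rule set has the lowest error, but the 6-rule set is preferred because its error is within the tolerance and it is far more compact. The 3-rule set is rejected because its error is well above the minimum.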
Enterprise Data-Miner (EDM) is a comprehensive collection of programs for efficient mining of big data. It includes both classical methods and more computationally expensive state-of-the-art predictive methods. The software kit implements the data-mining techniques presented in Predictive Data Mining: A Practical Guide, published by Morgan Kaufmann (1998). EDM has programs for (a) data preparation, (b) data reduction or sampling, and (c) prediction. They run under Windows, Linux, Unix, and Java. A demonstration applet showcased on Sun's JavaReel Site can be seen locally.
The standard license provided for all the software kits is a single-user software license; product integration licenses are available on request by email.