Text Mining: Predictive Methods for Analyzing Unstructured Information
Predictive Data Mining: A Practical Guide
A consequence of the pervasive use of computers is that most documents originate in digital form. The field of text mining concerns itself with how to exploit the textual information embedded in these documents.
This book is about methods for text mining -- processing and analyzing unstructured data such as text in documents. Historically these methods have their antecedents in several research communities, many originally associated with artificial intelligence, including the specialized fields of information retrieval, knowledge discovery and data mining, and natural language processing. This book is a practical and introductory guide, integrating related topics and providing practical advice for text mining. The presentation of text mining emphasizes predictive methods. It does not attempt an encyclopedic survey of the entire field. Rather, it unifies several areas that have been treated separately and focuses on how readers can use these methods to analyze their own data. Readers can expect to have a good understanding of the issues involved in applying these methods to real-world problems.
The book covers all important areas of text mining: preprocessing, text categorization, information retrieval and search engines, clustering of documents, and information extraction. It describes several case studies that illustrate how text mining may be applied to solving real-world problems. Emerging directions are identified that would help those looking to do research in the area. There is accompanying downloadable software that implements the methods discussed in the book so that readers may try the methods on their own data.
The book is aimed at IT professionals and managers as well as advanced
undergraduate or beginning graduate computer science students. Some background
in data mining is beneficial, but is not essential.
Contents
Overview of Text Mining; From Textual Information to Numerical Vectors; Using Text for Prediction; Information Retrieval and Text Mining; Finding Structure in a Document Collection; Looking for Information in Documents; Case Studies; Emerging Directions.
Order the book from Springer Publishers
"I enjoy reading Predictive Data Mining. It presents an excellent perspective on the theory and practice of data mining. It can help educate statisticians to build alliances between statisticians and data miners."
Emanuel Parzen, Distinguished Professor of Statistics, Texas A&M University, October 1998
"Predictive Data Mining: A Practical Guide covers important technical subjects at a high level and takes the reader through a complete technical methodology...it's a great introduction."
Will Dwinnell, PC-AI, Sept/Oct 1998
"Excellent introduction to the topic; thoughtful and readable introduction to data mining. It is a useful primer and refreshingly devoid of the buzzword afflictions of other books on this topic."
Posted on Amazon.com by an anonymous reader from Boston, MA, September 10, 1998.
"Anyone owning, building, or thinking of building a data warehouse and then going on a data mining expedition will find this book excellent preparation for the technical and intellectual challenges associated with putting big data sets to work."
Sunny Baker, Ph.D., Journal of Business Strategy, July/August 1998
Reviews in Chinese of book and software
Stephen Koo in the I.T. Supplement of the Hong Kong Economic Times, 19 February 1998 and 25 February 1998
As storage and retrieval technology has advanced to the point where the main goals of classical databases - those of instant data recording and extremely rapid responses to queries - are well within reach, and as the amount of data stored in existing information systems has mushroomed, a new set of objectives for data management has emerged. Very large collections of data - millions or even hundreds of millions of individual records - are now being compiled into centralized data warehouses and reorganized globally by topic, allowing analysts to make use of powerful statistical and machine learning methods to examine data more comprehensively. Searches using these methods can be much more open-ended than traditional database queries, and, while consuming more time and processing resources, can be expected to return statistically valid results capable of showing trends and patterns over time and providing a platform for forecasting future developments.
Data mining is the art and science of performing these massive,
open-ended analyses,
and, most importantly, of extracting, transforming, and organizing enormous
quantities of raw data to facilitate a high-dimensional search for
predictive solutions. This book presents a
unified view of the field, drawing from statistics, machine learning, and
databases and focusing on the preparation of data and the development
of an overall problem-solving strategy. In addition, the authors review
statistical and machine learning search methods and, employing several
real-life case studies, discuss the hurdles encountered when applying these
methods to real-world data warehouses with all
of their inescapable flaws and variances. A software option for
a state-of-the-art data mining kit enables the reader to apply
the concepts presented in the book.
Anyone owning, building, or
thinking of building a data warehouse will find this book excellent
preparation for the technical and intellectual challenges associated with
putting big data to work.
Contents
What is Data Mining?; Statistical Evaluation of Big Data; Preparing the Data; Data Reduction; Looking for Solutions; What's Best for Data Reduction and Mining; Art Or Science? Case Studies in Data Mining.
Order the book from Morgan Kaufmann Publishers