Data scientists are currently in great demand on the job market. In America, experienced data scientists are as popular as a drinks stand in the desert. Rising demand for this skill profile is evident in Germany as well. More and more companies are building or expanding "analytics" departments and looking for suitable staff. But what does a data scientist actually do? Something to do with artificial intelligence, machine learning, data mining, Python programming, and big data. Nobody really knows exactly ... This book is an introduction to and overview of the wide-ranging field of data science. It presents the data sources (databases, data warehouses, Hadoop, etc.) and the software products for data analysis (data science platforms, ML libraries). The most important machine learning methods are covered, as are example use cases from various industries.
This book is a comprehensive introduction to the methods and algorithms of modern data analytics. It provides a sound mathematical basis, discusses advantages and drawbacks of different approaches, and enables the reader to design and implement data analytics solutions for real-world applications. This book has been used for more than ten years in the Data Mining course at the Technical University of Munich. Much of the content is based on the results of industrial research and development projects at Siemens.
This engaging and clearly written textbook/reference provides a must-have introduction to the rapidly emerging interdisciplinary field of data science. It focuses on the principles fundamental to becoming a good data scientist and the key skills needed to build systems for collecting, analyzing, and interpreting data. The Data Science Design Manual is a source of practical insights that highlights what really matters in analyzing data, and provides an intuitive understanding of how these core concepts can be used. The book does not emphasize any particular programming language or suite of data-analysis tools, focusing instead on high-level discussion of important design principles. This easy-to-read text ideally serves the needs of undergraduate and early graduate students embarking on an "Introduction to Data Science" course. It reveals how this discipline sits at the intersection of statistics, computer science, and machine learning, with a distinct heft and character of its own. Practitioners in these and related fields will find this book perfect for self-study as well. Additional learning tools: contains "War Stories," offering perspectives on how data science applies in the real world; includes "Homework Problems," providing a wide range of exercises and projects for self-study; provides a complete set of lecture slides and online video lectures at www.data-manual.com; provides "Take-Home Lessons," emphasizing the big-picture concepts to learn from each chapter; recommends exciting "Kaggle Challenges" from the online platform Kaggle; highlights "False Starts," revealing the subtle reasons why certain approaches fail; offers examples taken from the data science television show "The Quant Shop" (www.quant-shop.com).
This book provides an overview of data mining methods demonstrated by software. Knowledge management involves the application of human knowledge (epistemology) together with the technological advances of our current society (computer systems) and big data, both in terms of collecting data and in analyzing it. We see three types of analytic tools. Descriptive analytics focuses on reports of what has happened. Predictive analytics extends statistical and/or artificial intelligence methods to provide forecasting capability; it also includes classification modeling. Diagnostic analytics can apply analysis to sensor input to direct control systems automatically. Prescriptive analytics applies quantitative models to optimize systems, or at least to identify improved systems. Data mining includes descriptive and predictive modeling. Operations research includes all three. This book focuses on descriptive analytics. The book seeks to provide simple explanations and demonstrations of some descriptive tools. This second edition provides more examples of big data impact, updates the content on visualization, clarifies some points, and expands coverage of association rules and cluster analysis. Chapter 1 gives an overview in the context of knowledge management. Chapter 2 discusses some basic software support for data visualization. Chapter 3 covers fundamentals of market basket analysis, and Chapter 4 demonstrates RFM modeling, a basic marketing data mining tool. Chapter 5 demonstrates association rule mining. Chapter 6 provides more in-depth coverage of cluster analysis. Chapter 7 discusses link analysis. Models are demonstrated using business-related data. The style of the book is intended to be descriptive, seeking to explain how methods work, with some citations, but without deep scholarly reference. The data sets and software are all selected for widespread availability and access by any reader with computer links.
Introducing the IBM SPSS Modeler, this book guides readers through data mining processes and presents relevant statistical methods. There is a special focus on step-by-step tutorials and well-documented examples that help demystify complex mathematical algorithms and computer programs. The variety of exercises and solutions as well as an accompanying website with data sets and SPSS Modeler streams are particularly valuable. While intended for students, the simplicity of the Modeler makes the book useful for anyone wishing to learn about basic and more advanced data mining, and put this knowledge into practice.