This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics, including the fundamental and advanced concepts and methods of deriving business insights from “big” and/or “small” data. The course is supplemented by hands-on labs that help attendees reinforce their theoretical knowledge of the material.
Business success in the information age is predicated on the ability of organizations to convert raw data coming from various sources into high-grade business information.
To stay competitive, organizations have started adopting new approaches to data processing and analysis. For example, data scientists are turning to Apache Spark to process massive amounts of data with its distributed computing capability and built-in machine learning library, or are switching from costly proprietary solutions to the free R programming language.
TOPICS
<ul>
<li>Algorithms, Techniques and Common Analytical Methods</li>
</ul>
Target Audience
Data Scientists, Software Developers, IT Architects, and Technical Managers
Prerequisites
Participants should have a general knowledge of statistics and programming.
Participants need to bring their own laptops to the training.
Duration: 4 Days
Course Content Summary
Chapter 1. Applied Data Science
Chapter 2. Getting Started with R
<ul>
<li>Introduction</li>
<li>Operations</li>
</ul>
Chapter 3. R Statistical Computing Features
<ul>
<li>Correlations</li>
</ul>
Chapter 4. Data Analytics Life-cycle Phases
Chapter 5. Data Science Algorithms and Analytical Methods
Chapter 6. Visualizing and Reporting Processed Results
<ul>
<li>JavaFX</li>
</ul>
Chapter 7. Text Mining
<ul>
<li>TF-IDF</li>
<li>Stemming</li>
</ul>
Chapter 8. Introduction to Functional Programming
Chapter 9. Big Data Business Intelligence and Analytics
<ul>
<li>Hadoop</li>
<li>Hadoop’s Streaming MapReduce</li>
</ul>
Chapter 10. Introduction to Apache Spark
<ul>
<li>GraphX</li>
</ul>
Chapter 11. The Spark Shell
Chapter 12. Spark RDDs
Chapter 13. Parallel Data Processing with Spark
Chapter 14. Introduction to Spark SQL
Chapter 15. Graph Processing with GraphX
Chapter 16. The Spark Machine Learning Library
<ul>
<li>Clustering</li>
</ul>
Chapter 17. Machine Learning with BigML
<ul>
<li>Models</li>
<li>Predictions</li>
</ul>
Lab Exercises
Lab 1. Learning the Lab Environment
Lab 2. Getting Started with R
Lab 3. Working with R
Lab 4. Data Import and Export in R
Lab 5. Creating Your Own Statistical Functions
Lab 6. Simple Linear Regression
Lab 7. Multiple Linear Regression
Lab 8. k-Nearest Neighbors Algorithm
Lab 9. Monte-Carlo Simulation (Method)
Lab 10. Using R Graphics Package
Lab 11. Using the D3 JavaScript Visualization Library
Lab 12. Common Text Mining Tasks with the tm Library
Lab 13. Elements of Functional Programming with Python
Lab 14. The Spark Shell
Lab 15. RDD Performance Improvement Techniques
Lab 16. Spark ETL and HDFS Interface
Lab 17. Common Map / Reduce Programs in Spark
Lab 18. Spark SQL
Lab 19. Getting Started with GraphX
Lab 20. PageRank with GraphX
Lab 21. Using k-means Algorithm from MLlib
Lab 22. Using Random Forests for Classification with Spark MLlib
Lab 23. Text Classification with Spark ML Pipeline