SumedhaIT provides Data Science training with certified trainers. Data Science skills are in high demand in both national and international job markets. This extensive course covers the core concepts of Data Science, including Statistical Analysis, R, Python, Statistical Modelling, Machine Learning, Deep Learning and Predictive Modelling.

## Course curriculum

### 1. Introduction to Data Science

#### a. Data types

• Continuous variables
• Ordinal variables
• Categorical variables
• Time series

#### d. Data distributions

• Normal distribution and its characteristics
• Binomial distribution

### 3. Introduction to R

• What is R?
• Types of objects in R
• Creating and updating variables
• If statements and loops (for, while, etc.)
• String manipulation
• Subsetting data from matrices and data frames
• Casting and melting data between wide and long formats
• Merging datasets

### 4. Exploratory data analysis and visualization

• Getting data into R: reading from files
• Cleaning and preparing the data: converting data types
• Handling missing values
• Visualization in R using ggplot2
• Adding more dimensions to the plot
• Visualization using Tableau (Introduction)
• Correlation: positive, negative and no correlation
• What is spurious correlation?
• Correlation vs. causation
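To make the correlation topics above concrete, here is a minimal plain-Python sketch of Pearson's correlation coefficient (in R, the built-in `cor()` computes the same quantity by default). The function name `pearson_r` and the sample values are illustrative assumptions, not course material. The third example shows a perfect but non-linear relationship (y = x²) that still gives a coefficient of zero, because Pearson's r only measures linear association.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))            # 1.0  (positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))            # -1.0 (negative)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))    # 0.0  (no linear correlation)
```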

#### Visualization

##### Basic Visualization

• Bar Charts
• Histograms
• Pie Charts

• Item frequency plots
• Interactive graphs
• Automated plots

### 5. Introduction to Python

• Understanding the reasons for Python's popularity
• Basics of Python: operations, loops, functions, dictionaries
• Advanced operations with text: finding, sequencing and basic analytics
• Groundwork for deep learning

### 6. Statistical Modelling

#### a. Supervised learning

i. Linear Regression (Prediction)

• Simple linear regression
• Model development and interpretation
• Model validation – tests to validate assumptions
• Multiple linear regression
• Disadvantages of linear models

ii. Logistic Regression (Classification)

• Need for logistic regression
• Maximum likelihood estimation
• Model development and interpretation
• Confusion matrix
• ROC curve
• Pros and cons of logistic regression models
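As an illustration of simple linear regression, here is a minimal sketch of an ordinary least squares fit in plain Python. The course may well use R's `lm()` or a Python library instead; the function name and data below are hypothetical. The slope is the co-variation of x and y divided by the variation of x, and the intercept makes the fitted line pass through the point of means.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # toy data lying exactly on y = 1 + 2x
print(fit_simple_linear_regression(xs, ys))  # (1.0, 2.0)
```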

#### b. Unsupervised learning – Cluster analysis (Segmentation)

• Hierarchical clustering
•  K-Means clustering
•  Distance measures

#### c.  Time series analysis – Forecasting

• Simple moving averages
•  Exponential smoothing
•  Time series decomposition
•  ARIMA
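Simple moving averages and simple exponential smoothing can be sketched as below. This is a minimal illustration that assumes the smoothed series starts from the first observation (one common convention among several). The function names and the `sales` figures are hypothetical. Under simple exponential smoothing, the last smoothed value serves as the forecast for the next period.

```python
def simple_moving_average(series, window):
    """Mean of each run of `window` consecutive values (len(series) - window + 1 results)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1), s_0 = x_0."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [float(series[0])]
    for x in series[1:]:
        # Blend the new observation with the previous smoothed value
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 12, 14, 16, 18]  # hypothetical monthly figures
print(simple_moving_average(sales, 3))    # [12.0, 14.0, 16.0]
print(exponential_smoothing(sales, 0.5))  # [10.0, 11.0, 12.5, 14.25, 16.125]
```

A larger `alpha` weights recent observations more heavily and reacts faster to change; a smaller `alpha` gives a smoother series.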

### 7.  Machine Learning

#### a. Decision trees

• Process of tree building
• Entropy and Gini Index
• Problem of overfitting
• Pruning a tree back
• Trees for prediction (regression trees) – example
• Trees for classification – example
• Advantages of tree-based models
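Entropy and the Gini index both measure how mixed the class labels at a tree node are. When a tree is built, the split chosen is the one that reduces this impurity the most. A minimal plain-Python sketch, with hypothetical function names and toy labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: chance that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

node = ["yes"] * 5 + ["no"] * 5
print(entropy(node))  # 1.0
print(gini(node))     # 0.5
print(information_gain(node, [["yes"] * 5, ["no"] * 5]))  # 1.0
```

A 50/50 node has the maximum two-class impurity (1 bit of entropy, Gini 0.5), and a split that separates the classes perfectly recovers the full 1 bit of information gain.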

#### b. Resampling and Ensemble Methods

• Bagging – Random Forest
• Boosting – Gradient boosting machines

• Support vector machines
•  Neural networks
•  Image processing
•  Introduction to deep learning

### 8.  Model validation and deployment

• RMSE – Root mean squared error
• Misclassification rate
• Area under the curve (AUC)
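The three validation metrics above can be computed as in this minimal plain-Python sketch (function names and data are hypothetical). The `auc` function uses the pairwise definition: the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as half. This equals the area under the ROC curve, but it takes time proportional to positives × negatives, so it suits small examples rather than large datasets.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of numeric predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def misclassification_rate(actual, predicted):
    """Fraction of class predictions that do not match the actual labels."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

def auc(labels, scores):
    """AUC as P(random positive scores above random negative); labels are 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

print(rmse([3, 5, 7], [2, 5, 9]))                          # about 1.29 (sqrt of 5/3)
print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))            # 0.75
```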

### 9.  Handling problem cases

• Imbalanced classification problems
• High-cardinality categorical data
• Encoding categorical and continuous variables
• Overfitting and underfitting

### 10. R packages

• dplyr
• lubridate
• tidyr
• caret
• ggplot2
• reshape2

### 11. Artificial intelligence tools in Data Science

• Introduction to H2O
• Modelling with H2O in R
• KNIME

### 12. Extra Offerings from Sumedha

• Interview preparation
• Case Studies
• Resume preparation guidance
• Industry trends and company information

Course Duration: 90 hours

Course Fee: 30,000/