SumedhaIT provides Data Science training with certified trainers. Data Science skills are in high demand in both national and international job markets. The course covers the core concepts of Data Science extensively, including Statistical Analysis, R, Python, Statistical Modelling, Machine Learning, Deep Learning, and Predictive Modelling.

Course curriculum

1. Introduction to Data Science

2. Business Statistics

a. Data types

  • Continuous Variables
  • Ordinal variables
  • Categorical Variables
  • Time Series

b. Descriptive Statistics

c. Sampling

d. Data distributions

  • Normal distributions – Characteristics
  • Binomial distributions

e. Inferential statistics

f. Hypothesis testing

3. Introduction to R

  • What is R?
  • Types of Objects in R
  • Creating new variables or updating variables
  • If statements and loops – for, while, etc.
  • String manipulation
  • Subsetting data from matrices and data frames
  • Casting and melting data to long and wide format
  • Merging datasets

4. Exploratory data analysis and visualization

  • Getting data into R – reading from files
  • Cleaning and preparing the data – converting data types
  • Handling missing values
  • Visualization in R using ggplot2
  • Adding more dimensions to the plot
  • Visualization using Tableau (Introduction)
  • Correlation – positive, negative and no correlation
  • What is spurious correlation?
  • Correlation vs Causation

Visualization:

Basic visualization:

  • Bar Charts
  • Histograms
  • Pie Charts

Advanced visualization:

  • Item frequency plots
  • Interactive graphs
  • Automated plots

5. Introduction to Python

  • Understanding the reasons for Python's popularity
  • Basics of Python: operations, loops, functions, dictionaries
  • Advanced operations with text: Finding, Sequencing and basic analytics
  • Ground-up for Deep-Learning

6. Statistical Modelling

a. Supervised learning

i. Linear Regression (Prediction)

  • Simple linear regression
  • Assumptions
  • Model development and interpretation
  • Model validation – tests to validate assumptions
  • Multiple linear regression
  • Disadvantages of linear models

ii. Logistic Regression (Classification)

  • Need for logistic regression
  • Logit link function
  • Maximum likelihood estimation
  • Model development and interpretation
  • Confusion matrix and ROC curve
  • Pros and cons of logistic regression models

b. Un-Supervised learning – Cluster analysis (Segmentation)

  • Hierarchical clustering
  • K-Means clustering
  • Distance measures

c. Time series analysis – Forecasting

  • Simple moving averages
  • Exponential smoothing
  • Time series decomposition
  • ARIMA

d. Market Basket Analysis (Association Rule Mining) – Cross Selling

e. Text Analytics (NLP)

f. K-NN (K-Nearest Neighbors)

7. Machine Learning

a. Decision trees

  • Process of tree building
  • Entropy and Gini Index
  • Problem of overfitting
  • Pruning a tree back
  • Trees for prediction (regression) – example
  • Trees for classification models – example
  • Advantages of tree-based models
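
To illustrate the entropy and Gini index topics listed above, here is a minimal Python sketch of the two impurity measures that tree-building algorithms commonly use to score a split. It is an illustration only, not taken from the course material; the function names `class_proportions`, `entropy` and `gini` are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the share of each class among the labels (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class shares."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class shares."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))


# A node with an even two-class mix is maximally impure;
# a node holding a single class has zero impurity under both measures.
mixed_node = ["yes", "yes", "no", "no"]
pure_node = ["yes", "yes", "yes", "yes"]
```

A split is usually preferred when it lowers the weighted impurity of the child nodes compared with the parent node.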

b. Re-Sampling and Ensembles Methods

  • Bagging – Random Forest
  • Boosting – Gradient boosting machines

c. Advanced methods

  • Support Vector machines
  • Neural networks
  • Image processing
  • Introduction to deep learning

8. Model validation and deployment

  • RMSE – Root mean squared error
  • Misclassification rate
  • Area under the curve (AUC)
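
As a rough illustration of the three validation metrics listed above, the sketch below computes each one in plain Python. It is an assumption-laden example rather than course material: the function names are hypothetical, labels are assumed to be 0/1, and AUC is computed through its pairwise-ranking interpretation (the share of positive/negative pairs in which the positive case gets the higher score, ties counting as half).

```python
import math


def rmse(actual, predicted):
    """Root mean squared error for a regression model."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def misclassification_rate(actual, predicted):
    """Share of predictions that differ from the true class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumes labels are 0 (negative) or 1 (positive) and that both
    classes are present.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

RMSE applies to prediction (regression) models, while the misclassification rate and AUC apply to classification models.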

9. Handling problem cases

  • Imbalanced classification problem
  • High-cardinality data problem
  • Encoding categorical and continuous variables
  • Overfitting and underfitting models

10. Advanced packages

  • dplyr
  • lubridate
  • tidyr
  • caret
  • ggplot2
  • reshape2

11. Artificial intelligence tools in Data Science

  • Introduction to H2O
  • Modelling with H2O in R
  • KNIME

12. Extra Offering from Sumedha

  • Interview preparation
  • Case Studies
  • Resume preparation guidance
  • Industry trends and company information

Course Duration: 90 hrs

Course Fee : 30,000/