Professional Certificate course in Data Science

130,000.00 +GST


Course Details:

  • Course Type: Online live delivery with self-paced courses
  • Duration: 84 hours (3.5 months on weekdays, 6 months on weekends)
  • Total Lectures: 44
  • Skill Level: Intermediate
  • Assessments: Daily assessments
  • Certificate: Yes

Expected Outcomes:

  • Acquire a solid understanding of fundamental concepts in statistics, mathematics, and programming languages commonly used in data science (e.g., Python).
  • Develop proficiency in exploring and analyzing datasets, including data cleaning, preprocessing, and visualization.
  • Gain hands-on experience with machine learning techniques, algorithms, and tools for tasks such as classification, regression, clustering, and feature selection.
  • Learn to apply statistical models for hypothesis testing, inferential statistics, and predictive modeling.
  • Acquire skills in cleaning and preprocessing messy, real-world data to make it suitable for analysis.
  • Complete practical, hands-on projects that simulate real-world scenarios, applying data science techniques to solve business problems.
  • Develop the ability to communicate findings and insights clearly to both technical and non-technical stakeholders.

Requirements:

  • Daily Assessments
  • Mini-projects (module-wise)
  • Live Evaluation

Target Audience:

  • Undergraduate and postgraduate students in any domain/field

Key Features:

  • No programming or technical skills required; students with no technical background can join
  • Covers data science topics in detail, including Python, databases, statistics, data engineering, machine learning, and NLP
  • Placement support for all students who meet the eligibility criteria
  • Industry support for every student

Curriculum

Module 1: Python

This module caters to beginners by acquainting them with the foundational concepts of Python programming. It covers everything from data types and loops to functions and data structures. 

  • Why Python?
  • Python IDE
  • Hello World Program
  • Variables & Names
  • String Basics
  • List 
  • Tuple
  • Set
  • Dictionaries
  • Conditional Statements
  • For and While Loop
  • Built-in Functions (Numbers and Math)
  • User Defined Function
  • Modules and Packages
  • Common Errors in Python

Module 2: Python Advanced

Building on the Python basics, this module explores more advanced concepts such as list comprehensions, file handling, and object-oriented programming. It also covers other important topics such as pickling and debugging in Python:

  • List Comprehension
  • File Handling
  • Debugging in Python
  • Class and Objects
  • Lambda, Filter and Map
  • Regular Expressions
  • Python pip
  • Read Excel Data in Python
  • Iterators, Decorators and Generators
  • Pickling
  • JSON in Python
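A minimal sketch of several Module 2 topics in plain Python: a list comprehension next to the equivalent lambda/filter/map pipeline, a generator, a decorator, and JSON and pickle round-trips. The names used (count_calls, evens_up_to, record) are illustrative only and are not taken from the course material.

```python
import json
import pickle
from functools import wraps


def count_calls(func):
    """Decorator that records how many times the wrapped function is called."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


def evens_up_to(n):
    """Generator: yields 0, 2, ..., n lazily, one value at a time."""
    for i in range(0, n + 1, 2):
        yield i


# A list comprehension and a map/filter/lambda pipeline build the same list.
odd_squares = [square(x) for x in range(6) if x % 2 == 1]
odd_squares_fp = list(map(lambda x: x * x, filter(lambda x: x % 2 == 1, range(6))))

record = {"name": "Ada", "scores": odd_squares}
as_json = json.dumps(record)                   # dict -> JSON text
restored = pickle.loads(pickle.dumps(record))  # dict -> bytes -> dict
```

Both forms give [1, 9, 25]; in idiomatic Python the comprehension is usually preferred for readability.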

Module 3: Algorithmic Thinking with Python

This module covers key concepts in algorithm design, including problem-solving strategies, algorithm analysis, data structures, and algorithmic paradigms.

  • Introduction to Algorithmic Thinking
  • Algorithm Efficiency and time complexity
  • Example algorithms – binary search, Euclid’s algorithm
  • Data structures – stack, heap, and binary trees
  • Memory Management/Technologies
  • Best practices – keeping it simple, DRY code, naming conventions, comments, and documentation
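The two example algorithms named above can each be sketched in a few lines. This is one common iterative formulation, not necessarily the one used in the lectures; it assumes an ascending-sorted list for the search and non-negative integers for the GCD.

```python
def binary_search(items, target):
    """Return the index of target in the ascending list items, or -1 if absent.

    Each step halves the search interval, so the running time is O(log n).
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1


def gcd(a, b):
    """Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), and gcd(a, 0) = a."""
    while b:
        a, b = b, a % b
    return a


print(binary_search([2, 3, 5, 7, 11, 13], 7))  # 3
print(gcd(48, 18))  # 6
```

Because the interval halves every iteration, a sorted list of one million items needs at most about 20 comparisons.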

Module 4: SQL Basic

This module introduces SQL-based relational databases, covering the basics of SQL queries, schemas, and normalization.

  • Database Introduction and Installation
  • Data Modeling
  • Normalization and Star Schema
  • ACID Transactions
  • Data Types
  • Data Definition Language (Create, Drop, Truncate, Alter)
  • Data Manipulation Language (Select, Delete, Update, Insert)
  • Data Control Language (Grant, Revoke)
  • Transaction Control Language (Commit, Rollback, Savepoint)
  • SQL Constraints (Primary Key, Foreign Key, Unique, Not Null, Check, Default)
  • Operators (Arithmetic, Logical, Bitwise, Comparison, Compound)
  • Clauses in SQL (Where, Group By, Having, Order By)
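A minimal sketch of constraints and transaction control, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in (the database system used in the course is not specified here, and syntax varies slightly between engines). The accounts table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: a table with PRIMARY KEY, NOT NULL, UNIQUE, DEFAULT and CHECK constraints.
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL UNIQUE,
        balance REAL NOT NULL DEFAULT 0 CHECK (balance >= 0)
    )
""")

# DML followed by COMMIT: insert two rows and make them permanent.
conn.execute("INSERT INTO accounts (owner, balance) VALUES ('Asha', 100), ('Ravi', 50)")
conn.commit()

# A transfer that must succeed or fail as a unit (the "A" in ACID).
try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE owner = 'Ravi'")
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE owner = 'Asha'")
except sqlite3.IntegrityError:
    pass  # Asha would go below 0, the CHECK fails, and BOTH updates are rolled back

balances = dict(conn.execute("SELECT owner, balance FROM accounts"))
conn.close()
```

After the failed transfer, both balances are unchanged (Asha 100, Ravi 50): the rollback also undid Ravi's credit, which had already succeeded.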

Module 5: SQL Advanced

This module continues with SQL-based databases, covering advanced queries, joins, date and time functions, and subqueries.

  • Joins (Inner, Left, Right, Full Join, Equi Join, Non-Equi Join, Self Join)
  • Mathematical functions (SQRT, PI, SQUARE, ROUND, CEILING)
  • Conversion functions (changing data types)
  • General functions (COALESCE, NVL, NULLIF)
  • Conditional expressions (IF, CASE)
  • Date and time functions
  • Numeric functions
  • String Functions
  • Subqueries
  • Rank and Window Functions
  • Integrating Python with SQL
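As a sketch of integrating Python with SQL, the snippet below uses the standard sqlite3 module to run an INNER JOIN with GROUP BY and HAVING, a LEFT JOIN with COALESCE and CASE, and a subquery. The tables and data are made up for illustration. NVL is Oracle-specific, so the portable COALESCE is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL,
        dept_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Analytics'), (2, 'Engineering'), (3, 'Sales');
    INSERT INTO employees VALUES
        (1, 'Asha', 70000, 1), (2, 'Ravi', 65000, 1),
        (3, 'Meera', 90000, 2), (4, 'John', 50000, NULL);
""")

# INNER JOIN + GROUP BY + HAVING: departments with at least two employees.
per_dept = conn.execute("""
    SELECT d.name, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
    FROM employees AS e
    INNER JOIN departments AS d ON e.dept_id = d.id
    GROUP BY d.name
    HAVING COUNT(*) >= 2
""").fetchall()

# LEFT JOIN keeps employees without a department; COALESCE fills the NULL,
# and CASE buckets salaries into bands.
labelled = conn.execute("""
    SELECT e.name,
           COALESCE(d.name, 'Unassigned') AS dept,
           CASE WHEN e.salary >= 65000 THEN 'high' ELSE 'standard' END AS band
    FROM employees AS e
    LEFT JOIN departments AS d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# Subquery: employees earning more than the overall average salary.
above_avg = [row[0] for row in conn.execute(
    "SELECT name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees) ORDER BY name"
)]
conn.close()
```

The query results come back as ordinary Python lists of tuples, so they can be passed straight on to pandas or plotting code.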

Module 6: Pandas

This module introduces the pandas library, detailing the functions and features it provides for efficient data handling, manipulation, and analysis:

  • Introduction to Pandas
  • Series Data Structure – Querying and Indexing
  • DataFrame Data Structure – Querying, Indexing, and loading
  • Merging data frames
  • Group by operation
  • Pivot table
  • Date/Time functionality
  • Example: Manipulating DataFrame

Module 7: Statistics & Probability with NumPy – Basic

This module covers the probability and statistics needed to understand, process, and interpret large volumes of data: the basics of probability theory, Bayes' theorem, probability distributions, and why they matter. These concepts are practiced hands-on with NumPy.

  • Why counting and probability theory?
  • Basics of sample and event space
  • Axioms of probability
  • Total Probability Theorem and Bayes' Theorem
  • Random variables, PMF and CDF
  • Discrete Distributions – Bernoulli, Binomial and Geometric
  • Expectation and its properties
  • Variance and its properties
  • Continuous Distributions – uniform, exponential and normal
  • Sampling from continuous distributions
  • Simulation techniques – simulating in NumPy
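A worked example of the total probability theorem and Bayes' theorem, assuming a hypothetical diagnostic test with 1% prevalence, 99% sensitivity and a 5% false-positive rate. The exact answer is computed with fractions and then checked by simulation. The module runs its simulations in NumPy; this sketch uses Python's standard random module so it runs without extra packages.

```python
import random
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive) by Bayes' theorem.

    The denominator comes from the total probability theorem:
    P(+) = P(+ | D) P(D) + P(+ | not D) P(not D).
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def simulate_posterior(prior, sensitivity, false_positive_rate, trials=200_000, seed=0):
    """Monte Carlo estimate: among simulated positive tests, the fraction that are truly sick."""
    rng = random.Random(seed)
    positives = true_positives = 0
    for _ in range(trials):
        sick = rng.random() < prior
        tests_positive = rng.random() < (sensitivity if sick else false_positive_rate)
        if tests_positive:
            positives += 1
            true_positives += sick  # True counts as 1
    return true_positives / positives


exact = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
estimate = simulate_posterior(0.01, 0.99, 0.05)
```

The exact result is 1/6: even with a 99%-sensitive test, only about one positive in six is a true case, because the condition is rare.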

Module 8: Statistics & Probability with NumPy – Advanced

This module continues with statistics and probability, covering descriptive and inferential statistics, hypothesis testing, and other related statistical methods.

  • Inferential statistics – sample vs population
  • Central Limit Theorem (CLT) and its proof
  • Chi-squared distribution and its properties
  • Point and Interval Estimators
  • Estimation technique – MLE
  • Interval Estimator of μ with unknown σ
  • Examples of estimators
  • Hypothesis testing – I
  • Hypothesis testing – II
  • Hypothesis testing – III
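A small simulation of the Central Limit Theorem, plus a large-sample confidence interval for a mean. Assumptions: the CLT samples come from an Exponential(1) distribution (mean 1, standard deviation 1), and the interval uses the normal critical value z rather than the Student t value that the "unknown σ" case strictly calls for; for large n the two are close. Only the standard library (Python 3.8+) is used.

```python
import random
import statistics
from statistics import NormalDist


def sample_means(n, reps, seed=0):
    """Means of `reps` independent samples of size n from Exponential(rate=1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def mean_confidence_interval(data, level=0.95):
    """Large-sample interval: sample mean +/- z * s / sqrt(n)."""
    n = len(data)
    m = statistics.fmean(data)
    s = statistics.stdev(data)                 # sample std (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * s / n ** 0.5
    return m - half_width, m + half_width


# CLT: means of skewed exponential samples cluster around 1,
# with spread close to 1 / sqrt(50), about 0.141.
means = sample_means(n=50, reps=2000)

rng = random.Random(1)
data = [rng.gauss(10, 2) for _ in range(400)]  # toy sample with true mean 10
low, high = mean_confidence_interval(data)
```

A histogram of `means` would look approximately normal even though each underlying sample is strongly skewed, which is the point of the CLT.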

Module 9: Data Visualization using Python

Data visualization presents data in a visual context so that patterns, trends, and correlations are easier to understand. This module includes extensive hands-on visualization with libraries such as Seaborn and Matplotlib, with the aim of effective data storytelling.

  • Read Complex JSON files
  • Styling Tabulation
  • Distribution of Data – Histogram
  • Box Plot
  • Data Visualization – Recap
  • Pie Chart
  • Donut Chart
  • Stacked Bar Plot
  • Relative Stacked Bar Plot
  • Stacked Area Plot
  • Scatter Plots
  • Bar Plot
  • Continuous vs Continuous Plot
  • Line Plot
  • Line Plot – COVID Data

Module 10: Data Visualization Tools: Power BI/Tableau (Add-on)

This module covers the topics essential for mastering Power BI/Tableau, including data preparation, data modeling, data visualization, and report creation.

  • POWER BI
    • Introduction to Power BI
    • Creating, Managing and Filtering Data
    • Basic Plots in Power BI – Trend Analysis, Area, Ribbon, Scatter Plots and Decomposition Trees
    • Creating Power BI reports
    • Creating interactive dashboards and deploying the dashboards
  • TABLEAU
    • Introduction to Tableau
    • Connecting, managing and aggregating data
    • Visual Analytics in Tableau
    • Simple predictive analytics using Tableau
    • Building Tableau Dashboards

Module 11: Introduction to Machine Learning

This module provides participants with a solid foundation in machine learning principles, algorithms, and methodologies. 

  • What is Machine Learning?
  • Different types of Machine Learning problems (Supervised, Unsupervised, Reinforcement)
  • Applications of Machine Learning
  • The Machine Learning Pipeline

Module 12: Machine Learning: Data Collection 

This module covers a wide range of topics related to data collection, including data acquisition strategies. You will learn how to identify relevant data sources and retrieve data from databases, APIs, and websites (via web scraping).

  • Data Sources (Structured, Unstructured)
  • Data Collection Techniques (APIs, Web Scraping, Sensors)
  • Data Acquisition Ethics

Module 13: Machine Learning: Data Cleaning & Pre-Processing

This module covers a comprehensive range of topics related to data cleaning and preprocessing, including handling missing values, dealing with outliers, standardizing and scaling numerical features, encoding categorical variables, and feature engineering. 

  • Data Cleaning Techniques (Handling Missing Values, Outliers)
  • Data Transformation (Scaling, Normalization, Encoding)
  • Feature Engineering (Feature Selection, Creation)
  • Balancing Data (Undersampling, Oversampling and SMOTE)
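Plain-Python sketches of four of the techniques listed above: mean imputation for missing values, min-max scaling, standardization (z-scores), and one-hot encoding. In practice libraries such as pandas and scikit-learn provide these; the helper names here are illustrative. The scaling helpers assume the values are not all identical, otherwise they would divide by zero.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-scores (x - mean) / std, using the population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """One 0/1 indicator column per distinct category, columns in sorted order."""
    levels = sorted(set(categories))
    return levels, [[int(c == level) for level in levels] for c in categories]


ages = impute_mean([25.0, None, 35.0, 40.0])  # the None becomes 100/3
```

Scaling and encoding parameters (min, max, mean, category levels) should be learned on the training data only and then reused on validation and test data, to avoid leaking information.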

Module 14: Machine Learning: Exploratory Data Analysis

This module covers a wide range of topics related to exploratory data analysis, including data visualization, summary statistics, correlation analysis, and dimensionality reduction techniques. 

  • Data Visualization Techniques (Histograms, Scatter Plots, Box Plots, etc.)
  • Univariate, Bivariate and Multivariate Analysis
  • Understanding Data Distribution and Relationships
  • Identifying Patterns and Trends
  • Feature Importance Analysis

Module 15: Machine Learning: Model Building

Model building is a crucial stage in the machine learning workflow, where practitioners use algorithms to learn patterns from data and make predictions. In this module, you will learn supervised learning techniques.

Supervised Learning

  • Introduction to Supervised Learning
  • Linear Regression (Regression)
  • Logistic Regression (Classification)
  • Decision Tree (Regression / Classification)
  • Random Forest (Regression / Classification)
  • Support Vector Machine (Regression / Classification)
  • Naive Bayes (Classification)
  • XGBoost (Regression / Classification)
  • KNN (Regression / Classification)
  • ARIMA (Forecasting)

Module 16: Machine Learning: Model Building – Continued

This module continues with model building, covering unsupervised learning techniques and reinforcement learning.

  • Unsupervised Learning
    • Introduction to Unsupervised Learning
    • K-Means Clustering
    • Hierarchical Clustering
    • DBSCAN
    • PCA
  • Reinforcement Learning
    • Introduction to Reinforcement Learning
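A minimal sketch of K-Means (Lloyd's algorithm) in plain Python, assuming 2D points given as tuples, Euclidean distance, and initial centroids picked at random from the data (Python 3.8+ for math.dist). Library implementations such as scikit-learn's KMeans add smarter initialization (k-means++) and multiple restarts.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: repeat (1) assign each point to its nearest centroid and
    (2) move each centroid to the mean of its assigned points, until nothing moves."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On these two well-separated toy blobs the algorithm settles on one cluster per blob; on real data the result can depend on the initial centroids, which is why restarts are common.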

Module 17: Machine Learning: Model Evaluation & Hyper Parameter Tuning

This module covers a comprehensive range of topics related to model evaluation and hyperparameter tuning. You will learn how to assess the performance of machine learning models using various evaluation metrics.

  • Model Evaluation
    • Regression (R², MAE, MSE, RMSE, etc.)
    • Classification (Accuracy, Precision, Recall, F1-Score, AUC-ROC, etc.)
  • Model Hyperparameter Tuning
    • Random Search
    • Grid Search
    • Bayesian Optimization
    • Cross-Validation
    • Early Stopping
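For reference, the regression and classification metrics listed above, plus a simple k-fold split for cross-validation, can be computed by hand as below. This is a sketch; sklearn.metrics and sklearn.model_selection provide tested implementations. It assumes binary labels with 1 as the positive class, and returns 0.0 where a ratio is undefined, which is one common convention.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds over 0..n-1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size
```

Each data point appears in exactly one validation fold, so averaging a metric over the folds uses all the data for evaluation without ever scoring a model on rows it was trained on.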

Module 18: Machine Learning: Model Deployment

This module focuses on the final stage of the machine learning pipeline, where trained models are deployed into production environments to make predictions on new data.

  • Saving and Loading Models
  • Preparing Models for Production Environments
  • Model Monitoring and Performance Tracking
  • MLflow

Module 19: Deep Learning with PyTorch: NN & ANN

This module provides participants with a comprehensive introduction to deep learning concepts and techniques using PyTorch. We will also discuss neural networks (NNs), the building blocks of deep learning, and artificial neural networks (ANNs).

  • Fundamentals of Neural Networks: Limitations of ML; The Neuron; Linear perceptron as neurons
  • Feed Forward Neural Networks: Linear Neurons and limitations; Sigmoid, Tanh and ReLU; Softmax
  • Learning-I: Gradient Descent; Delta rule and learning rates; Gradient descent with sigmoidal Neurons
  • Learning-II: Backpropagation; Stochastic and minibatch; Test set, validation set, and overfitting
  • Preventing overfitting
  • PyTorch Basics: Installation and setup of PyTorch; Tensors and operations in PyTorch
  • Training Fundamentals: Autograd; Backpropagation; Gradient Descent; Training Pipeline.
  • Regression with PyTorch: Linear Regression; Logistic Regression
  • Dataset in PyTorch: Dataset and Dataloader; Dataset Transforms.
  • Training Pipeline: Softmax and Cross-Entropy; Activation Functions
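The gradient descent update behind "Learning-I" can be seen without any framework. The sketch below fits y ≈ w·x + b by batch gradient descent on the mean squared error, with the gradients written out by hand; in PyTorch, autograd (loss.backward()) computes these same gradients automatically. The learning rate, epoch count and noise-free toy data are assumptions chosen so the example converges.

```python
def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Batch gradient descent on MSE(w, b) = (1/n) * sum((w*x + b - y)**2)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n  # dMSE/dw
        grad_b = 2 * sum(residuals) / n                             # dMSE/db
        w -= lr * grad_w  # step against the gradient
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_linear_gd(xs, ys)
```

With this data, a learning rate above roughly 0.15 makes the updates overshoot and diverge, which is why the learning rate is treated as a key hyperparameter.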

Module 20: Deep Learning with PyTorch: CNN

This module focuses specifically on CNNs, a specialized type of neural network designed to effectively capture spatial hierarchies and patterns present in images. 

  • Introduction to CNN Architecture
  • Image Filters/Image Kernels
  • Convolution Layer and RGB
  • Pooling Layer
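A minimal sketch of what a convolution layer and a pooling layer compute, assuming a single-channel image, one hand-picked kernel, stride 1 and no padding. A real CNN learns its kernels and treats RGB as three input channels. As in most deep learning libraries, the "convolution" is implemented as cross-correlation, meaning the kernel is not flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation with stride 1 (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride = size); leftover rows/columns are dropped."""
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]


image = [[0, 0, 1, 1, 1] for _ in range(5)]         # dark left side, bright right side
vertical_edge = [[-1, 0, 1] for _ in range(3)]      # responds to left-to-right brightness changes
feature_map = conv2d(image, vertical_edge)          # high values where the edge is
pooled = max_pool2d(feature_map)
```

The 5x5 input becomes a 3x3 feature map that is large near the dark-to-bright boundary and zero in the uniform region; pooling then keeps only the strongest response in each window.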

Module 21: Deep Learning with PyTorch: RNN

This module is designed to provide a deep understanding of recurrent neural networks (RNNs) and their applications using PyTorch, a popular deep learning framework.

  • Introduction to RNN Architecture
  • Language Models
  • Generation with RNNs
  • Drawbacks of RNNs

Module 22: Deep Learning with PyTorch: LSTM

This module provides a thorough understanding of Long Short-Term Memory (LSTM) networks, including their architecture, training algorithms, and applications.

  • Adding more memory: LSTM architecture
  • Applications of LSTM
  • Drawbacks of LSTMs

Module 23: Deep Learning with PyTorch: Transformers & GAN

This module explores advanced deep learning concepts focusing on Transformers and Generative Adversarial Networks (GANs) using PyTorch.

  • Introduction to Transformer Architecture
  • Self-Attention Layer
  • Encoder
  • Decoder
  • Sequence-to-Sequence
  • Transfer Learning (Hugging Face)
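The core computation of a self-attention layer is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The sketch below implements it with plain Python lists for clarity. It assumes the queries, keys and values are given directly; in a real Transformer they are produced by learned linear projections of the input, and several attention heads run in parallel.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V; also returns the attention weights."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v), weights


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors, d_model = 2
output, attn = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
```

Each output row is a weighted average of the value vectors, with weights set by how similar that token's query is to every key.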

Module 24: Natural Language Processing (NLP)

This module offers participants a comprehensive introduction to the field of natural language processing, focusing on techniques and applications for analyzing and understanding human language data.

  • Text Processing:
    • Tokenization
    • Normalization
    • Stop word removal
    • Stemming/Lemmatization
  • Text Vectorization and Embedding
    • Bag-of-Words (BoW)
    • TF-IDF
    • Word Embeddings
    • Sentence Embeddings
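A minimal sketch of the text-processing steps and TF-IDF listed above. It assumes a tiny hand-written stop-word list and one particular TF-IDF variant (term frequency = count / document length, IDF = ln(N / df)); stemming and lemmatization are omitted. Libraries differ here: scikit-learn's TfidfVectorizer, for example, uses a smoothed IDF by default, so its numbers will not match these exactly.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "is", "of", "on"}  # tiny illustrative list


def preprocess(text):
    """Normalize (lowercase), tokenize into alphanumeric words, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    TF  = count of the term in the doc / number of tokens in the doc
    IDF = ln(N / number of docs containing the term)
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts for this document
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["The cat sat on the mat", "The dog sat on the log", "A cat and a dog"]
vectors = tf_idf(docs)
```

In the first document "mat" outweighs "cat" because it appears in only one document, which is exactly what IDF is meant to reward.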

Module 25: Natural Language Processing (NLP) – Continued

This module continues with NLP techniques, focusing on applications of pre-trained models from Hugging Face.

  • Applications of Pre-Trained Models (Hugging Face):
    • Text Classification: Classifying text into predefined categories (e.g., sentiment analysis, spam detection).
    • Machine Translation: Translating text from one language to another.
    • Question Answering: Extracting answers to questions from a given context.
    • Text Summarization: Condensing lengthy text into a shorter, informative summary.
    • Text Generation: Generating different creative text formats like poems, code, scripts, etc. (depending on the model).

Module 26: Computer Vision: Image Pre-Processing

This module is designed to equip participants with the essential techniques and methodologies for preparing and pre-processing images in computer vision applications.

  • Annotation: Marking important parts of the image, like objects or areas of interest.
  • Data Augmentation: Making variations of the image by doing things like flipping, rotating, or changing colors. This helps the model learn better by seeing more examples.
  • Normalization: Scaling pixel values to a standard range (e.g., 0 to 1) or to zero mean and unit variance, so that the model trains more stably.
  • Resizing: Making sure all images are the same size so the model can process them easily.
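Several of these steps reduce to simple array operations. The sketch below treats a grayscale image as a list of pixel rows with 8-bit values (0 to 255), an assumption made to keep it dependency-free; libraries such as torchvision or OpenCV offer equivalent, faster transforms. Normalization is shown as scaling to [0, 1], and resizing uses a simple nearest-neighbour variant, one of several interpolation choices.

```python
def normalize(image, max_value=255):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return [[p / max_value for p in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left-to-right (a common data augmentation)."""
    return [row[::-1] for row in image]


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize: each output pixel copies one source pixel,
    chosen by scaling its row/column index down to the source size."""
    h, w = len(image), len(image[0])
    return [[image[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]


image = [[0, 255],
         [128, 64]]  # a toy 2x2 grayscale image
```

Upscaling this 2x2 image to 4x4 simply repeats each pixel as a 2x2 block, and downscaling back recovers the original.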

Module 27: Computer Vision: Image Classification

This module covers Image classification, a fundamental task in computer vision, where the goal is to categorize images into predefined classes or categories based on their visual content.

  • Convolutional Neural Networks (CNNs)
  • Residual Networks (ResNets)
  • Inception Networks
  • MobileNets
  • EfficientNet

Module 28: Computer Vision: Object Detection

This module delves into the techniques and methodologies for detecting and localizing objects within images or videos, a fundamental task in computer vision applications.

  • Faster R-CNN
  • YOLO (You Only Look Once)
  • SSD (Single Shot Multibox Detector)
  • Mask R-CNN

Module 29: Computer Vision: Image Segmentation

This module is dedicated to exploring advanced techniques for partitioning images into semantically meaningful regions, known as image segmentation. 

  • Semantic Segmentation

Module 30: Cloud Computing using AWS

This module provides a comprehensive understanding of cloud computing principles and practical skills in utilizing Amazon Web Services (AWS), one of the leading cloud service providers.

  • Cloud Infrastructure
    • Overview of AWS services: compute, storage, networking, databases.
    • Key AWS services: EC2, S3, VPC, RDS.
  • Cloud Configurations & Services
    • IAM for access control.
    • CloudFormation for infrastructure as code.
    • AWS Lambda for serverless computing.
    • Elastic Beanstalk for application deployment.

Module 31: Cloud Computing using AWS – Continued

This module teaches you to use Amazon SageMaker to build, train, and deploy machine learning and deep learning models for a variety of use cases, covering the end-to-end workflow of model development in SageMaker.

  • Building & Deploying ML Models in SageMaker
    • SageMaker for ML model building and deployment.
    • Data preprocessing and model selection.
    • Training, evaluation, and deployment of ML models.
  • Building & Deploying DL Models in SageMaker
    • Deep learning concepts and architectures.
    • SageMaker for building and training DL models.
    • Deployment of DL models with SageMaker endpoints.

Module 32: Cloud Computing using AWS: Hosting

In this module, we will learn how to deploy and host ML/DL applications on AWS infrastructure, compare the deployment options AWS offers, and select the most suitable approach for a given application's requirements.

  • Hosting an ML/DL Application on AWS
    • Integrating ML/DL models into web apps.
    • Deployment and scaling on AWS infrastructure.
    • Monitoring, logging, security, and compliance measures.

Module 33: Generative AI: Unleashing the Power of Language Models

Generative AI introduces learners to the cutting-edge field of generative artificial intelligence (AI), focusing on the remarkable capabilities of Large Language Models (LLMs) and their applications in various domains. The module provides a comprehensive overview of LLMs, prompt engineering techniques, and fine-tuning strategies.

  • LLM (Large Language Model)
    • Introduction to Large Language Models
    • Description of the GPT-3 and ChatGPT architectures
    • Application of LLMs in various fields
    • Basic description of other LLMs
    • Learn GenAI with Llama, OpenAI, Gemini, Hugging Face
  • Prompt Engineering
    • Introduction to Prompt Engineering
    • Overview of language models and their capabilities
    • Understanding Language Model Responses
    • Crafting Effective Prompts
    • Controlling Model Output
  • Fine-Tuning LLMs
    • Fine-Tuning Techniques
      • Task-specific fine-tuning vs. domain adaptation
      • Architecture modifications for task-specific fine-tuning
    • Dataset selection and curation for fine-tuning
    • Implementing fine-tuning pipelines with PyTorch
    • Hyperparameter tuning and optimization strategies

FAQs

Q. Are there any benefits to the certification?

Ans. The certification is provided by IFACET – IIT Kanpur.

Q. Will the certification help with placements?

Ans. Yes, 100% placement support is provided to all students who meet the eligibility criteria.

Q. Does the certification lead to alumni status at IITK?

Ans. No

Instructor Profile

Name: Amit Arjun Verma

Amit has a Ph.D. from IIT Ropar, with research interests in Natural Language Processing, Collective Intelligence, Collaborative Knowledge Building, and Open-Source Software Development. He has worked on core NLP problems involving large-scale datasets, focusing on the efficient representation and extraction of data from online collaborative portals. Amit has developed various open-source libraries for the scientific analysis of online collaborative portals.