📖 CHAPTER 01

Python & Math Basics
for Machine Learning

A strong foundation is essential for Machine Learning. In this chapter we will learn the core concepts of Python programming and clearly understand the mathematics that ML requires: Linear Algebra, Calculus, Statistics, and Probability.

โฑ๏ธ~8โ€“10 hrs reading
๐Ÿ’ป20+ code examples
๐Ÿงฎ15+ math formulas
๐ŸŽฏ18 quiz questions
๐Ÿ“ŠBeginner level

🎯 What You Will Be Able to Do After This Chapter

  • Use Python's fundamental concepts in ML projects
  • Understand vectors and matrices and implement them with NumPy
  • Apply calculus concepts for Gradient Descent
  • Calculate statistical measures on real data
  • Use probability and Bayes' Theorem in an ML context
  • Translate math formulas into code

1.1 Variables & Data Types

In Python, a variable is a name that points to a memory location. In ML, variables are used to store data: features, labels, weights, biases are all just variables. Python is a dynamically typed language, meaning the type is decided automatically at runtime.

💡
Why This Matters for ML

In Machine Learning you handle numeric data continuously: feature values, weight matrices, predictions. Knowing Python's data types and their properties is essential so you can work efficiently.

Basic Data Types

PYTHON MEMORY — VARIABLE BINDING
age ──▶ 25 int
salary ──▶ 50000.75 float
name ──▶ "Ahmed" str
is_ml_ready ──▶ True bool
Python — variables_and_types.py
# ── Integer: whole numbers ──────────────────────────────────────
age = 25
num_features = 10      # ML: number of features in the dataset
batch_size = 32        # used in neural network training

print(type(age))          # <class 'int'>
print(type(batch_size))   # <class 'int'>


# ── Float: decimal numbers ──────────────────────────────────────
learning_rate = 0.001  # very important in ML!
accuracy = 0.9325
loss = 2.345

print(type(learning_rate))  # <class 'float'>


# ── String: text data ───────────────────────────────────────────
model_name = "Random Forest"
dataset_path = "/data/train.csv"

# f-string: formatted output (the modern way)
print(f"Model: {model_name}, Accuracy: {accuracy:.2%}")
# Output → Model: Random Forest, Accuracy: 93.25%


# ── Boolean: True / False ───────────────────────────────────────
is_trained = False
use_gpu = True

# Boolean is actually an int in Python!
print(True + True)    # Output → 2
print(int(True))      # Output → 1


# ── Type Checking & Conversion ──────────────────────────────────
x = "100"            # this is a string, not a number
x_int = int(x)       # convert to int → 100
x_flt = float(x)     # convert to float → 100.0
print(isinstance(x_int, int))   # True
▶ OUTPUT
<class 'int'>
<class 'int'>
<class 'float'>
Model: Random Forest, Accuracy: 93.25%
2
1
True

Collections: List, Tuple, Dict, Set

Collections are very important in ML. Your dataset's rows, features, and labels all live in lists or dicts before being converted to NumPy arrays.

Type   Syntax      Mutable?  Ordered?        Duplicates?  ML Use Case
List   [1, 2, 3]   ✅ Yes    ✅ Yes          ✅ Yes       Feature values, predictions
Tuple  (1, 2, 3)   ❌ No     ✅ Yes          ✅ Yes       Tensor shapes (3, 224, 224)
Dict   {"a": 1}    ✅ Yes    ✅ Yes (3.7+)   Keys: ❌     Hyperparameters, config
Set    {1, 2, 3}   ✅ Yes    ❌ No           ❌ No        Unique labels/categories
Python — collections_in_ml.py
# ── LIST: ordered, changeable ───────────────────────────────────
features = [1.2, 3.5, 0.8, 2.1]    # 4 input features
labels = [0, 1, 1, 0, 1]           # binary classification labels

# Indexing (0-based): used constantly in ML
print(features[0])     # 1.2  (first element)
print(features[-1])    # 2.1  (last element)
print(features[1:3])   # [3.5, 0.8] (slicing)

# Useful list operations
losses = []
losses.append(2.5)    # add element
losses.append(1.8)
losses.append(1.2)
print(len(losses))      # 3
print(min(losses))      # 1.2
print(max(losses))      # 2.5
print(sum(losses))      # 5.5


# ── TUPLE: fixed, immutable ─────────────────────────────────────
image_shape = (224, 224, 3)    # height, width, channels (CNN input)
train_test_split = (0.8, 0.2)
h, w, c = image_shape          # unpacking
print(f"Image: {h}×{w}×{c}")   # Image: 224×224×3


# ── DICT: key-value pairs ───────────────────────────────────────
hyperparams = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 50,
    "optimizer": "adam",
    "dropout": 0.3
}

print(hyperparams["learning_rate"])        # 0.001
print(hyperparams.get("momentum", 0.9))    # 0.9 (default)
hyperparams["epochs"] = 100                # update value

# Iterate over dict
for param, val in hyperparams.items():
    print(f"  {param}: {val}")


# ── SET: unique values ──────────────────────────────────────────
all_labels = [0, 1, 2, 1, 0, 2, 2]
unique_classes = set(all_labels)
print(unique_classes)           # {0, 1, 2}
print(len(unique_classes))      # 3 classes
✅
Pro Tip — List Comprehension

List comprehensions are very powerful for transforming data in ML. [x**2 for x in features if x > 1]: loop + filter + transform in a single line!

Python — list_comprehension.py
data = [2.5, -1.0, 3.3, -0.5, 4.1]

# Traditional loop
squared = []
for x in data:
    squared.append(x ** 2)

# Same thing โ€” list comprehension (preferred in ML)
squared = [x ** 2 for x in data]
print(squared)   # [6.25, 1.0, 10.89, 0.25, 16.81]

# With condition: only positive values
positive = [x for x in data if x > 0]
print(positive)  # [2.5, 3.3, 4.1]

# Apply ReLU activation (replace negatives with 0)
relu = [max(0, x) for x in data]
print(relu)      # [2.5, 0, 3.3, 0, 4.1]

# Dict comprehension: pair feature names with their values
feat_names = ["age", "income", "score"]
feat_vals  = [25, 50000, 0.85]
feat_dict  = {k: v for k, v in zip(feat_names, feat_vals)}
print(feat_dict)
# {'age': 25, 'income': 50000, 'score': 0.85}

1.2 Control Flow

Control flow gives programs the power to make decisions and repeat work. In ML, training loops, condition checks, and early stopping are all built from control flow.

If / Elif / Else

Python — conditionals.py
val_loss = 0.34
patience = 5
no_improve_count = 3

# Simple condition
if val_loss < 0.3:
    print("✅ Model is good!")
elif val_loss < 0.5:
    print("⚠️  Acceptable — needs tuning")
else:
    print("❌ Loss too high — check model")

# Multiple conditions with 'and' / 'or'
if no_improve_count >= patience and val_loss > 0.5:
    print("Early stopping triggered!")

# Ternary (one-liner if-else)
status = "Overfit" if val_loss > 0.6 else "OK"
print(f"Status: {status}")   # Status: OK

# Membership check with 'in'
optimizers = ["adam", "sgd", "rmsprop"]
chosen = "adam"
if chosen in optimizers:
    print(f"{chosen} is a valid optimizer")

Loops — for & while

Python — loops_in_ml.py
# ── FOR loop ────────────────────────────────────────────────────
# Training loop simulation
epochs = 5
loss = 2.0

for epoch in range(1, epochs + 1):
    loss *= 0.7    # simulate decreasing loss
    print(f"Epoch {epoch}/{epochs} — Loss: {loss:.4f}")

# enumerate: when you also need the index
class_names = ["cat", "dog", "bird"]
for idx, name in enumerate(class_names):
    print(f"  Class {idx}: {name}")

# zip: iterate over multiple lists in parallel
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]
correct = 0
for true, pred in zip(y_true, y_pred):
    if true == pred:
        correct += 1
print(f"Accuracy: {correct/len(y_true):.0%}")  # 75%

# ── WHILE loop ──────────────────────────────────────────────────
threshold = 0.1
loss = 1.0
step = 0
while loss > threshold:
    loss -= 0.15
    step += 1
    print(f"Step {step}: loss = {loss:.2f}")
    if step > 20:
        break   # safety break
▶ OUTPUT
Epoch 1/5 — Loss: 1.4000
Epoch 2/5 — Loss: 0.9800
Epoch 3/5 — Loss: 0.6860
Epoch 4/5 — Loss: 0.4802
Epoch 5/5 — Loss: 0.3361
  Class 0: cat
  Class 1: dog
  Class 2: bird
Accuracy: 75%

1.3 Functions

Functions are reusable blocks of code. In ML, every algorithm, activation function, and loss function is implemented as a function. Writing clean functions is one of an ML engineer's most important skills.

Python — functions_in_ml.py
# ── Basic function ──────────────────────────────────────────────
def calculate_accuracy(y_true, y_pred):
    """
    Calculates binary classification accuracy.

    Args:
        y_true (list): actual labels, e.g. [0, 1, 1, 0]
        y_pred (list): predicted labels, e.g. [0, 1, 0, 0]

    Returns:
        float: accuracy between 0.0 and 1.0
    """
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

acc = calculate_accuracy([1, 0, 1, 1], [1, 1, 1, 0])
print(f"Accuracy: {acc:.2%}")   # 50.00%


# ── Default arguments ───────────────────────────────────────────
import math

def sigmoid(z, scale=1.0):
    """Sigmoid activation: squashes any number into (0, 1)."""
    return 1 / (1 + math.exp(-scale * z))

print(sigmoid(0))        # 0.5  (exactly the middle)
print(sigmoid(2))        # ≈ 0.88
print(sigmoid(-2))       # ≈ 0.12
print(sigmoid(2, scale=2.0))  # steeper curve


# ── *args and **kwargs ──────────────────────────────────────────
def build_model(*layer_sizes, **config):
    """Define a model architecture."""
    print("Layers:", layer_sizes)   # tuple
    print("Config:", config)        # dict

build_model(784, 256, 128, 10,
            activation="relu",
            dropout=0.3)
# Layers: (784, 256, 128, 10)
# Config: {'activation': 'relu', 'dropout': 0.3}


# ── Lambda functions (anonymous) ────────────────────────────────
relu = lambda x: max(0, x)
normalize = lambda x, mn, mx: (x - mn) / (mx - mn)

print(relu(-3))             # 0
print(relu(5))              # 5
print(normalize(75, 0, 100))  # 0.75
⚠️
Common Mistake — Mutable Default Arguments

Never write def func(data=[])! A mutable default argument is created once and then shared across all calls. Always use def func(data=None) and create the list inside the function.
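The pitfall from the box above in runnable form (`bad_append` and `good_append` are hypothetical names, purely for illustration):

```python
def bad_append(value, history=[]):      # BUG: the default list is created once
    history.append(value)               # and shared by every call
    return history

def good_append(value, history=None):   # safe pattern: default to None
    if history is None:
        history = []                    # fresh list on every call
    history.append(value)
    return history

print(bad_append(1))    # [1]
print(bad_append(2))    # [1, 2]  <- data from the previous call leaked in!
print(good_append(1))   # [1]
print(good_append(2))   # [2]     <- independent calls, as expected
```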

1.4 Key Data Structures for ML

In ML, NumPy arrays are used alongside Python's built-in structures. This section covers both.

Python — numpy_basics.py
import numpy as np

# ── Array creation ──────────────────────────────────────────────
a = np.array([1, 2, 3, 4, 5])     # 1D array (vector)
b = np.array([[1, 2, 3],
              [4, 5, 6]])         # 2D array (matrix)

print(a.shape)   # (5,)    — 5 elements
print(b.shape)   # (2, 3)  — 2 rows, 3 cols
print(b.dtype)   # int64
print(b.ndim)    # 2 (2-dimensional)

# ── Special arrays ──────────────────────────────────────────────
np.zeros((3, 4))        # 3×4 matrix of zeros (init weights)
np.ones((2, 3))         # 2×3 matrix of ones
np.eye(3)               # 3×3 identity matrix
np.random.randn(3, 3)   # random normal (weight init)
np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)    # [0, 0.25, 0.5, 0.75, 1.0]

# ── Operations (vectorized — NO loops needed!) ──────────────────
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x * 2)          # [2. 4. 6. 8.]
print(x ** 2)         # [1. 4. 9. 16.]
print(np.sqrt(x))     # [1. 1.41 1.73 2.]
print(x.mean())       # 2.5
print(x.std())        # ≈ 1.118

# ── Broadcasting: auto-expand dimensions ────────────────────────
A = np.array([[1, 2, 3],
              [4, 5, 6]])         # (2, 3)
b = np.array([10, 20, 30])        # (3,)
print(A + b)
# [[11 22 33]   ← 1+10, 2+20, 3+30
#  [14 25 36]]  ← 4+10, 5+20, 6+30

1.5 Object-Oriented Programming (OOP)

PyTorch and Scikit-learn both use OOP. When you write model.fit() or nn.Module, you are using classes. Understanding OOP is essential for working with ML libraries.

Python — oop_for_ml.py
class LinearRegression:
    """
    Simple Linear Regression implemented with OOP.
    y = w * x + b
    """

    def __init__(self, learning_rate=0.01):
        self.lr = learning_rate
        self.w = 0.0    # weight (slope)
        self.b = 0.0    # bias (intercept)
        self.loss_history = []

    def predict(self, x):
        return self.w * x + self.b

    def train(self, X, y, epochs=100):
        n = len(X)
        for epoch in range(epochs):
            # Forward pass
            y_hat = [self.predict(xi) for xi in X]

            # Compute MSE loss
            loss = sum((p-t)**2 for p,t in zip(y_hat, y)) / n
            self.loss_history.append(loss)

            # Gradients (partial derivatives)
            dw = (2/n) * sum((p-t)*xi
                           for p,t,xi in zip(y_hat,y,X))
            db = (2/n) * sum(p-t
                           for p,t in zip(y_hat,y))

            # Update weights
            self.w -= self.lr * dw
            self.b -= self.lr * db

        return self

    def __repr__(self):
        return f"LinearRegression(w={self.w:.3f}, b={self.b:.3f})"


# Usage
model = LinearRegression(learning_rate=0.05)  # keep lr below ~0.08 here, or GD diverges
X_train = [1, 2, 3, 4, 5]
y_train = [2, 4, 6, 8, 10]    # y = 2x
model.train(X_train, y_train, epochs=1000)
print(model)                    # LinearRegression(w=2.000, b=0.000)
print(model.predict(6))         # ≈ 12.0, correct!
🧠
OOP in ML Libraries

When you write RandomForestClassifier(n_estimators=100) in sklearn, that too is a class just like this one! fit(), predict(), score() are all methods. Makes sense now? 😊

๐Ÿ“ Mathematics Section

2.1 Linear Algebra โ€” Vectors & Matrices

Linear Algebra ML ka backbone hai. Neural networks mein forward pass, weight updates, PCA, SVD โ€” sab matrix operations par depend karte hain. Samajhna zaroor hai.

📐
Intuition First

Think of vectors as arrows in space. A vector has a direction and a magnitude. A matrix is a transformation: it rotates, scales, and projects vectors.
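The "matrix is a transformation" idea above can be made concrete with a quick sketch (using NumPy, as the later examples do):

```python
import numpy as np

# A 90° rotation matrix: rotates any 2D vector counter-clockwise
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([1.0, 0.0])         # arrow pointing along the x-axis
rotated = R @ v                  # apply the transformation
print(np.round(rotated, 6))      # [0. 1.]: now points along the y-axis

# A scaling matrix: stretches x by 2, shrinks y to half
S = np.diag([2.0, 0.5])
print(S @ np.array([3.0, 4.0]))  # [6. 2.]
```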

Vectors

A vector is an ordered list of numbers. In ML, every data point is a vector: an image's pixel vector, a customer's feature vector, all vectors.

VECTOR VISUALIZATION
x⃗ = [1.2, 3.5, 0.8, 2.1]   ← shape: (4,), a 4-dimensional vector
Think of it as: age=1.2, income=3.5, credit=0.8, debt=2.1
FORMULA — VECTOR OPERATIONS
$$\vec{a} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \quad \vec{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$ $$\text{Addition: } \vec{a} + \vec{b} = \begin{bmatrix} a_1+b_1 \\ a_2+b_2 \\ a_3+b_3 \end{bmatrix}$$ $$\text{Dot Product: } \vec{a} \cdot \vec{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + a_3 b_3$$ $$\text{Magnitude: } \|\vec{a}\| = \sqrt{a_1^2 + a_2^2 + a_3^2}$$
🎯
What the Dot Product Means in ML

The dot product measures the similarity of two vectors. When two vectors point in the same direction the dot product is maximal; when they point in opposite directions it is negative. A neural network's weighted sum w · x + b is exactly this!

Python — vectors_with_numpy.py
import numpy as np

# Define vectors
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vector addition
print(a + b)                  # [5 7 9]

# Scalar multiplication
print(3 * a)                  # [3 6 9]

# Dot product — 3 ways to do it in NumPy
print(np.dot(a, b))           # 32  (1×4 + 2×5 + 3×6)
print(a @ b)                  # 32  (modern syntax)
print(sum(a * b))             # 32  (element-wise then sum)

# Magnitude (L2 norm)
magnitude_a = np.linalg.norm(a)
print(f"||a|| = {magnitude_a:.4f}")  # 3.7417

# Unit vector (normalized)
a_unit = a / np.linalg.norm(a)
print(a_unit)              # [0.267 0.534 0.802]
print(np.linalg.norm(a_unit))  # 1.0 (unit vector)

# Cosine similarity — vector similarity measure
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"Cosine Similarity: {cos_sim:.4f}")  # 0.9746
# ↑ close to 1.0, so the vectors point in nearly the same direction

Matrices

A matrix is a 2D array of rows and columns. In ML, the dataset itself is a matrix of n samples × m features. Neural network weights are matrices too.

FORMULA — MATRIX MULTIPLICATION
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$$ $$C = A \cdot B, \quad \text{where } C_{ij} = \sum_{k} A_{ik} \cdot B_{kj}$$ $$\text{Shape rule: } (m \times n) \cdot (n \times p) = (m \times p)$$
⚠️
Shape Mismatch — the Most Common Error!

For matrix multiplication the inner dimensions must match. (3×4) @ (4×2) = (3×2) ✅ but (3×4) @ (3×2) ❌ Error! You will hit this error constantly in neural networks.
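A quick sketch of what that error looks like in practice, and the usual fix of transposing so the inner dimensions line up:

```python
import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(3, 2)   # inner dimensions (4 vs 3) do NOT match

try:
    A @ B                   # invalid: (3x4) @ (3x2)
except ValueError as err:
    print("Shape error:", err)   # NumPy's message names the mismatched dims

C = A.T @ B                 # valid: (4x3) @ (3x2) = (4x2)
print(C.shape)              # (4, 2)
```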

Python — matrix_operations.py
import numpy as np

# ── Matrix creation ─────────────────────────────────────────────
W = np.array([[1, 2, 3],    # weight matrix (3×3)
              [4, 5, 6],
              [7, 8, 9]])
x = np.array([1, 0, 1])     # input vector

# Matrix × vector (a neural network forward pass!)
z = W @ x                   # [4, 10, 16]
print("W @ x =", z)

# Matrix multiplication
A = np.random.randn(3, 4)   # (3×4)
B = np.random.randn(4, 2)   # (4×2)
C = A @ B                   # (3×2) result
print(C.shape)              # (3, 2)

# Transpose
print(W.T.shape)             # (3, 3)
print(A.T.shape)             # (4, 3)

# Inverse (only for square matrices)
M = np.array([[2,1],[1,1]])
M_inv = np.linalg.inv(M)
print(M @ M_inv)            # Identity matrix (approx)

# Eigenvalues & Eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(W)
print("Eigenvalues:", eigenvalues)
# Used in PCA for dimensionality reduction!

2.2 Calculus — Derivatives & Gradient Descent

Calculus is what gives ML the power to learn. Gradient Descent, the core optimization algorithm of ML, is based entirely on calculus. Without understanding derivatives, understanding backpropagation is impossible.

Derivative — Intuition

The derivative tells you how fast a function's value is changing at a given point. In simple terms: if the function is a hill, the derivative tells you how steep the hill is where you are standing, and in which direction it slopes.

FORMULA — DERIVATIVE DEFINITION & RULES
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \quad \text{(rate of change)}$$ $$\frac{d}{dx}[x^n] = nx^{n-1} \qquad \frac{d}{dx}[e^x] = e^x \qquad \frac{d}{dx}[\ln x] = \frac{1}{x}$$ $$\text{Chain Rule: } \frac{d}{dx}[f(g(x))] = f'(g(x)) \cdot g'(x)$$ $$\text{Example: } \frac{d}{dx}[\sigma(wx+b)] = \sigma'(wx+b) \cdot w$$
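The limit definition above can be checked numerically with a small h. The sketch below also verifies the chain-rule example for sigmoid (the helper names are ours, not from any library):

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) using the limit definition with a small h."""
    return (f(x + h) - f(x)) / h

# Power rule check: f(x) = x^3  =>  f'(2) = 3 * 2^2 = 12
cube = lambda t: t ** 3
print(numerical_derivative(cube, 2.0))       # ≈ 12.0

# Chain rule check: d/dx sigmoid(w*x + b) = sigmoid'(w*x+b) * w
sigmoid = lambda z: 1 / (1 + math.exp(-z))
w, b, x0 = 2.0, 1.0, 0.5
g = lambda t: sigmoid(w * t + b)

s = sigmoid(w * x0 + b)
analytic = s * (1 - s) * w          # sigmoid'(z) = s(1-s), times inner derivative w
numeric = numerical_derivative(g, x0)
print(abs(numeric - analytic) < 1e-4)        # True: both ≈ 0.21
```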

Gradient Descent

Gradient Descent is an iterative algorithm that minimizes a loss function. Imagine you are on a hill on a dark night and need to get down: at every step you look for the steepest downhill direction and step that way. That is exactly what gradient descent does!

FORMULA — GRADIENT DESCENT UPDATE RULE
$$\theta_{new} = \theta_{old} - \alpha \cdot \nabla_\theta J(\theta)$$
Where:
  $\theta$ = parameter (weight),   $\alpha$ = learning rate,
  $\nabla_\theta J$ = gradient of loss w.r.t. parameter
$$\text{MSE Loss: } J(w, b) = \frac{1}{n}\sum_{i=1}^{n}(y_i - (wx_i + b))^2$$ $$\frac{\partial J}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(wx_i+b-y_i) \cdot x_i$$
01

Initialize parameters

Start weight w and bias b at zero or at random values.

02

Compute loss

Make predictions with the current parameters, compare them against the actual values, and calculate the loss.

03

Compute gradients

Take the derivative of the loss function to find out which direction the parameters should move in.

04

Update parameters

w = w - lr × gradient: update the parameters in the direction opposite the gradient.

05

Repeat until convergence

Repeat this process until the loss is minimized.

Python — gradient_descent.py
import numpy as np

# ── Simple Gradient Descent from scratch ────────────────────────
# Problem: y = 2x + 3  →  find w and b

np.random.seed(42)
X = np.array([1, 2, 3, 4, 5], dtype=float)
y = 2 * X + 3 + np.random.randn(5) * 0.1  # small noise

# Initialize
w, b = 0.0, 0.0
lr = 0.01     # learning rate
n = len(X)

print(f"{'Epoch':>6} {'Loss':>10} {'w':>8} {'b':>8}")
print("-" * 36)

for epoch in range(2000):   # lr=0.01 is small for this problem, so give it ~2000 steps
    # ① Forward pass: predict
    y_hat = w * X + b

    # ② Compute MSE loss
    loss = np.mean((y_hat - y) ** 2)

    # ③ Compute gradients
    dw = (2/n) * np.sum((y_hat - y) * X)
    db = (2/n) * np.sum(y_hat - y)

    # ④ Update parameters
    w -= lr * dw
    b -= lr * db

    if epoch % 400 == 0:
        print(f"{epoch:>6} {loss:>10.4f} {w:>8.4f} {b:>8.4f}")

print(f"\nFinal: w≈{w:.2f}, b≈{b:.2f}")
# Final: w≈2.00, b≈3.04  ← the least-squares fit of the noisy data
▶ OUTPUT (abridged)
 Epoch       Loss        w        b
------------------------------------
     0    89.8483   0.6228   0.1809
   ...
Final: w≈2.00, b≈3.04

2.3 Statistics for ML

Statistics lets us describe and understand data. Feature engineering, outlier detection, model evaluation: all of it is impossible without statistics.

Descriptive Statistics

FORMULA — CORE STATISTICAL MEASURES
$$\text{Mean: } \mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$ $$\text{Variance: } \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$ $$\text{Std Dev: } \sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$$ $$\text{Covariance: } \text{Cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$ $$\text{Correlation: } r = \frac{\text{Cov}(X,Y)}{\sigma_X \cdot \sigma_Y}, \quad r \in [-1, +1]$$
📊

Mean ($\mu$)

The average value. Used for feature normalization in ML. Affected by outliers.

📏

Median

The middle value. An outlier-robust alternative to the mean. More useful on skewed data.

📐

Std Dev ($\sigma$)

The average distance from the mean. Used in feature scaling (StandardScaler).

🔗

Correlation

The linear relationship between features. High correlation = redundant features!
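The Covariance and Correlation formulas from the box above, computed by hand and cross-checked against NumPy (note: np.cov defaults to the n-1 denominator; bias=True selects the 1/n version in the formula):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 11.0])

# Covariance by hand (1/n version, matching the formula)
cov = np.mean((x - x.mean()) * (y - y.mean()))
print(f"Cov(X,Y) = {cov:.2f}")               # 8.00

# Correlation: covariance scaled by both standard deviations
corr = cov / (x.std() * y.std())
print(f"r        = {corr:.4f}")              # 0.9562

# Cross-check with NumPy's built-ins
print(np.cov(x, y, bias=True)[0, 1])         # 8.0
print(np.corrcoef(x, y)[0, 1])               # same r as above
```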

Python — statistics_in_ml.py
import numpy as np

ages = np.array([22, 25, 27, 28, 29,
                 30, 32, 35, 38, 65])  # 65 is an outlier!

# ── Central Tendency ────────────────────────────────────────────
print(f"Mean:   {np.mean(ages):.1f}")    # 33.1 (pulled up by the 65)
print(f"Median: {np.median(ages):.1f}")  # 29.5 (robust!)

# ── Spread ──────────────────────────────────────────────────────
print(f"Std Dev:  {np.std(ages):.2f}")   # 11.51
print(f"Variance: {np.var(ages):.2f}")   # 132.49
print(f"Range:    {ages.max()-ages.min()}")  # 43

# ── Percentiles & IQR ───────────────────────────────────────────
q1 = np.percentile(ages, 25)
q3 = np.percentile(ages, 75)
iqr = q3 - q1
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")

# Outlier detection using the IQR method
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]
print(f"Outliers: {outliers}")       # [65]

# ── Standardization (Z-score normalization) ─────────────────────
# This is what StandardScaler does internally!
ages_scaled = (ages - np.mean(ages)) / np.std(ages)
print("Z-scores:", np.round(ages_scaled, 2))
print(f"New mean: {ages_scaled.mean():.6f}")  # ≈ 0.0
print(f"New std:  {ages_scaled.std():.6f}")   # ≈ 1.0

# ── Correlation ─────────────────────────────────────────────────
study_hours = np.array([2, 4, 6, 8, 10])
exam_score  = np.array([50, 60, 70, 80, 90])
corr = np.corrcoef(study_hours, exam_score)[0, 1]
print(f"Correlation: {corr:.4f}")  # 1.0 (perfectly linear!)

2.4 Probability

Probability is the language of ML. Classification models output probabilities. Naive Bayes, Logistic Regression, Bayesian Networks: all are built on probability theory.

FORMULA — CORE PROBABILITY RULES
$$P(A) \in [0, 1], \qquad P(\Omega) = 1, \qquad P(A^c) = 1 - P(A)$$ $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$ $$\text{Conditional: } P(A|B) = \frac{P(A \cap B)}{P(B)}$$ $$\text{Bayes Theorem: } P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
🧠
Bayes' Theorem — the ML Connection

How Bayes' Theorem is used in ML: given observed data (the evidence), we update our belief (the prior) to obtain the posterior. A spam classifier does exactly this: given the words, it updates the probability of spam.

FORMULA — COMMON PROBABILITY DISTRIBUTIONS
$$\text{Normal (Gaussian): } f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$ $$\text{Bernoulli: } P(X=k) = p^k(1-p)^{1-k}, \quad k \in \{0,1\}$$ $$\text{Binomial: } P(X=k) = \binom{n}{k} p^k(1-p)^{n-k}$$
Python — probability_in_ml.py
import numpy as np

# ── Basic Probability ───────────────────────────────────────────
outcomes = ["spam", "ham", "spam", "ham", "ham",
            "spam", "ham", "ham", "spam", "ham"]

p_spam = outcomes.count("spam") / len(outcomes)
p_ham  = 1 - p_spam
print(f"P(spam) = {p_spam:.1f}")   # 0.4
print(f"P(ham)  = {p_ham:.1f}")    # 0.6

# ── Bayes' Theorem — Spam Classifier ────────────────────────────
# P(spam | "free") = P("free" | spam) × P(spam) / P("free")

p_spam         = 0.40    # prior: 40% of emails are spam
p_free_spam    = 0.80    # P("free" in email | it's spam)
p_free_ham     = 0.10    # P("free" in email | it's ham)

# P("free") via the law of total probability
p_free = (p_free_spam * p_spam) + (p_free_ham * (1 - p_spam))

# Bayes update
p_spam_free = (p_free_spam * p_spam) / p_free
print(f"P(spam | 'free') = {p_spam_free:.2%}")
# P(spam | 'free') = 84.21% ← the word "free" makes spam likely!


# ── Normal Distribution ─────────────────────────────────────────
data = np.random.normal(loc=170, scale=10, size=1000)
# Human heights: mean=170cm, std=10cm
print(f"Mean height:  {data.mean():.1f} cm")
print(f"Std Dev:      {data.std():.1f} cm")

# The 68-95-99.7 rule
mu, sigma = data.mean(), data.std()
within_1s = np.sum((data >= mu-sigma) & (data <= mu+sigma)) / 1000
print(f"Within 1σ: {within_1s:.1%}")   # ≈ 68%

within_2s = np.sum((data >= mu-2*sigma) & (data <= mu+2*sigma)) / 1000
print(f"Within 2σ: {within_2s:.1%}")   # ≈ 95%
▶ OUTPUT (the height values vary per run, since no random seed is set)
P(spam) = 0.4
P(ham)  = 0.6
P(spam | 'free') = 84.21%
Mean height:  170.3 cm
Std Dev:      10.1 cm
Within 1σ: 68.3%
Within 2σ: 95.4%
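The Binomial formula from the distributions box above can also be verified by simulation; a small sketch (the seed and sample size are arbitrary choices):

```python
import numpy as np
from math import comb

# P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
n, p, k = 10, 0.3, 3

analytic = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"Analytic  P(X=3) = {analytic:.4f}")      # 0.2668

# Simulate 100,000 draws of "10 trials with success prob 0.3"
rng = np.random.default_rng(0)
samples = rng.binomial(n, p, size=100_000)
empirical = np.mean(samples == k)
print(f"Simulated P(X=3) ≈ {empirical:.4f}")     # close to the analytic value
```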
🎯 Practice Quizzes
🐍

Quiz 1 — Python Fundamentals

0 / 5 correct
1
MCQ
What is the output of print(type(True + True))?
✅ Correct! B — <class 'int'>
In Python, bool is actually a subclass of int: True = 1 and False = 0, so True + True = 2 and the result's type is int. This is worth knowing in ML because labels are sometimes booleans.
2
MCQ
Given features = [1.2, 3.5, 0.8, 2.1], what does features[-1] return?
✅ Correct! C — 2.1
Python supports negative indexing: -1 is the last element, -2 the second to last, and so on. This is used constantly in ML, e.g. loss_history[-1] gives the last epoch's loss.
3
Fill in Blank
Complete the one-line ReLU activation: [__________ for x in data]
Code: [ for x in data]
✅ Answer: max(0, x)
The ReLU (Rectified Linear Unit) activation function sets negative values to 0 and keeps positive values unchanged. [max(0, x) for x in data] is a one-line ReLU over a list, and ReLU is the most common activation function in deep learning.
4
Theory
What is the fundamental difference between a list and a tuple, and when do you use a tuple in ML?
Answer:

A list is mutable: you can add, remove, and change elements. Syntax: [1, 2, 3]
A tuple is immutable: once created, it cannot be changed. Syntax: (1, 2, 3)

When to use a tuple in ML:
• Defining tensor shapes: input_shape = (224, 224, 3)
• Returning multiple values: return X_train, X_test, y_train, y_test
• As dictionary keys (because tuples are hashable)
• Values that should never change: image dimensions, model architecture

Rule of thumb: ask "Will this data change?" → Yes = List, No = Tuple.
5
MCQ
In NumPy, given a = np.array([[1,2],[3,4]]), what is a.shape?
✅ Correct! B — (2, 2)
.shape returns a tuple. A 2D array's shape is (rows, columns): [[1,2],[3,4]] has 2 rows and 2 columns, so the shape is (2, 2). Understanding the shapes of neural network weights is essential!
Python Quiz Score: 0/5
🧮

Quiz 2 — Math for ML

0 / 5 correct
1
MCQ
What is the dot product of vectors $\vec{a} = [1, 2, 3]$ and $\vec{b} = [4, 5, 6]$?
✅ Correct! C — 32
Dot product: $\vec{a} \cdot \vec{b} = (1×4) + (2×5) + (3×6) = 4 + 10 + 18 = 32$
Remember: multiply element-wise, then sum everything. A neural network's weighted sum is exactly this: $z = \vec{w} \cdot \vec{x} + b$
2
MCQ
Matrix A has shape (3×4) and B has shape (4×2). What will be the shape of $A \cdot B$?
✅ Correct! B — (3×2)
Rule: $(m \times n) \cdot (n \times p) = (m \times p)$. The outer dimensions remain and the inner dimensions cancel: $(3 \times 4) \cdot (4 \times 2) = (3 \times 2)$. The inner 4s match, so the multiplication is valid!
3
Fill in Blank
In the Gradient Descent update rule $\theta_{new} = \theta_{old}$ __ $\alpha \cdot \nabla J$, what goes in the blank?
$\theta_{new} = \theta_{old}$ $\alpha \cdot \nabla J$
✅ Answer: − (minus)
We move in the direction opposite the gradient so that the loss decreases. If the gradient is positive (the function is rising), we subtract to go downhill. That is why the sign is minus! With a plus, you would maximize the loss instead of minimizing it, which is wrong!
4
Theory
What is the difference between Standard Deviation and Variance? Why is Standard Deviation used more in ML?
Answer:

Variance ($\sigma^2$): the average squared deviation from the mean. Squaring also squares the units (e.g., cm² if the data was in cm), and large errors are penalized more.

Standard Deviation ($\sigma$): the square root of the variance. It is in the same unit as the original data, which makes it easier to interpret!

Why std dev is used more in ML:
• StandardScaler uses std dev: $z = (x - \mu) / \sigma$
• In a Gaussian distribution, $\sigma$ directly describes the spread
• "68% of the data lies within ±1σ" works only in terms of std dev
• Error bars and confidence intervals are all built on std dev

Remember: variance is a calculation step, std dev is the final answer! $\sigma = \sqrt{\sigma^2}$
5
MCQ
In Bayes' Theorem $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$, what is $P(A)$ called?
✅ Correct! B — Prior Probability
The terms in Bayes' Theorem:
• $P(A)$ = Prior: the belief before seeing the evidence
• $P(B|A)$ = Likelihood: the probability of B given A
• $P(B)$ = Evidence / marginal probability
• $P(A|B)$ = Posterior: the updated belief after seeing the evidence

In ML: prior = the model's initial belief, posterior = the updated belief after seeing the data.
Math Quiz Score: 0/5
✓
Complete Chapter 1
Next: Chapter 2 — Data Preprocessing & EDA
→