How to Build a Linear Regression Model from Scratch Using Only NumPy and Matplotlib

Machine learning Mathematics in Machine Learning

Posted by tintin_2003 on 2025-10-17 19:41:44 |

Share: Facebook | Twitter | Whatsapp | Linkedin Visits: 57

How to Build a Linear Regression Model from Scratch Using Only NumPy and Matplotlib

Linear regression is one of the fundamental algorithms in machine learning and statistics. While libraries like scikit-learn make it easy to use, understanding how to build it from scratch gives you deeper insights into how the algorithm actually works. In this tutorial, we'll implement linear regression using only NumPy for computation and Matplotlib for visualization.

What is Linear Regression?

Linear regression is a supervised learning algorithm that models the relationship between a dependent variable (y) and one or more independent variables (X) by fitting a linear equation to observed data. The simplest form, simple linear regression, uses just one independent variable:

y = mx + b

Where:

y is the dependent variable (what we're trying to predict)
x is the independent variable (the feature)
m is the slope of the line
b is the y-intercept

In machine learning terminology, we often write this as:

y = w₁x + w₀

Where w₁ is the weight (slope) and w₀ is the bias (intercept).

The Mathematics Behind Linear Regression

Cost Function

To find the best-fitting line, we need to minimize the error between our predictions and actual values. We use the Mean Squared Error (MSE) as our cost function:

J(w₀, w₁) = (1/2m) Σ(ŷᵢ - yᵢ)²

Where:

m is the number of training examples
ŷᵢ is the predicted value
yᵢ is the actual value

Gradient Descent

To minimize the cost function, we use gradient descent. This iterative optimization algorithm updates our parameters in the direction that reduces the cost:

w₁ := w₁ - α * ∂J/∂w₁
w₀ := w₀ - α * ∂J/∂w₀

Where α is the learning rate.

The partial derivatives are:

∂J/∂w₁ = (1/m) Σ(ŷᵢ - yᵢ) * xᵢ
∂J/∂w₀ = (1/m) Σ(ŷᵢ - yᵢ)

Implementation from Scratch

Let's implement this step by step using only NumPy and Matplotlib.

Step 1: Import Libraries

import numpy as np
import matplotlib.pyplot as plt

Step 2: Generate Sample Data

For this tutorial, we'll create synthetic data with a linear relationship plus some noise:

# Set random seed for reproducibility
np.random.seed(42)

# Generate 100 random points
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Visualize the data
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Sample Dataset')
plt.grid(True, alpha=0.3)
plt.show()

This generates data following the pattern y = 4 + 3x + noise, so our model should learn weights close to these values.

Step 3: Create the Linear Regression Class

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        """
        Initialize the Linear Regression model
        
        Parameters:
        learning_rate: float, step size for gradient descent
        n_iterations: int, number of iterations for training
        """
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.cost_history = []
    
    def fit(self, X, y):
        """
        Train the model using gradient descent
        
        Parameters:
        X: numpy array of shape (m, n) - training features
        y: numpy array of shape (m, 1) - target values
        """
        # Get number of samples and features
        m, n = X.shape
        
        # Initialize parameters
        self.weights = np.zeros((n, 1))
        self.bias = 0
        
        # Gradient descent
        for i in range(self.n_iterations):
            # Forward pass: make predictions
            y_pred = self.predict(X)
            
            # Calculate cost
            cost = (1 / (2 * m)) * np.sum((y_pred - y) ** 2)
            self.cost_history.append(cost)
            
            # Calculate gradients
            dw = (1 / m) * np.dot(X.T, (y_pred - y))
            db = (1 / m) * np.sum(y_pred - y)
            
            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            
            # Print cost every 100 iterations
            if i % 100 == 0:
                print(f"Iteration {i}: Cost = {cost:.4f}")
    
    def predict(self, X):
        """
        Make predictions using the learned parameters
        
        Parameters:
        X: numpy array of shape (m, n) - features
        
        Returns:
        predictions: numpy array of shape (m, 1)
        """
        return np.dot(X, self.weights) + self.bias
    
    def get_params(self):
        """
        Get the learned parameters
        
        Returns:
        weights, bias
        """
        return self.weights, self.bias

Step 4: Train the Model

# Create and train the model
model = LinearRegression(learning_rate=0.1, n_iterations=1000)
model.fit(X, y)

# Get the learned parameters
weights, bias = model.get_params()
print(f"\nLearned Parameters:")
print(f"Weight (slope): {weights[0][0]:.4f}")
print(f"Bias (intercept): {bias:.4f}")

Step 5: Visualize Results

# Make predictions
y_pred = model.predict(X)

# Plot the results
plt.figure(figsize=(15, 5))

# Subplot 1: Data and fitted line
plt.subplot(1, 3, 1)
plt.scatter(X, y, alpha=0.6, label='Actual Data')
plt.plot(X, y_pred, color='red', linewidth=2, label='Fitted Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression: Fitted Line')
plt.legend()
plt.grid(True, alpha=0.3)

# Subplot 2: Cost function over iterations
plt.subplot(1, 3, 2)
plt.plot(model.cost_history, linewidth=2)
plt.xlabel('Iteration')
plt.ylabel('Cost (MSE)')
plt.title('Cost Function Convergence')
plt.grid(True, alpha=0.3)

# Subplot 3: Residuals
residuals = y - y_pred
plt.subplot(1, 3, 3)
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(y=0, color='red', linestyle='--', linewidth=2)
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Step 6: Evaluate the Model

# Calculate R² score (coefficient of determination)
ss_total = np.sum((y - np.mean(y)) ** 2)
ss_residual = np.sum((y - y_pred) ** 2)
r2_score = 1 - (ss_residual / ss_total)

# Calculate Mean Squared Error
mse = np.mean((y - y_pred) ** 2)

# Calculate Root Mean Squared Error
rmse = np.sqrt(mse)

print(f"\nModel Evaluation Metrics:")
print(f"R² Score: {r2_score:.4f}")
print(f"Mean Squared Error: {mse:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")

Complete Working Example

Here's the complete code you can run:

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Linear Regression Class
class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None
        self.cost_history = []
    
    def fit(self, X, y):
        m, n = X.shape
        self.weights = np.zeros((n, 1))
        self.bias = 0
        
        for i in range(self.n_iterations):
            y_pred = self.predict(X)
            cost = (1 / (2 * m)) * np.sum((y_pred - y) ** 2)
            self.cost_history.append(cost)
            
            dw = (1 / m) * np.dot(X.T, (y_pred - y))
            db = (1 / m) * np.sum(y_pred - y)
            
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
    
    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Train the model
model = LinearRegression(learning_rate=0.1, n_iterations=1000)
model.fit(X, y)

# Make predictions and visualize
y_pred = model.predict(X)

plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6, label='Actual Data')
plt.plot(X, y_pred, color='red', linewidth=2, label='Fitted Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression from Scratch')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Weight: {model.weights[0][0]:.4f}")
print(f"Bias: {model.bias:.4f}")

Key Concepts Explained

Learning Rate

The learning rate controls how big the steps are in gradient descent. Too large, and the algorithm might overshoot the minimum. Too small, and training will be very slow. Typical values range from 0.001 to 0.1.

Convergence

Watch the cost function plot. If it decreases smoothly and plateaus, your model has converged. If it oscillates or increases, reduce the learning rate.

Vectorization

Notice how we use NumPy's matrix operations instead of loops. This makes the code faster and more efficient:

# Vectorized (fast)
y_pred = np.dot(X, self.weights) + self.bias

# Loop-based (slow)
# for i in range(len(X)):
#     y_pred[i] = X[i] * self.weights + self.bias

Extending to Multiple Features

The beauty of this implementation is that it already handles multiple features! Just pass in X with shape (m, n) where n is the number of features:

# Generate data with 3 features
X_multi = np.random.rand(100, 3)
y_multi = 2 + 3*X_multi[:, 0] + 5*X_multi[:, 1] - 2*X_multi[:, 2] + np.random.randn(100)
y_multi = y_multi.reshape(-1, 1)

# Train the model
model_multi = LinearRegression(learning_rate=0.1, n_iterations=1000)
model_multi.fit(X_multi, y_multi)

print("Weights for each feature:", model_multi.weights.flatten())

Common Issues and Solutions

1. Cost Function Not Decreasing

Solution: Reduce the learning rate
Check for bugs in gradient calculation

2. Slow Convergence

Solution: Increase the learning rate (carefully)
Feature scaling can help significantly

3. Numerical Instability

Solution: Normalize your features
Use feature scaling: X_scaled = (X - X.mean()) / X.std()

Conclusion

You've now built a linear regression model from scratch using only NumPy and Matplotlib! This implementation helps you understand:

How gradient descent optimizes parameters
The role of the cost function
How predictions are made using the learned parameters
The mathematics behind one of ML's most fundamental algorithms

While libraries like scikit-learn are more efficient and feature-rich for production use, building algorithms from scratch is invaluable for developing intuition and debugging skills.

Next Steps

Try experimenting with:

Different learning rates and iteration counts
Adding feature scaling/normalization
Implementing regularization (Ridge or Lasso)
Using the normal equation as an alternative to gradient descent
Adding polynomial features for non-linear relationships

Happy coding!