Mini-Project: Performance Analysis of Classification Algorithms

Objective:

Compare different classification models (Logistic Regression, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM)) on the Breast Cancer dataset and analyze their accuracy.


✅ Step-by-Step Algorithm

Step 1: Load the Dataset

  • Use the Breast Cancer dataset from sklearn.datasets.

  • Extract features (X) and target labels (y).
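
A quick look at what the loader returns can help before going further. The following is a minimal sketch, assuming scikit-learn is installed; the variable names (n_samples, class_counts, and so on) are chosen here for illustration only.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# Load the dataset bundled with scikit-learn
data = load_breast_cancer()
X, y = data.data, data.target  # feature matrix and 0/1 class labels

n_samples, n_features = X.shape
class_names = [str(name) for name in data.target_names]

# Count how many samples fall into each class (label 0 or 1)
class_counts = Counter(class_names[label] for label in y)

print(f"{n_samples} samples, {n_features} features")
print(f"classes: {class_names}")
print(f"samples per class: {dict(class_counts)}")
```

The class counts show that the two classes are not perfectly balanced, which is one reason Step 4 reports precision and recall in addition to accuracy.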

Step 2: Preprocess the Data

  • Split the dataset into training (80%) and testing (20%) sets.

  • Standardize features (zero mean, unit variance) using StandardScaler, fitted on the training set only.
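
The preprocessing step can be checked in isolation. This is a small sketch, assuming the same 80/20 split and random_state=42 used in the program; the names X_train_scaled and X_test_scaled are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn each feature's mean and standard deviation from the training set only,
# then reuse them on the test set so no test information leaks into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)

print("train/test sizes:", X_train.shape[0], X_test.shape[0])
print("largest |mean| of a scaled training feature:", np.abs(train_means).max())
print("std range of scaled training features:", train_stds.min(), train_stds.max())
```

The scaled test set will generally not have exactly zero mean and unit variance, because it is transformed with statistics taken from the training set. That is expected.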

Step 3: Train Classification Models

  • Implement the following models:

    1. Logistic Regression

    2. Decision Tree

    3. K-Nearest Neighbors (KNN)

    4. Support Vector Machine (SVM)
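
Since the program creates each model with its default settings, it may help to see which defaults are in play. The sketch below lists a few of them; the selection of parameters in key_params is an illustrative choice made here, not something prescribed by the project.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The four classifiers, created with default hyperparameters
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

# A few hyperparameters per model (picked for illustration)
key_params = {
    "Logistic Regression": ["C", "max_iter"],
    "Decision Tree": ["criterion", "max_depth"],
    "KNN": ["n_neighbors", "weights"],
    "SVM": ["kernel", "C"],
}

defaults = {}
for name, model in models.items():
    params = model.get_params()  # every scikit-learn estimator exposes get_params()
    defaults[name] = {key: params[key] for key in key_params[name]}
    print(f"{name}: {defaults[name]}")
```

All four classes share the same fit / predict interface, which is what lets the program train and evaluate them in a single loop.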

Step 4: Evaluate the Models

  • Use Accuracy Score to measure performance.

  • Generate a Classification Report with Precision, Recall, F1-score.
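
To see where the numbers in a classification report come from, the metrics can be recomputed by hand from a confusion matrix. This is a sketch for one model only (Logistic Regression, chosen arbitrarily), assuming the same split and scaling as the program.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp,
# with label 1 (benign in this dataset) treated as the positive class
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

The hand-computed values should agree with accuracy_score, precision_score, recall_score and f1_score for the positive class.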

Step 5: Visualize the Results

  • Create a bar chart comparing model accuracies using Matplotlib and Seaborn.

Program:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer

# Step 1: Load the Dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Step 2: Preprocess the Data
# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features; fit the scaler on the training set only to avoid leaking test data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Steps 3 and 4: Train and Evaluate Classification Models
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),  # fixed seed so the tree is reproducible
    "KNN": KNeighborsClassifier(),
    "SVM": SVC()
}

results = {}  # Store accuracy results

for name, model in models.items():
    model.fit(X_train, y_train)  # Train the model
    y_pred = model.predict(X_test)  # Predict on test data
    accuracy = accuracy_score(y_test, y_pred)  # Evaluate accuracy
    results[name] = accuracy  # Store accuracy
    print(f"{name} (accuracy: {accuracy:.2%}):\n{classification_report(y_test, y_pred)}\n")  # Show accuracy and classification report

# Step 5: Visualize Performance Comparison
df_results = pd.DataFrame(list(results.items()), columns=['Model', 'Accuracy'])

plt.figure(figsize=(8, 5))
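# Assigning hue and disabling the legend avoids the palette deprecation warning in seaborn 0.13+
sns.barplot(x='Model', y='Accuracy', hue='Model', data=df_results, palette='coolwarm', legend=False)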
plt.title("Performance Comparison of Classification Models")
plt.xlabel("Classification Model")
plt.ylabel("Accuracy Score")
plt.ylim(0, 1)
plt.show()

Expected Output:

📌 Accuracy Scores for Each Model (Example)

Model                     Accuracy
Logistic Regression       97.36%
Decision Tree             94.73%
K-Nearest Neighbors       95.79%
Support Vector Machine    97.89%
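
When reading accuracy figures like these, it helps to remember how small the test set is. The arithmetic below is a sketch, assuming the 569-sample dataset and the 20% test split used in the program; it shows how much a single misclassified sample moves the accuracy.

```python
import math

n_samples = 569   # size of the Breast Cancer dataset
test_size = 0.2   # fraction held out for testing in the program

# train_test_split rounds the number of test samples up
n_test = math.ceil(test_size * n_samples)

# Accuracy change caused by one misclassified test sample
step = 1 / n_test
print(f"test samples: {n_test}, one error = {step:.2%}")

# Accuracy for a given number of errors on the test set
accuracy_by_errors = {}
for errors in range(0, 7):
    accuracy_by_errors[errors] = (n_test - errors) / n_test
    print(f"{errors} errors -> {accuracy_by_errors[errors]:.2%}")
```

With so few test samples, a gap of one or two percentage points between models corresponds to only a handful of predictions, so small differences should be interpreted cautiously.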