Mini-Project: Performance Analysis of Classification Algorithms
Objective:
Compare four classification models on the Breast Cancer dataset and analyze their accuracy: Logistic Regression, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM).
✅ Step-by-Step Algorithm
Step 1: Load the Dataset
Use the Breast Cancer dataset from sklearn.datasets.
Extract features (X) and target labels (y).
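As a quick check of what Step 1 produces, the sketch below loads the dataset and inspects its size and class names. The variable names mirror the program further down; the inspection calls are an addition for illustration, not part of the original program.

```python
from sklearn.datasets import load_breast_cancer

# Load the Breast Cancer dataset bundled with scikit-learn
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)                   # (number of samples, number of features)
print(list(data.target_names))   # class names; position i corresponds to label i in y
print(sorted(set(y.tolist())))   # distinct label values present in y
```

The bundled copy holds 569 samples with 30 numeric features, and the labels are 0 (malignant) and 1 (benign).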
Step 2: Preprocess the Data
Split the dataset into training (80%) and testing (20%) sets.
Normalize features using StandardScaler.
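The preprocessing step can be sketched as follows. The key point is that the scaler is fitted on the training set only and then reused on the test set, so no test-set statistics leak into training. The names X_train_scaled and X_test_scaled are introduced here for illustration; the program below overwrites X_train and X_test instead.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply the same
# mean and standard deviation to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)

# After scaling, each training feature has (approximately) zero mean
# and unit standard deviation; the test set generally does not
print(np.allclose(X_train_scaled.mean(axis=0), 0.0, atol=1e-6))
print(np.allclose(X_train_scaled.std(axis=0), 1.0, atol=1e-6))
```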
Step 3: Train Classification Models
Implement the following models:
Logistic Regression
Decision Tree
K-Nearest Neighbors (KNN)
Support Vector Machine (SVM)
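All four estimators expose the same fit/predict/score interface, which is what allows them to be trained in a single loop. The sketch below is one possible variant, not the program's own approach: it wraps each model in a scikit-learn pipeline (make_pipeline) so that scaling and classification are fitted together. It assumes default hyperparameters; the random_state on the decision tree is an addition made only so that repeated runs give the same tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: default hyperparameters throughout; random_state on the
# tree is added here for reproducibility only
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

scores = {}
for name, model in models.items():
    # Each pipeline standardizes the features, then fits the classifier
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)  # mean accuracy on the test set

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Bundling the scaler with the model keeps the "fit on training data only" rule automatic, which becomes useful if the comparison is later extended to cross-validation.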
Step 4: Evaluate the Models
Use Accuracy Score to measure performance.
Generate a Classification Report with Precision, Recall, F1-score.
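To make the metrics in this step concrete, the sketch below computes accuracy, precision, recall, and F1-score by hand from a confusion matrix and compares them with scikit-learn's helpers. The labels are hypothetical values chosen for illustration, not results from the Breast Cancer dataset.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels for illustration only (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]

# For binary labels, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of correct predictions
precision = tp / (tp + fp)                          # predicted positives that are correct
recall = tp / (tp + fn)                             # actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")

# The library helpers return the same values for the positive class
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```

Here accuracy is 7/10, precision 4/6, recall 4/5, and F1 8/11. The classification report prints these per class, which matters for this dataset because label 0 is malignant and label 1 is benign.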
Step 5: Visualize the Results
Create a bar chart comparing model accuracies using Matplotlib and Seaborn.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
# Step 1: Load the Dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Step 2: Preprocess the Data
# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Normalize features; the scaler is fitted on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 3: Train Classification Models
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC()
}
results = {}  # Store accuracy results
# Step 4: Evaluate the Models (each model is fitted, then scored on the test set)
for name, model in models.items():
    model.fit(X_train, y_train)  # Train the model
    y_pred = model.predict(X_test)  # Predict on test data
    accuracy = accuracy_score(y_test, y_pred)  # Evaluate accuracy
    results[name] = accuracy  # Store accuracy
    print(f"{name}:\n{classification_report(y_test, y_pred)}\n")  # Show classification report
# Step 5: Visualize Performance Comparison
df_results = pd.DataFrame(list(results.items()), columns=['Model', 'Accuracy'])
plt.figure(figsize=(8, 5))
sns.barplot(x='Model', y='Accuracy', data=df_results, palette='coolwarm')
plt.title("Performance Comparison of Classification Models")
plt.xlabel("Classification Model")
plt.ylabel("Accuracy Score")
plt.ylim(0, 1)
plt.show()
Expected Output:
📌 Accuracy Scores for Each Model (example values; the Decision Tree result can vary between runs because no random_state is set)
Model | Accuracy |
---|---|
Logistic Regression | 97.36% |
Decision Tree | 94.73% |
K-Nearest Neighbors | 95.79% |
Support Vector Machine | 97.89% |