Aim: Implementation of Logistic Regression using sklearn
Program:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load dataset
diab_df = pd.read_csv("diabetes.csv")
print(diab_df.head()) # Preview the first five rows
# Split dataset into features and target variable
diab_cols = ['Pregnancies', 'Insulin', 'BMI', 'Age', 'Glucose', 'BloodPressure', 'DiabetesPedigreeFunction']
X = diab_df[diab_cols] # Features
y = diab_df.Outcome # Target variable
# Splitting Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# Model Development and Prediction
logreg = LogisticRegression(solver='liblinear') # Instantiate the model
logreg.fit(X_train, y_train) # Fit the model with data
y_pred = logreg.predict(X_test) # Predicting y_pred
# Model Evaluation using Confusion Matrix
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
print(cnf_matrix)
# Visualizing Confusion Matrix using Heatmap
class_names = [0, 1] # Name of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)
# Create heatmap
sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap="YlGnBu", fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion Matrix', y=1.1)
plt.ylabel('Actual Label')
plt.xlabel('Predicted Label')
plt.show()
# Confusion Matrix Evaluation Metrics
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))
Output:
Accuracy: 0.8072916666666666
Precision: 0.7659574468085106
Recall: 0.5806451612903226
Sample Viva Questions:
1. What is classification?
Classification is a supervised machine learning task of predicting which category or class an observation belongs to based on its features. Some examples of classification algorithms (a short comparison sketch follows the list):
- Logistic regression
- Decision trees
- Random forest
- Artificial neural networks
- XGBoost
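As an illustration, the following is a minimal sketch that fits two of the listed algorithms alongside logistic regression on the same diabetes split. It assumes the X_train, X_test, y_train, y_test variables created in the program above; XGBoost is omitted because it is a separate package.
# Sketch: comparing listed classifiers on the same split
# (assumes X_train/X_test/y_train/y_test from the program above)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
models = {
    "Logistic regression": LogisticRegression(solver='liblinear'),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)  # Train on the same training split
    acc = metrics.accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")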
2. What is Logistic regression?
Logistic regression is a supervised classification model, also known as the logit model. It estimates the probability of an event occurring, such as ‘will buy’ or ‘will not buy’, based on a dataset of independent variables. The outcome is a categorical or discrete value: 0 or 1, true or false, yes or no, and so on.
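The probability itself comes from the logistic (sigmoid) function applied to a linear combination of the inputs; a minimal NumPy sketch with illustrative values:
import numpy as np
def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))
z = np.array([-2.0, 0.0, 2.0])  # z = intercept + coefficients . features
print(sigmoid(z))               # approx. [0.12 0.5 0.88]; a 0.5 threshold gives class 0/1
In the program above, logreg.predict_proba(X_test) returns these probabilities directly, and logreg.predict applies the 0.5 threshold to produce the 0/1 labels.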
3. Types of logistic regression
So far, we have discussed binary logistic regression, where the outcome is 0/1, True/False, and so on. There are two more types (see the sketch after the list):
- Multinomial logistic regression: The dependent variable has three or more unordered categories, such as cats/dogs/donkeys.
- Ordinal logistic regression: The dependent variable has three or more ordered categories, such as poor/average/good or low/medium/high.
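Plain scikit-learn covers the multinomial case; a minimal sketch on a synthetic three-class dataset follows (the make_classification parameters are illustrative, and ordinal logistic regression requires a separate package such as statsmodels or mord):
# Sketch: multinomial logistic regression on a synthetic 3-class dataset
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
X_m, y_m = make_classification(n_samples=300, n_features=6, n_informative=4,
                               n_classes=3, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_m, y_m, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000)  # recent scikit-learn fits a multinomial model for 3+ classes
clf.fit(Xtr, ytr)
print(metrics.accuracy_score(yte, clf.predict(Xte)))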
4. Assumptions of logistic regression
Logistic regression assumes that (quick informal checks are sketched after the list):
- The response variable is binary (or ordinal, for ordinal logistic regression).
- The observations are independent, and the independent variables have very little or no multicollinearity.
- There are no extreme outliers or influential observations.
- There is a linear relationship between the predictor variables and the log-odds of the response variable.
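A minimal sketch of two quick, informal checks on the diabetes features; it assumes diab_df and diab_cols from the program above, and a formal multicollinearity check would use variance inflation factors instead:
# Sketch: informal assumption checks (assumes diab_df and diab_cols from the program above)
import numpy as np
X_check = diab_df[diab_cols]
# Multicollinearity screen: pairwise correlations close to +/-1 are a warning sign
print(X_check.corr().round(2))
# Outlier screen: count values more than 3 standard deviations from the column mean
z = (X_check - X_check.mean()) / X_check.std()
print((np.abs(z) > 3).sum())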
5. Logistic regression with Scikit-learn
To implement logistic regression with Scikit-learn, you need to be familiar with the Scikit-learn modeling process and with linear regression.
The steps for building a logistic regression model include (a compact pipeline sketch follows the list):
- Import the packages, classes, and functions.
- Load the data.
- Exploratory Data Analysis (EDA).
- Transform the data if necessary.
- Fit the classification model.
- Evaluate the model's performance.
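A compact sketch of these steps on the same diabetes data, with StandardScaler standing in for the transform step; the file name and column list are taken from the program above:
# Sketch: the listed steps as one compact pipeline on the diabetes data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
df = pd.read_csv("diabetes.csv")  # Load the data
cols = ['Pregnancies', 'Insulin', 'BMI', 'Age', 'Glucose',
        'BloodPressure', 'DiabetesPedigreeFunction']
X_tr, X_te, y_tr, y_te = train_test_split(df[cols], df.Outcome,
                                          test_size=0.25, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression())  # Transform the data, then fit
pipe.fit(X_tr, y_tr)
print(metrics.classification_report(y_te, pipe.predict(X_te)))  # Evaluate the performance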