Artificial Neural Networks
Prof. Dr. Sen Cheng
Oct 25, 2021
Problem Set 4: Logistic Regression / Classification
Tutors:
David Kappel (david.kappel@rub.de), Xiangshuai Zeng (xiangshuai.zeng@ini.ruhr-uni-bochum.de)
Further Reading:
Hands-on Machine Learning with Scikit-Learn and TensorFlow, Ch. 4
1. Derive the gradient of the loss function of logistic regression using paper and pencil.
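As a reference for checking your own derivation, a minimal sketch of the expected result, assuming the cross-entropy loss over $m$ training samples and the identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$:

$$
L(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \sigma(\theta^T x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - \sigma(\theta^T x^{(i)})\right) \right]
$$

$$
\nabla_\theta L(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \sigma(\theta^T x^{(i)}) - y^{(i)} \right) x^{(i)}
$$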
2. Implement a logistic regression model using only elementary programming operations. For your guidance, see
the steps below:
(a) Load the file ‘04_log_regression_data.npy’ in numpy. It contains a hypothetical dataset, where the first two
columns reflect the features: length of current residency and yearly income, and the last column contains
the labels, i.e., whether a bank loan was granted to an individual or not.
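A minimal sketch of this step, assuming the file lies in the working directory; the variable names X and y are free choices:

    import numpy as np

    # Columns: length of current residency, yearly income, loan granted (label)
    data = np.load('04_log_regression_data.npy')
    X = data[:, :2]   # feature matrix, shape (n_samples, 2)
    y = data[:, 2]    # binary labels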
(b) Because the scale of the two features differs considerably, it is advised to standardize them before fitting.
Import the zscore function from scipy.stats to standardize each of the features.
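Standardization could then look like the following sketch (X from the loading step above; X_std is an assumed name):

    from scipy.stats import zscore

    # Standardize each feature column to zero mean and unit variance
    X_std = zscore(X, axis=0)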
(c) Visualize the data set in a scatter plot of length of current residency vs. yearly income, using different
colors to reflect whether the loan was granted or not. You may call matplotlib.pyplot.scatter twice for plotting the points belonging to each of the two classes.
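A possible sketch, assuming the label 1 encodes a granted loan and using the standardized features X_std:

    import matplotlib.pyplot as plt

    granted = (y == 1)
    # One scatter call per class, with different colors
    plt.scatter(X_std[granted, 0], X_std[granted, 1], c='tab:green', label='granted')
    plt.scatter(X_std[~granted, 0], X_std[~granted, 1], c='tab:red', label='not granted')
    plt.xlabel('length of current residency (standardized)')
    plt.ylabel('yearly income (standardized)')
    plt.legend()
    plt.show()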
(d) Implement the gradient descent algorithm to minimize the loss function. (Don’t forget to set up the bias
terms.)
i. Set the initial parameters $\theta_0 = 0$ or to small random numbers.
ii. Run the fitting process for 15,000 epochs and a learning rate $\eta = 0.002$.
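A minimal sketch of the fitting loop, assuming X_std and y from the previous steps; the names Xb, theta, eta and losses are free choices. The bias term is handled by prepending a column of ones, and the loss history is stored for problem 4:

    # Design matrix with a leading column of ones so theta[0] is the bias term
    m = X_std.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X_std])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    theta = np.zeros(Xb.shape[1])   # initial parameters (or small random numbers)
    eta = 0.002                     # learning rate
    losses = []

    for epoch in range(15000):
        p = sigmoid(Xb @ theta)            # predicted probabilities
        grad = Xb.T @ (p - y) / m          # gradient of the cross-entropy loss
        theta -= eta * grad                # gradient descent update
        eps = 1e-12                        # avoid log(0) in the loss
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))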
3. Once the gradient descent is completed, plot the decision boundary together with the data. Remember that the decision boundary is given by $p(x) = \sigma(\theta^T x) = p_0$. If $p_0 = \frac{1}{2}$, the previous condition is equivalent to
$$\hat{\theta}^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0. \qquad (1)$$
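A sketch of the boundary line on top of the scatter plot from step (c), assuming theta from the fitting sketch above (theta[0] is the bias):

    # For p0 = 1/2: theta0 + theta1*x1 + theta2*x2 = 0  =>  x2 = -(theta0 + theta1*x1) / theta2
    x1_line = np.linspace(X_std[:, 0].min(), X_std[:, 0].max(), 100)
    x2_line = -(theta[0] + theta[1] * x1_line) / theta[2]
    plt.plot(x1_line, x2_line, 'k--', label='decision boundary')
    plt.legend()
    plt.show()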
4. Plot the loss vs the training epochs. Why does the loss saturate? Why is the asymptotic value not zero?
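Plotting the stored loss history from the fitting sketch above could look like this:

    plt.figure()
    plt.plot(losses)
    plt.xlabel('epoch')
    plt.ylabel('cross-entropy loss')
    plt.show()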
5. The decision boundary can be shifted by adjusting $p_0$ to trade off the likelihood of the different errors. Apply different classification thresholds $p_0$ to obtain the precision-recall curve, the $F_1$ score and the ROC curve. Based on these measures, assess the model performance and determine what would be a reasonable classification criterion.
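A sketch of the threshold sweep, computed with elementary operations on the fitted probabilities (Xb, theta and sigmoid from the sketches above); the threshold grid and the assumption that label 1 is the positive class are free choices:

    p_hat = sigmoid(Xb @ theta)                 # fitted probabilities
    thresholds = np.linspace(0.01, 0.99, 99)
    precision, recall, fpr, f1 = [], [], [], []

    for p0 in thresholds:
        pred = (p_hat >= p0)
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        fn = np.sum(~pred & (y == 1))
        tn = np.sum(~pred & (y == 0))
        prec = tp / (tp + fp) if (tp + fp) > 0 else 1.0
        rec = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        precision.append(prec)
        recall.append(rec)
        fpr.append(fp / (fp + tn) if (fp + tn) > 0 else 0.0)
        f1.append(2 * prec * rec / (prec + rec) if (prec + rec) > 0 else 0.0)

    # Precision-recall curve; recall doubles as the true positive rate for the ROC curve
    plt.plot(recall, precision)
    plt.xlabel('recall'); plt.ylabel('precision'); plt.show()
    plt.plot(fpr, recall)
    plt.xlabel('false positive rate'); plt.ylabel('true positive rate'); plt.show()
    plt.plot(thresholds, f1)
    plt.xlabel('threshold p0'); plt.ylabel('F1 score'); plt.show()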