Artificial Neural Networks
Prof. Dr. Sen Cheng
Oct 18, 2021
Problem Set 3: Regression
Tutors:
Sandhiya Vijayabaskaran (sandhiya.vijayabaskaran@rub.de), Behnam Ghazinouri (behnam.ghazinouri@rub.de)
Note:
All exercises should be done in Python.
1. Analytical solution of linear regression

Although in real-world problems you do not know the function that generated the dataset, here we assume that we know the generating function, which is usually called the “ground-truth”. Let

    y = 4x + 5 + ε,    (1)

be the ground-truth function, where ε is a uniform random variable.
(a) Generate a vector x containing 100 linearly spaced numbers from -3 to 3 (use numpy.linspace) and a vector ε of 100 random numbers between −0.5 and 0.5 (use numpy.random.uniform). Then compute the vector y according to Eq. 1 and generate a scatter plot of y against x.
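One possible sketch of this step (the seed and the variable names are illustrative choices, not part of the exercise):

```python
import numpy as np

# Sample the ground truth y = 4x + 5 + eps (Eq. 1).
rng = np.random.default_rng(0)           # seed fixed only for reproducibility
x = np.linspace(-3, 3, 100)              # 100 linearly spaced points in [-3, 3]
eps = rng.uniform(-0.5, 0.5, size=100)   # uniform noise in [-0.5, 0.5)
y = 4 * x + 5 + eps

# Scatter plot of y against x:
# import matplotlib.pyplot as plt
# plt.scatter(x, y); plt.xlabel("x"); plt.ylabel("y"); plt.show()
```

Note that numpy.random.uniform samples from the half-open interval [low, high).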
(b) Use univariate linear regression (see lecture notes, section 3.1) to find the parameters m and b that minimize the l2-norm loss function (or summed squared errors). Implement the equations yourself using only elementary operations.
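A minimal sketch of the closed-form univariate solution, assuming the data were generated as in part (a) with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 4 * x + 5 + rng.uniform(-0.5, 0.5, size=100)

# Closed-form least squares for y = m*x + b:
#   m = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)²,   b = ȳ - m·x̄
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean
```

The expression for b follows from setting the derivative of the summed squared error with respect to b to zero.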
(c) Use the more general multiple linear regression (see lecture notes, section 3.3) on the data from part (a) to find θ̂. Implement the equations yourself using only elementary matrix operations. Compare your solution with the solution in problem 1b.
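A sketch of the normal-equations approach (the data generation and seed are the same illustrative assumptions as above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 4 * x + 5 + rng.uniform(-0.5, 0.5, size=100)

# Design matrix with a column of ones for the intercept;
# normal equations θ̂ = (XᵀX)⁻¹ Xᵀ y, solved without explicitly inverting.
X = np.column_stack([np.ones_like(x), x])
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # theta_hat = [b̂, m̂]
```

The two entries of theta_hat should agree with m and b from part (b) up to floating-point error.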
(d) Compare the estimated parameters to the ground-truth. Visualize the goodness of fit by plotting the regression line together with the data points. Also, compute the explained variance, and generate the residual plot.
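A sketch of the explained-variance computation; the ground-truth values of m and b stand in for your estimates from part (b) only to keep the snippet self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 4 * x + 5 + rng.uniform(-0.5, 0.5, size=100)

# In your solution use the m and b estimated in part (b);
# the ground-truth values are used here only as stand-ins.
m, b = 4.0, 5.0
y_pred = m * x + b
residuals = y - y_pred

# Explained variance: R² = 1 - SS_res / SS_tot.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Residual plot (residuals vs. x) should show no systematic structure:
# plt.scatter(x, residuals)
```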
(e) Use the scikit-learn Python package to perform linear regression on the data from (a). From sklearn import linear_model and use the class LinearRegression. Fit the linear model to the data generated in the first step. Look at the instructions provided by help(linear_model.LinearRegression) for help. Calculate the explained variance R² using sklearn.metrics.r2_score.
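A sketch of the scikit-learn version (same illustrative data generation as above; scikit-learn expects a 2-D feature array):

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 4 * x + 5 + rng.uniform(-0.5, 0.5, size=100)

# fit() expects features of shape (n_samples, n_features), hence reshape.
reg = linear_model.LinearRegression()
reg.fit(x.reshape(-1, 1), y)

m_hat, b_hat = reg.coef_[0], reg.intercept_
r2 = r2_score(y, reg.predict(x.reshape(-1, 1)))
```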
2. Polynomial regression

Given the stochastic polynomial function

    y = −x^5 + 1.5x^3 − 2x^2 + 4x + 3 + ε,    (2)

where −2 < x < 2 and ε is a random number uniformly distributed between -1 and 1.
(a) Generate Y, an array consisting of 100 data points.

(b) Fit a 5th-order polynomial to Y, and determine its coefficients θ̂. Plot the data points and the fitted curve on a single plot.
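One way to sketch parts (a) and (b) with elementary matrix operations; the seed and the coefficient ordering (lowest power first) are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
Y = -x**5 + 1.5 * x**3 - 2 * x**2 + 4 * x + 3 + rng.uniform(-1, 1, size=100)

# Least-squares fit of a 5th-order polynomial on a Vandermonde design matrix.
# Columns are ordered [1, x, x², ..., x⁵], so theta_hat[k] multiplies x^k.
X = np.vander(x, 6, increasing=True)
theta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Ground truth in the same ordering: [3, 4, -2, 1.5, 0, -1]
# Fitted curve for plotting: y_fit = X @ theta_hat
```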
(c) Compare the parameters, θ̂, you estimated to the ground-truth, θ, by using the l2-norm as the error measure. Check how the error of the estimated parameters varies with the number of data points by plotting the error in the estimates versus the number of data points N, for N ∈ [1, 100].
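A sketch of the error-versus-N sweep. Note that a 6-parameter fit needs at least 6 data points, so this sketch starts at N = 7 rather than N = 1; the seed is again an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([3.0, 4.0, -2.0, 1.5, 0.0, -1.0])  # coefficients of x⁰..x⁵

errors = []
Ns = list(range(7, 101))   # at least 7 points so the 6-parameter fit is overdetermined
for N in Ns:
    x = np.linspace(-2, 2, N)
    X = np.vander(x, 6, increasing=True)
    Y = X @ theta_true + rng.uniform(-1, 1, size=N)
    theta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    errors.append(float(np.linalg.norm(theta_hat - theta_true)))  # l2 error

# plt.plot(Ns, errors) should show the error shrinking as N grows.
```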
3. Incremental version of linear regression

Often in practice, datasets can be extremely large, which makes it infeasible to use the analytical solution. Use stochastic gradient descent (SGD) to compute the regression parameters as follows:
(a) Write a Python function that performs stochastic gradient descent. Use the l2-norm as the loss function and assume your data can be modelled using multiple linear regression. Only use elementary operations. (Hint: don’t forget to shuffle the data before iterating through it!)
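One possible shape for such a function (the signature and the internal bias-column convention are choices of this sketch, not requirements):

```python
import numpy as np

def sgd_linreg(X, y, lr=0.01, epochs=3, rng=None):
    """Stochastic gradient descent for multiple linear regression.

    X: (n_samples, n_features) feature matrix WITHOUT a bias column;
    a column of ones is prepended internally. Minimizes the l2 loss
    0.5 * (θᵀx_i - y_i)² one sample at a time.
    """
    rng = np.random.default_rng(rng)
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):       # reshuffle every epoch
            err = Xb[i] @ theta - y[i]           # prediction error on sample i
            theta -= lr * err * Xb[i]            # gradient step on one sample
    return theta                                 # [bias, weights...]
```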
(b) Use

    y = 4x + 5 + ε    (3)

as the input for your stochastic gradient descent function to test it.
(c) Compare the parameters you estimated to the ground-truth. Plot both the data and the predicted values in a single graph. Check how varying the number of data points (N ∈ [5, 500]) affects the estimates by generating a plot as in Problem 2c. Use a constant learning rate of 0.01 and use 3 runs of SGD. Next, keep the number of data points constant at 100 and use 3 epochs of SGD, and vary the learning rate in steps of 0.001 up to 0.4. Generate a similar plot of error vs. learning rate.
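A sketch of the learning-rate sweep; everything here (seed, a compact inline SGD, suppressing overflow warnings from diverging runs) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 4 * x + 5 + rng.uniform(-0.5, 0.5, size=100)
X = np.column_stack([np.ones_like(x), x])   # bias column + feature
theta_true = np.array([5.0, 4.0])

def sgd(X, y, lr, epochs=3):
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):    # reshuffle every epoch
            theta -= lr * (X[i] @ theta - y[i]) * X[i]
    return theta

# Sweep the learning rate in steps of 0.001 up to 0.4, as in part (c).
lrs = np.arange(0.001, 0.4, 0.001)
with np.errstate(over="ignore", invalid="ignore"):   # very large rates diverge
    errors = [float(np.linalg.norm(sgd(X, y, lr) - theta_true)) for lr in lrs]

# Plotting lrs vs. errors shows a flat low-error region and a blow-up
# once the rate exceeds the stability limit.
```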