ex6.m

You will be using support vector machines (SVMs) with various example 2D datasets.

  • Plot Data (in ex6data1.mat)

ex6_plotting_ex6data1.png

Train SVM with Linear Kernel

Try using different values of the C parameter with SVMs. Informally, the C parameter is a positive value that controls the penalty for misclassified training examples.
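A minimal sketch of the experiment, assuming the exercise's svmTrain, linearKernel, and visualizeBoundaryLinear helper functions are on the path:

% Octave code sketch
load('ex6data1.mat');   % provides X, y
C = 1;                  % try C = 100 to see the boundary chase the outlier
model = svmTrain(X, y, C, @linearKernel, 1e-3, 20);
visualizeBoundaryLinear(X, y, model);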

  • Plot decision boundary (in ex6data1.mat)

ex6_plotting_decision_boundary_with_C_1.png

ex6_plotting_decision_boundary_with_C_100.png

Train SVM with RBF Kernel

  • Plot Data (in ex6data2.mat)

ex6_plotting_ex6data2.png

C: 1, sigma: 0.1

  • Plot decision boundary (in ex6data2.mat)

ex6_plotting_decision_boundary_with_rbf_kernel.png
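The Gaussian (RBF) kernel used here measures the similarity between two examples; a sketch of the exercise's gaussianKernel.m:

% Octave code sketch
function sim = gaussianKernel(x1, x2, sigma)
  % sim -> 1 when x1 and x2 are close, -> 0 when they are far apart
  sim = exp(-sum((x1 - x2) .^ 2) / (2 * sigma ^ 2));
end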

Try Different SVM Parameters to Train SVM with RBF Kernel

Automatically choose optimal C and sigma based on a cross-validation set.

C list: [0.01 0.03 0.1 0.3 1 3 10 30]

sigma list: [0.01 0.03 0.1 0.3 1 3 10 30]

=> optimal C = 1 and sigma = 0.1
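A grid-search sketch of this selection (the logic of dataset3Params.m), assuming X, y, Xval, yval are loaded from ex6data3.mat and the exercise's svmTrain, gaussianKernel, and svmPredict helpers are available:

% Octave code sketch
values = [0.01 0.03 0.1 0.3 1 3 10 30];
bestErr = Inf;
for C = values
  for sigma = values
    model = svmTrain(X, y, C, @(x1, x2) gaussianKernel(x1, x2, sigma));
    predictions = svmPredict(model, Xval);
    err = mean(double(predictions ~= yval));  % misclassification rate on the CV set
    if err < bestErr
      bestErr = err; bestC = C; bestSigma = sigma;
    end
  end
end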

% Octave console output
% C list: [0.01 0.03 0.1 0.3 1 3 10 30]
% sigma list: [0.01 0.03 0.1 0.3 1 3 10 30]
Training ......... Done!
C: 0.010000
sigma: 0.010000
error = 0.56500
====================
Training ........................................ Done!
C: 0.010000
sigma: 0.030000
error = 0.060000
====================
Training ......................................................................
.. Done!
C: 0.010000
sigma: 0.100000
error = 0.045000
====================
(...)
Training ......................................................................
.............................. Done!
C: 30.000000
sigma: 3.000000
error = 0.065000
====================
Training ....................................................... Done!
C: 30.000000
sigma: 10.000000
error = 0.10000
====================
Training .................................................. Done!
C: 30.000000
sigma: 30.000000
error = 0.18000
====================
optimal C = 1.000000 and sigma = 0.100000
Program paused. Press enter to continue.
  • Plot Data (in ex6data3.mat)

ex6_plotting_ex6data3.png

  • Plot decision boundary with optimal SVM parameters (in ex6data3.mat)

ex6_plotting_decision_boundary_with_optimal_svm_parameters.png

ex6_spam.m

You will be using support vector machines to build a spam classifier.

For the purpose of this exercise, you will only be using the body of the email (excluding the email headers).

  • Preprocess sample email (in emailSample1.txt, vocab.txt)

Convert each email into a feature vector. Each email is first preprocessed and normalized: Lower-casing, Stripping HTML, Normalizing URLs, Normalizing Email Addresses, Normalizing Numbers, Normalizing Dollars, Word Stemming, and Removal of non-words.

vocabulary list: a list of 1899 words

Given the vocabulary list, we can then map each word in the preprocessed email to the index of that word in the vocabulary list.
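The lookup itself can be a simple linear scan over the vocabulary; a sketch of the search inside processEmail.m, assuming str holds the current stemmed word, vocabList is the cell array read from vocab.txt, and word_indices starts as []:

% Octave code sketch
for i = 1:numel(vocabList)
  if strcmp(str, vocabList{i})
    word_indices = [word_indices; i];  % record the vocabulary index of this word
    break;
  end
end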

% Octave console output
Preprocessing sample email (emailSample1.txt)
==== Processed Email ====
anyon know how much it cost to host a web portal well it depend on how mani
visitor you re expect thi can be anywher from less than number buck a month
to a coupl of dollarnumb you should checkout httpaddr or perhap amazon ecnumb
if your run someth big to unsubscrib yourself from thi mail list send an
email to emailaddr
=========================
Word Indices:
86 916 794 1077 883 370 1699 790 1822 1831 883 431 1171 794 1002 1893 1364 592 1676 238 162 89 688 945 1663 1120 1062 1699 375 1162 479 1893 1510 799 1182 1237 810 1895 1440 1547 181 1699 1758 1896 688 1676 992 961 1477 71 530 1699 531
Program paused. Press enter to continue.
  • Extract Features from Emails (in emailSample1.txt)

The feature x_i ∈ {0, 1} for an email corresponds to whether the i-th word in the dictionary occurs in the email: x_i = 1 if the i-th word is in the email, and x_i = 0 otherwise.
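A sketch of the corresponding feature mapping (emailFeatures.m):

% Octave code sketch
n = 1899;               % number of words in the vocabulary
x = zeros(n, 1);
x(word_indices) = 1;    % x(i) = 1 iff the i-th vocabulary word occurs in the email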

% Octave console output
Extracting features from sample email (emailSample1.txt)
==== Processed Email ====
anyon know how much it cost to host a web portal well it depend on how mani
visitor you re expect thi can be anywher from less than number buck a month
to a coupl of dollarnumb you should checkout httpaddr or perhap amazon ecnumb
if your run someth big to unsubscrib yourself from thi mail list send an
email to emailaddr
=========================
Length of feature vector: 1899
Number of non-zero entries: 45
Program paused. Press enter to continue.
  • Train Linear SVM for Spam Classification (in spamTrain.mat, spamTest.mat)

Train an SVM to classify between spam (y = 1) and non-spam (y = 0) emails.

spamTrain.mat: 4000 training examples of spam and non-spam email

spamTest.mat: 1000 test examples
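A training-and-evaluation sketch, assuming the variable names provided by the .mat files (X, y in spamTrain.mat; Xtest, ytest in spamTest.mat) and C = 0.1 as in ex6_spam.m:

% Octave code sketch
load('spamTrain.mat');                     % provides X, y
C = 0.1;
model = svmTrain(X, y, C, @linearKernel);
load('spamTest.mat');                      % provides Xtest, ytest
p = svmPredict(model, Xtest);
fprintf('Test Accuracy: %f\n', mean(double(p == ytest)) * 100);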

% Octave console output
Training Linear SVM (Spam Classification)
(this may take 1 to 2 minutes) ...
Training ......................................................................
...............................................................................
...............................................................................
..... Done!
Training Accuracy: 99.850000
Evaluating the trained Linear SVM on a test set ...
Test Accuracy: 99.000000

Troubleshooting:

  • Error when plotting the decision boundary of the SVM with RBF kernel
% Octave console output
error: set: unknown hggroup property Color
error: called from
__contour__ at line 201 column 5
contour at line 74 column 16
visualizeBoundary at line 21 column 2
ex6 at line 109 column 1

Solution:

rewrite visualizeBoundary.m line 21:

=> contour(X1, X2, vals, [1 1], 'LineColor', 'b');


ex5.m

Implement regularized linear regression and use it to study models with different bias-variance properties.

  • Plot Data (in ex5data1.mat)

ex5_plotting_data.png

  • Compute Regularized Linear Regression Cost

lambda: 1, theta: [1 ; 1]

% Octave console output
Cost at theta = [1 ; 1]: 303.993192
(this value should be about 303.993192)
  • Compute Regularized Linear Regression Gradient

lambda: 1, theta: [1 ; 1]

% Octave console output
Gradient at theta = [1 ; 1]: [-15.303016; 598.250744]
(this value should be about [-15.303016; 598.250744])
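Both values above come from the regularized cost and gradient; a sketch of linearRegCostFunction.m, assuming X includes the bias column (theta(1) is not regularized):

% Octave code sketch
h = X * theta;
J = (1 / (2 * m)) * sum((h - y) .^ 2) + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
grad = (1 / m) * (X' * (h - y)) + (lambda / m) * [0; theta(2:end)];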
  • Train linear regression and plot fit over the data

lambda: 0

ex5_trained_linear_regression.png

  • Compute train error and cross-validation error for linear regression

lambda: 0

training error: evaluate the training error on the first i training examples (i.e., X(1:i, :) and y(1:i))

cross-validation error: evaluate on the entire cross validation set (Xval and yval).
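A sketch of the loop in learningCurve.m: train with the given lambda, but evaluate both errors with lambda = 0 so they measure pure fit.

% Octave code sketch
for i = 1:m
  theta = trainLinearReg(X(1:i, :), y(1:i), lambda);
  error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
  error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
end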

% Octave console output
Iteration 3 | Cost: 9.860761e-32
Iteration 2 | Cost: 3.286595e+00
Iteration 28 | Cost: 2.842678e+00
Iteration 24 | Cost: 1.315405e+01
Iteration 27 | Cost: 1.944396e+01
Iteration 13 | Cost: 2.009852e+01
Iteration 30 | Cost: 1.817286e+01
Iteration 11 | Cost: 2.260941e+01
Iteration 33 | Cost: 2.326146e+01
Iteration 10 | Cost: 2.431725e+01
Iteration 2 | Cost: 2.237391e+01
# Training Examples Train Error Cross Validation Error
1 0.000000 210.522449
2 0.000000 110.300366
3 3.286595 45.010231
4 2.842678 48.368911
5 13.154049 35.865165
6 19.443963 33.829962
7 20.098522 31.970986
8 18.172859 30.862446
9 22.609405 31.135998
10 23.261462 28.936207
11 24.317250 29.551432
12 22.373906 29.433818
  • Plot learning curve for linear regression

Since the model is underfitting the data, we expect to see a graph with “high bias”

ex5_learning_curve_for_linear_regression.png

  • Map X onto Polynomial Features and Normalize

X_poly(i, :) = [X(i) X(i).^2 X(i).^3 … X(i).^p]
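One way to implement this mapping (a sketch of polyFeatures.m):

% Octave code sketch
function X_poly = polyFeatures(X, p)
  X_poly = zeros(numel(X), p);
  for j = 1:p
    X_poly(:, j) = X .^ j;   % j-th column is the j-th power of X
  end
end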

% Octave console output
Normalized Training Example 1:
1.000000
-0.362141
-0.755087
0.182226
-0.706190
0.306618
-0.590878
0.344516
-0.508481
  • Train Polynomial regression and plot fit over the data

ex5_trained_polynomial_regression.png

  • Compute train error and cross-validation error for polynomial regression

lambda: 0

training error: evaluate the training error on the first i training examples (i.e., X(1:i, :) and y(1:i))

cross-validation error: evaluate on the entire cross validation set (Xval and yval).

% Octave console output
Iteration 14 | Cost: 1.232595e-32
Iteration 25 | Cost: 4.108651e-32
Iteration 11 | Cost: 3.910038e-28
Iteration 200 | Cost: 5.989594e-08
Iteration 200 | Cost: 8.797460e-04
warning: division by zero
Iteration 200 | Cost: 4.639732e-02
Iteration 200 | Cost: 6.939729e-02
warning: division by zero
Iteration 200 | Cost: 1.814619e-01
Iteration 200 | Cost: 1.626512e-01
Iteration 200 | Cost: 1.240625e-01
Iteration 200 | Cost: 1.354165e-01
Polynomial Regression (lambda = 0.000000)
# Training Examples Train Error Cross Validation Error
1 0.000000 331.806752
2 0.000000 160.121510
3 0.000000 61.754825
4 0.000000 61.928895
5 0.000000 6.604738
6 0.000880 10.065414
7 0.046397 7.260759
8 0.069397 7.098868
9 0.181462 7.725792
10 0.162651 8.719869
11 0.124062 9.822221
12 0.135417 12.486147
  • Plot learning curve for polynomial regression

Since the model is overfitting the data, we expect to see a graph with “high variance”

ex5_learning_curve_for_polynomial_regression.png

  • Test various values of lambda and compute error
% Octave console output
Iteration 200 | Cost: 1.812655e-01
Iteration 200 | Cost: 1.902681e-01
Iteration 200 | Cost: 2.527827e-01
Iteration 200 | Cost: 3.850725e-01
Iteration 200 | Cost: 6.692749e-01
Iteration 186 | Cost: 1.443470e+00
Iteration 111 | Cost: 3.101591e+00
Iteration 61 | Cost: 7.268148e+00
Iteration 33 | Cost: 1.586769e+01
Iteration 20 | Cost: 3.337220e+01
lambda Train Error Validation Error
0.000000 0.181265 22.664199
0.001000 0.158597 18.165177
0.003000 0.187127 19.029032
0.010000 0.221858 17.059909
0.030000 0.281862 12.829269
0.100000 0.459318 7.587014
0.300000 0.921760 4.636833
1.000000 2.076188 4.260625
3.000000 4.901351 3.822907
10.000000 16.092213 9.945509
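A sketch of the sweep behind this table (validationCurve.m), assuming X_poly and X_poly_val are the normalized polynomial features of the training and cross-validation sets:

% Octave code sketch
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';
for i = 1:numel(lambda_vec)
  lambda = lambda_vec(i);
  theta = trainLinearReg(X_poly, y, lambda);
  error_train(i) = linearRegCostFunction(X_poly, y, theta, 0);
  error_val(i)   = linearRegCostFunction(X_poly_val, yval, theta, 0);
end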
  • Plot validation curve

Use the validation curve to select the “best” lambda value.

the best value of lambda is around 3

ex5_validation_curve_for_polynomial_regression.png

ex4.m

Implement the backpropagation algorithm for neural networks and apply it to the task of hand-written digit recognition.

  • Plot Data (in ex4data1.mat)

ex4_plotting_data.png

  • Feedforward Using Neural Network and Compute Cost at parameters (loaded from ex4weights)
% Octave console output
Feedforward Using Neural Network ...
Cost at parameters (loaded from ex4weights): 0.287629
(this value should be about 0.287629)
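A feedforward sketch for the 400-25-10 network (the unregularized part of nnCostFunction.m), where Y is the one-hot recoding of the labels:

% Octave code sketch
a1 = [ones(m, 1) X];                      % input layer plus bias unit
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];  % hidden layer plus bias unit
h  = sigmoid(a2 * Theta2');               % m x num_labels output matrix
Y  = eye(num_labels)(y, :);               % one-hot labels (Octave indexing)
J  = (1 / m) * sum(sum(-Y .* log(h) - (1 - Y) .* log(1 - h)));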
  • Cost function with regularization

lambda: 1

% Octave console output
Checking Cost Function (w/ Regularization) ...
Cost at parameters (loaded from ex4weights): 0.383770
(this value should be about 0.383770)
  • Random Initialization of Weights

Symmetry breaking: initialize the weights randomly so that the hidden units do not all learn the same function.

Theta(j, i) = RAND_NUM * (2 * INIT_EPSILON) - INIT_EPSILON

RAND_NUM: a uniform random number between 0 and 1
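A sketch of randInitializeWeights.m using the exercise's suggested INIT_EPSILON = 0.12:

% Octave code sketch
function W = randInitializeWeights(L_in, L_out)
  INIT_EPSILON = 0.12;
  W = rand(L_out, 1 + L_in) * 2 * INIT_EPSILON - INIT_EPSILON;  % uniform in [-eps, eps]
end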

  • Complete backpropagation and check Neural Network Gradients

Generate some ‘random’ test data and compare the analytical gradients from backpropagation to numerical gradients.

input_layer_size: 3

hidden_layer_size: 5

num_labels: 3

m: 5
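The check compares backpropagation's analytical gradient against a two-sided finite difference; a sketch of computeNumericalGradient.m, assuming J is a handle to the cost function:

% Octave code sketch
epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
  perturb = zeros(size(theta));
  perturb(i) = epsilon;
  numgrad(i) = (J(theta + perturb) - J(theta - perturb)) / (2 * epsilon);
end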

% Octave console output
Initializing Neural Network Parameters ...
Checking Backpropagation...
-9.2783e-03 -9.2783e-03
8.8991e-03 8.8991e-03
-8.3601e-03 -8.3601e-03
7.6281e-03 7.6281e-03
-6.7480e-03 -6.7480e-03
-3.0498e-06 -3.0498e-06
1.4287e-05 1.4287e-05
-2.5938e-05 -2.5938e-05
3.6988e-05 3.6988e-05
-4.6876e-05 -4.6876e-05
-1.7506e-04 -1.7506e-04
2.3315e-04 2.3315e-04
-2.8747e-04 -2.8747e-04
3.3532e-04 3.3532e-04
-3.7622e-04 -3.7622e-04
-9.6266e-05 -9.6266e-05
1.1798e-04 1.1798e-04
-1.3715e-04 -1.3715e-04
1.5325e-04 1.5325e-04
-1.6656e-04 -1.6656e-04
3.1454e-01 3.1454e-01
1.1106e-01 1.1106e-01
9.7401e-02 9.7401e-02
1.6409e-01 1.6409e-01
5.7574e-02 5.7574e-02
5.0458e-02 5.0458e-02
1.6457e-01 1.6457e-01
5.7787e-02 5.7787e-02
5.0753e-02 5.0753e-02
1.5834e-01 1.5834e-01
5.5924e-02 5.5924e-02
4.9162e-02 4.9162e-02
1.5113e-01 1.5113e-01
5.3697e-02 5.3697e-02
4.7146e-02 4.7146e-02
1.4957e-01 1.4957e-01
5.3154e-02 5.3154e-02
4.6560e-02 4.6560e-02
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 2.48422e-11
  • Regularized Neural Networks

lambda: 3

% Octave console output
Cost at (fixed) debugging parameters (w/ lambda = 3): 0.576051
(this value should be about 0.576051)
  • Training Neural Network

lambda: 1

% Octave console output
Training Neural Network...
Iteration 50 | Cost: 4.751548e-01
  • Visualizing Weights

Display the hidden units to see what features they are capturing in the data, by visualizing the rows of Theta1 as images.

ex4_visualizing_nn.png

ex3.m

Implement one-vs-all logistic regression and neural networks to recognize hand-written digits.

  • Plot Data (in ex3data1.mat)

ex3_plotting_data.png

  • Training One-vs-All Logistic Regression

hypothesis function: 1 ./ (1 + e.^(-(X*theta)))

K = 10 (0 to 9)

Iterations: 50

% Octave console output
Training One-vs-All Logistic Regression...
Iteration 50 | Cost: 1.375603e-02 % k = 1
Iteration 50 | Cost: 5.725232e-02 % k = 2
Iteration 50 | Cost: 6.419917e-02 % k = 3
Iteration 50 | Cost: 3.576346e-02 % k = 4
Iteration 50 | Cost: 6.183236e-02 % k = 5
Iteration 50 | Cost: 2.121825e-02 % k = 6
Iteration 50 | Cost: 3.489292e-02 % k = 7
Iteration 50 | Cost: 8.559999e-02 % k = 8
Iteration 50 | Cost: 7.877348e-02 % k = 9
Iteration 50 | Cost: 9.719041e-03 % k = 0
  • Predict for One-Vs-All
% Octave console output
Training Set Accuracy: 95.020000
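A sketch of predictOneVsAll.m: each of the K classifiers scores the example, and the prediction is the class with the highest hypothesis value.

% Octave code sketch
[~, p] = max(sigmoid([ones(m, 1) X] * all_theta'), [], 2);  % p(i) in 1..K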

ex3_nn.m

Implement a neural network to recognize handwritten digits using the same training set as before.

You are provided with a set of already-trained network parameters (Θ(1), Θ(2)) in ex3weights.mat.

  • Feedforward Propagation and Prediction

Load saved neural network parameters from ex3weights.mat.

% Octave console output
Training Set Accuracy: 97.520000

ex2.m

Implement logistic regression and apply it to two different datasets (ex2data1.txt, ex2data2.txt).

  • Plot Data (in ex2data1.txt)

ex2_plotting_data.png

  • Compute Cost and Gradient

hypothesis function: 1 ./ (1 + e.^(-(X*theta)))

% Octave console output
Cost at initial theta (zeros): 0.693147
Gradient at initial theta (zeros):
-0.100000
-12.009217
-11.262842
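A vectorized sketch of costFunction.m producing the values above, assuming X includes the bias column:

% Octave code sketch
h = sigmoid(X * theta);                                % hypothesis for all examples
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));  % cross-entropy cost
grad = (1 / m) * (X' * (h - y));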
  • Learning parameters using fminunc

initial theta: zeros, iterations: 400

% Octave console output
Cost at theta found by fminunc: 0.203498
theta:
-25.161272
0.206233
0.201470
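The fminunc call takes this form (as in ex2.m), with the gradient supplied by costFunction:

% Octave code sketch
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunction(t, X, y), initial_theta, options);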
  • Plot Decision Boundary

ex2_plotting_decisionBoundary.png

  • Predict and Accuracies

Use the logistic regression model to predict the probability that a student with a score of 45 on exam 1 and a score of 85 on exam 2 will be admitted.
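A prediction sketch, assuming the exercise's predict.m (which thresholds the hypothesis at 0.5):

% Octave code sketch
prob = sigmoid([1 45 85] * theta);   % bias term plus the two exam scores
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);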

% Octave console output
For a student with scores 45 and 85, we predict an admission probability of 0.776289
Train Accuracy: 89.000000

ex2_reg.m

The axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.

  • Plot Data (in ex2data2.txt)

ex2_reg_plotting_data.png

  • Add Polynomial Features and Compute Cost

original X: [X1 X2]

mapFeature(X): [X1 X2 X1.^2 X1.*X2 X2.^2 X1.^3 … X1.*X2.^5 X2.^6]

hypothesis function: 1 ./ (1 + e.^(-(X*theta)))
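A sketch of mapFeature.m generating all polynomial terms of X1 and X2 up to degree 6:

% Octave code sketch
function out = mapFeature(X1, X2)
  degree = 6;
  out = ones(size(X1(:, 1)));              % bias column
  for i = 1:degree
    for j = 0:i
      out(:, end + 1) = (X1 .^ (i - j)) .* (X2 .^ j);
    end
  end
end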

% Octave console output
Cost at initial theta (zeros): 0.693147
  • Plot Decision Boundary with lambda 0

ex2_reg_plotting_decisionBoundary_with_lambda_0.png

  • Plot Decision Boundary with lambda 1

ex2_reg_plotting_decisionBoundary_with_lambda_1.png

  • Plot Decision Boundary with lambda 100

ex2_reg_plotting_decisionBoundary_with_lambda_100.png