ex6.m
You will use support vector machines (SVMs) with various example 2D datasets.
- Plot Data (in ex6data1.mat)
SVM with Linear Kernel
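Try different values of the C parameter. Informally, C is a positive value that controls the penalty for misclassified training examples: a large C tells the SVM to try to classify every training example correctly, while a smaller C tolerates some misclassifications in exchange for a wider margin.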
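To make the role of C concrete, here is a minimal Python sketch of the primal linear-SVM objective J(θ) = C·Σ[y·cost1(θᵀx) + (1 − y)·cost0(θᵀx)] + ½·Σ_{j≥1} θ_j², using the hinge forms cost1(z) = max(0, 1 − z) and cost0(z) = max(0, 1 + z). The names `cost1`, `cost0` and `svm_objective` are illustrative, not part of the exercise files, and each row of X is assumed to carry a leading 1 for the intercept.

```python
def cost1(z):
    # Hinge cost for a positive example (y = 1): zero once z >= 1.
    return max(0.0, 1.0 - z)


def cost0(z):
    # Hinge cost for a negative example (y = 0): zero once z <= -1.
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """Primal linear-SVM objective for labels y in {0, 1}.

    Each row of X is assumed to start with a constant 1, so theta[0] is
    the intercept; it is excluded from the regularization term.
    """
    data_term = 0.0
    for x_row, label in zip(X, y):
        z = sum(t * x for t, x in zip(theta, x_row))  # theta^T x
        data_term += cost1(z) if label == 1 else cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + reg_term


# One positive example (z = 0.5) sits inside the margin and costs 0.5.
X = [[1, 2.0], [1, -2.0], [1, 0.5]]
y = [1, 0, 1]
print(svm_objective([0.0, 1.0], X, y, C=1))    # 1.0
print(svm_objective([0.0, 1.0], X, y, C=100))  # 50.5
```

Raising C from 1 to 100 multiplies the misclassification term by 100 while the regularization term stays at 0.5, which is why a large C pushes the optimizer to fit every training example.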
- Plot decision boundary (in ex6data1.mat)
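For the linear kernel, the decision boundary is the line w1·x1 + w2·x2 + b = 0, so it can be drawn by solving for x2 over a range of x1 values. A minimal sketch; `linear_boundary` is a hypothetical helper, and `w` and `b` are assumed to come from an already trained linear SVM.

```python
def linear_boundary(w, b, x_min, x_max, num=100):
    """Points (x1, x2) on the line w[0]*x1 + w[1]*x2 + b = 0.

    x1 is sampled at `num` evenly spaced values in [x_min, x_max] and
    x2 is obtained by solving the line equation for x2.
    """
    if w[1] == 0:
        raise ValueError("w[1] must be nonzero to solve for x2")
    if num < 2:
        raise ValueError("num must be at least 2")
    step = (x_max - x_min) / (num - 1)
    xs = [x_min + i * step for i in range(num)]
    return [(x1, -(w[0] * x1 + b) / w[1]) for x1 in xs]


# Boundary x1 + x2 - 1 = 0 sampled at three points.
print(linear_boundary([1.0, 1.0], -1.0, 0.0, 1.0, num=3))
```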
Train SVM with RBF Kernel
- Plot Data (in ex6data2.mat)
C: 1, sigma: 0.1
- Plot decision boundary (in ex6data2.mat)
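The RBF (Gaussian) kernel measures similarity as exp(−‖x1 − x2‖² / (2σ²)), so sigma controls how quickly similarity falls off with distance: a small sigma gives a very local, wiggly boundary, a large sigma a smoother one. A minimal sketch in plain Python (the function name is illustrative):

```python
import math


def gaussian_kernel(x1, x2, sigma):
    """RBF similarity exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


print(round(gaussian_kernel([1, 2, 1], [0, 4, -1], sigma=2), 6))  # 0.324652
```

Here the squared distance is 1 + 4 + 4 = 9, so the result is exp(−9 / 8) = exp(−1.125).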
Try different SVM Parameters to train SVM with RBF Kernel
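Automatically choose the optimal C and sigma using a cross-validation set: train an SVM for every (C, sigma) pair from the lists below and keep the pair with the lowest cross-validation error.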
C list: [0.01 0.03 0.1 0.3 1 3 10 30]
sigma list: [0.01 0.03 0.1 0.3 1 3 10 30]
=> optimal C = 1 and sigma = 0.1
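The search over both lists is a plain grid search: 8 × 8 = 64 (C, sigma) pairs, keeping the pair with the lowest cross-validation error. In this sketch, `cv_error(C, sigma)` is a hypothetical callback standing in for "train on X, y with these parameters, predict on Xval, and return the fraction of predictions that differ from yval"; it is not a function from the exercise.

```python
from itertools import product

# Candidate values used for both C and sigma in the exercise.
PARAM_VALUES = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]


def select_params(cv_error, c_values=PARAM_VALUES, sigma_values=PARAM_VALUES):
    """Return (C, sigma, error) for the pair with the lowest CV error.

    cv_error(C, sigma) is a caller-supplied function (hypothetical here)
    that trains on the training set and returns the misclassification
    rate on the cross-validation set. Ties keep the first pair seen.
    """
    best = None
    for C, sigma in product(c_values, sigma_values):
        err = cv_error(C, sigma)
        if best is None or err < best[2]:
            best = (C, sigma, err)
    return best


# Toy stand-in for real training: error is smallest at C = 1, sigma = 0.1.
def toy_cv_error(C, sigma):
    return abs(C - 1) + abs(sigma - 0.1)


print(select_params(toy_cv_error))  # (1, 0.1, 0.0)
```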
- Plot Data (in ex6data3.mat)
- Plot decision boundary with optimal svm parameters (in ex6data3.mat)
ex6_spam.m
You will use support vector machines to build a spam classifier.
In this exercise, you will only use the body of each email (the email headers are excluded).
- Preprocess sample email (in emailSample1.txt, vocab.txt)
Convert each email into a vector of features.
Preprocessing steps: lower-casing, stripping HTML, normalizing URLs, normalizing email addresses, normalizing numbers, normalizing dollars, word stemming, and removing non-words.
Vocabulary list (vocab.txt): 1899 words.
Given the vocabulary list, each word in the preprocessed email is mapped to its index in the vocabulary, producing a list of word indices.
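The normalization steps and the word-to-index mapping can be sketched with Python's `re` module. The replacements follow the steps listed above (HTML → space, numbers → "number", URLs → "httpaddr", email addresses → "emailaddr", "$" → "dollar"), but two simplifications are assumptions of this sketch: word stemming is omitted (a Porter stemmer would normally run on each token), and any run of non-alphanumeric characters is treated as a delimiter. The loader also assumes each line of vocab.txt has the form `index<TAB>word`; all function names are hypothetical.

```python
import re


def load_vocab(lines):
    """Build {word: index} from lines of the form 'index<TAB>word'."""
    vocab = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            vocab[parts[1]] = int(parts[0])
    return vocab


def preprocess_email(text):
    """Normalize an email body and split it into words (no stemming)."""
    text = text.lower()
    text = re.sub(r"<[^<>]+>", " ", text)                      # strip HTML tags
    text = re.sub(r"[0-9]+", "number", text)                    # numbers
    text = re.sub(r"(http|https)://[^\s]*", "httpaddr", text)   # URLs
    text = re.sub(r"[^\s]+@[^\s]+", "emailaddr", text)          # email addresses
    text = re.sub(r"[$]+", "dollar", text)                      # dollar signs
    # Any non-alphanumeric run acts as a delimiter, so non-words are dropped.
    return [w for w in re.split(r"[^a-z0-9]+", text) if w]


def word_indices(words, vocab):
    """Map words to their 1-based vocabulary indices, skipping unknown words."""
    return [vocab[w] for w in words if w in vocab]


vocab = load_vocab(["1\thttpaddr", "2\tnumber", "3\tnow"])
words = preprocess_email("Visit http://example.com or mail me@x.org <b>NOW</b> 42 times!")
print(words)
print(word_indices(words, vocab))  # [1, 3, 2]
```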
- Extract Features from Emails (in emailSample1.txt)
The feature x_i ∈ {0, 1} indicates whether the i-th word in the dictionary occurs in the email: x_i = 1 if the i-th word is present and x_i = 0 if it is not.
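Turning a list of word indices into the binary feature vector takes a single pass. This sketch assumes 1-based indices (as in vocab.txt) and a vocabulary of n = 1899 words; `email_features` is an illustrative name.

```python
def email_features(indices, n=1899):
    """Binary vector x with x[i - 1] = 1 if vocabulary word i occurs in the email."""
    x = [0] * n
    for idx in indices:
        x[idx - 1] = 1  # vocabulary indices are 1-based; repeats just set 1 again
    return x


x = email_features([1, 3, 3], n=5)
print(x)  # [1, 0, 1, 0, 0]
```

A word that occurs several times still contributes a single 1, because the feature only records presence, not frequency.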
- Train Linear SVM for Spam Classification (in spamTrain.mat, spamTest.mat)
Train an SVM to classify emails as spam (y = 1) or non-spam (y = 0).
spamTrain.mat: 4000 training examples of spam and non-spam emails
spamTest.mat: 1000 test examples
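The exercise trains the spam classifier with the SVM training routine it provides. As a self-contained illustration of what a linear SVM optimizes, here is a different, simpler technique: full-batch subgradient descent on the primal objective C·Σ hinge + ½‖w‖², with labels mapped from {0, 1} to {−1, +1}. The learning rate, epoch count, toy data and function names are arbitrary assumptions of this sketch, and it is not tuned for the 4000 × 1899 spam data.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Full-batch subgradient descent on C * sum(hinge) + 0.5 * ||w||^2.

    X is a list of feature lists and y holds labels in {0, 1}.
    Returns (w, b); the intercept b is not regularized.
    """
    n = len(X[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for x, label in zip(X, y):
            s = 1.0 if label == 1 else -1.0  # map {0, 1} to {-1, +1}
            margin = s * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # hinge active: subgradient is -C * s * x
                for j in range(n):
                    grad_w[j] -= C * s * x[j]
                grad_b -= C * s
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 (spam) if w.x + b >= 0, else 0 (non-spam)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def accuracy(w, b, X, y):
    """Fraction of examples whose prediction matches the label."""
    return sum(predict(w, b, x) == label for x, label in zip(X, y)) / len(y)


# Tiny separable toy set standing in for spamTrain.mat.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_linear_svm(X, y)
print(accuracy(w, b, X, y))
```

Training and test accuracy would then correspond to calling `accuracy` on the training examples and on the held-out test examples respectively.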
Troubleshooting:
- Error when plotting the decision boundary of the SVM with RBF kernel
Solution:
Rewrite line 21 of visualizeBoundary.m as:
=> contour(X1, X2, vals, [1 1], 'LineColor', 'b');