[Homework 4] Hsuan-Tien Lin's Machine Learning Techniques + Personal Reflections on Studying Open Machine Learning Courses - 白红宇



Published: 2019-06-26


This homework involves a lot of coding. In total it requires implementing three models: a neural network, kNN, and k-means.

Q11~Q14 cover neural networks. My implementation is single-threaded and takes a long time to run, so I am recording the correct answers here:

Q11: 6

Q12: 0.001

Q13: 0.01

Q14: 0.02 ≤ Eout ≤ 0.04

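The answers to Q11 and Q14 are clear-cut. For Q12 and Q13, two of the candidate answers came out very close. I checked the course discussion forum and eventually tuned my runs until the correct answer emerged.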

The neural network implementation breaks down as follows:

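1) Initialize the weight matrices W (def init_W(nnet_struct, w_range))

2) Compute every neuron's output for the current iteration, i.e. the forward pass of backpropagation (def forward_process(x, y, W))

3) Compute the derivative of the output error with respect to each neuron's input score, i.e. the backward pass of backpropagation (def backward_process(x, y, neuron_output, W))

4) Update each layer's weight matrix W with gradient descent (def update_W_withGD(x, neuron_output, gradient, W, ita))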
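For reference, here are the quantities computed in steps 2) to 4). This is a sketch in the lecture's notation: layer ℓ, score s, output x, and δ = ∂e/∂s. It assumes tanh in every layer, a single output, and squared error e = (y - x_1^{(L)})^2, which matches the code in this post. The bias input is x_0^{(ℓ)} = 1. The recursion for δ only runs over j ≥ 1 because the bias unit receives no input score. That is why the code drops the first row of the next layer's weight matrix (W[i+1][1:]) when it propagates the gradient back.

```latex
\begin{aligned}
s_j^{(\ell)} &= \sum_{i=0}^{d^{(\ell-1)}} w_{ij}^{(\ell)}\, x_i^{(\ell-1)}, \qquad
x_j^{(\ell)} = \tanh\big(s_j^{(\ell)}\big), \qquad x_0^{(\ell)} = 1 \\
\delta_1^{(L)} &= \frac{\partial e}{\partial s_1^{(L)}} = -2\big(y - x_1^{(L)}\big)\,\tanh'\big(s_1^{(L)}\big) \\
\delta_j^{(\ell)} &= \Big(\sum_{k=1}^{d^{(\ell+1)}} \delta_k^{(\ell+1)}\, w_{jk}^{(\ell+1)}\Big)\tanh'\big(s_j^{(\ell)}\big), \qquad j \ge 1 \\
w_{ij}^{(\ell)} &\leftarrow w_{ij}^{(\ell)} - \eta\, x_i^{(\ell-1)}\, \delta_j^{(\ell)}, \qquad \tanh'(s) = 1 - \tanh^2(s)
\end{aligned}
```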

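Step 3) is the hardest. Writing it in vectorized (matrix) form requires a solid grasp of how each layer is structured. It also requires close familiarity with the matrix operations of your programming language. I am still weak in this area, and only practice will fix that.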
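Below is a compact sketch of steps 2) to 4) in vectorized form for a single SGD example. It is not the author's code, and the function name backprop_step is hypothetical. It assumes the weight layout that init_W uses in this post: one matrix per layer with shape (inputs + 1, outputs), where row 0 holds the bias weights. It also assumes tanh in every layer, a single output, and squared error. The backward pass reduces to one matrix-vector product per layer. The weight gradient of a layer is the outer product of that layer's input and its δ.

```python
import numpy as np


def backprop_step(x, y, W, eta):
    """One SGD step for a tanh network with squared error e = (y - out)^2.

    Sketch only: W is a list of matrices shaped (d_in + 1, d_out), row 0 = bias.
    Returns a new list of updated matrices; W itself is not modified."""
    # forward pass: remember each layer's input with the bias 1 prepended
    inputs = []
    a = np.asarray(x, dtype=float)
    for Wl in W:
        a_bias = np.hstack(([1.0], a))
        inputs.append(a_bias)
        a = np.tanh(a_bias @ Wl)
    out = a  # shape (1,)

    # output layer: delta = de/ds = -2 (y - out) tanh'(s), with tanh'(s) = 1 - out^2
    delta = -2.0 * (y - out) * (1.0 - out ** 2)
    new_W = [None] * len(W)
    for l in range(len(W) - 1, -1, -1):
        # de/dW[l] is the outer product (layer input as column) x (delta as row)
        new_W[l] = W[l] - eta * np.outer(inputs[l], delta)
        if l > 0:
            # propagate through W[l] without its bias row; inputs[l][1:] = tanh(s) of layer l-1
            delta = (W[l][1:] @ delta) * (1.0 - inputs[l][1:] ** 2)
    return new_W
```

Usage looks like `W = backprop_step(x_t, y_t, W, 0.01)` inside an SGD loop. The δ recursion deliberately uses the old W[l] and not the freshly updated matrix. That is also what the post's code does, because it computes every gradient before calling update_W_withGD.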

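>>This was my first time writing an NNet algorithm, so I started debugging with a single hidden layer of 2 hidden units. I debugged each module on its own, in the order 1) 2) 3) 4). Working step by step is slower, but the modules come out more reliable, and the later integration debugging is much easier.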

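>>How do you debug the gradient computation of a very complex network? It is practically impossible to verify the gradient at every point by hand. Searching online, I found the gradient checking method. It compares the gradient computed by backpropagation against a finite-difference estimate: nudge one weight by a small +h and -h, recompute the error each time, and divide the difference by 2h.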
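Here is a minimal sketch of gradient checking, assuming the weights are stored as a list of numpy matrices as in this post. The helpers numerical_gradient and relative_difference are hypothetical names, not part of any library. The error function f can be any function that maps the weight list to a scalar error.

```python
import numpy as np


def numerical_gradient(f, W, h=1e-5):
    """Central-difference estimate of df/dW for every entry of every matrix in W.

    f takes the list W and returns a scalar error. Each entry is perturbed in
    place and restored right after its two evaluations."""
    grads = []
    for Wl in W:
        g = np.zeros_like(Wl, dtype=float)
        for idx in np.ndindex(Wl.shape):
            old = Wl[idx]
            Wl[idx] = old + h
            f_plus = f(W)
            Wl[idx] = old - h
            f_minus = f(W)
            Wl[idx] = old  # restore the original weight
            g[idx] = (f_plus - f_minus) / (2.0 * h)
        grads.append(g)
    return grads


def relative_difference(a, b):
    """||a - b|| / (||a|| + ||b||), guarded against division by zero."""
    denom = max(np.linalg.norm(a) + np.linalg.norm(b), 1e-12)
    return np.linalg.norm(a - b) / denom


if __name__ == "__main__":
    # usage: a single tanh neuron with e = (y - tanh(w . [1, x]))^2,
    # whose analytic gradient is -2 (y - o) (1 - o^2) [1, x]
    rng = np.random.default_rng(0)
    x, y = np.array([0.5, -1.2]), 1.0
    z = np.hstack(([1.0], x))
    W = [rng.uniform(-0.5, 0.5, (3, 1))]

    def err(Ws):
        return (y - np.tanh(z @ Ws[0])[0]) ** 2

    o = np.tanh(z @ W[0])[0]
    analytic = (-2.0 * (y - o) * (1.0 - o ** 2) * z)[:, None]
    numeric = numerical_gradient(err, W)[0]
    print(relative_difference(analytic, numeric))  # tiny if the formula is right
```

The check costs two forward passes per weight, so it is best run on a small network and a few samples during debugging, then switched off for training. What counts as "small enough" for the relative difference is a judgment call. In double precision, a correct backprop typically agrees with the estimate to many digits.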

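>>Tuning an NNet really matters. Take Q14: even with the same total number of hidden units, different per-layer sizes give different results. The first time, I carelessly built the network as 3 8 1, and it did worse than 8 3 1. I should read more about tuning and build up experience.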

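The code is below. I left the debugging code in, both as a record of how I debugged and to help me avoid similar mistakes in the future. It is admittedly a bit messy, so please bear with me: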

```python
# encoding=utf8
import numpy as np
from random import randint


# read data from a local file
# return features and labels as numpy arrays
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        if line.strip() == '':
            continue
        items = line.strip().split()
        tmp_x = []
        for i in range(0, len(items) - 1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x), np.array(y)


# initialize the weight matrices
# input: network structure & uniform initialization range (both low and high)
# each layer's bias is included as an extra input row (row 0)
# return the initialized W
def init_W(nnet_struct, w_range):
    W = []
    for i in range(1, len(nnet_struct)):
        tmp_w = np.random.uniform(w_range['low'], w_range['high'],
                                  (nnet_struct[i - 1] + 1, nnet_struct[i]))
        W.append(tmp_w)
    return W


# randomly pick samples (with replacement) for stochastic gradient descent
# T is the number of iterations
# return the sample used in each SGD iteration
def pick_SGD_data(x, y, T):
    sgd_x = np.zeros((T, x.shape[1]))
    sgd_y = np.zeros(T)
    for i in range(T):
        index = randint(0, x.shape[0] - 1)
        sgd_x[i] = x[index]
        sgd_y[i] = y[index]
    return sgd_x, sgd_y


# forward process
# calculate each neuron's output (tanh in every layer); y is not used here
def forward_process(x, y, W):
    ret = []
    # print(W[0].shape)
    # print(W[1].shape)
    pre_x = np.hstack((1, x))
    for i in range(len(W)):
        pre_x = np.tanh(np.dot(pre_x, W[i]))
        ret.append(pre_x)
        pre_x = np.hstack((1, pre_x))
    return ret


# backward process
# calculate the gradient of the error with respect to each neuron's input score
def backward_process(x, y, neuron_output, W):
    ret = []
    L = len(neuron_output)
    # print(neuron_output[0].shape, neuron_output[1].shape)
    # output layer: delta = -2 * (y - output) * tanh'(score)
    score = np.dot(np.hstack((1, neuron_output[L - 2])), W[L - 1])
    # print(score)
    # print(score.shape)
    gradient = np.array([-2 * (y - neuron_output[L - 1][0]) * tanh_gradient(score)])
    # print(gradient)
    # print(gradient.shape)
    ret.insert(0, gradient)
    # hidden layers, from the last one back to the first
    for i in range(L - 2, -1, -1):
        if i == 0:
            # the first hidden layer receives the raw input x
            score = np.dot(np.hstack((1, x)), W[i])
            # print(score.shape)
            # print(gradient.shape)
            # print(W[1][1:].transpose().shape)
            gradient = np.dot(gradient, W[1][1:].transpose()) * tanh_gradient(score)
            # print(gradient.shape)
            ret.insert(0, gradient)
        else:
            score = np.dot(np.hstack((1, neuron_output[i - 1])), W[i])
            # print(score.shape)
            # print(W[i+1][1:].transpose().shape)
            # W[i+1][1:] drops the bias row, which does not feed back into layer i
            gradient = np.dot(gradient, W[i + 1][1:].transpose()) * tanh_gradient(score)
            ret.insert(0, gradient)
    return ret


# element-wise tanh gradient: tanh'(s) = 1 - tanh(s)^2
def tanh_gradient(s):
    return 1.0 - np.tanh(s) ** 2


# update W with gradient descent
# W[i] -= ita * (layer input as a column) * (layer gradient as a row)
def update_W_withGD(x, neuron_output, gradient, W, ita):
    ret = []
    L = len(W)
    # print("L:" + str(L))
    # print(gradient[0].shape, gradient[1].shape)
    # print(W[0].shape, W[1].shape)
    ret.append(W[0] - ita * np.array([np.hstack((1, x))]).transpose() * gradient[0])
    for i in range(1, L, 1):
        ret.append(W[i] - ita * np.array([np.hstack((1, neuron_output[i - 1]))]).transpose() * gradient[i])
    # print(len(ret))
    return ret


# calculate Eout (0/1 error) on the data file at path
def calculate_E(W, path):
    x, y = read_input_data(path)
    error_count = 0
    for i in range(x.shape[0]):
        if predict(x[i], y[i], W):
            error_count += 1
    return 1.0 * error_count / x.shape[0]


# return True when the network misclassifies (x, y)
def predict(x, y, W):
    y_predict = x
    for i in range(0, len(W), 1):
        y_predict = np.tanh(np.dot(np.hstack((1, y_predict)), W[i]))
    y_predict = 1 if y_predict[0] > 0 else -1
    return y_predict != y


# Q11
def Q11(x, y):
    R = 20  # repeat times
    Ms = {6, 16}  # hidden units
    M_lowests = {}
    for M in Ms:
        M_lowests[M] = 0
    for r in range(R):
        T = 50000
        ita = 0.1
        for M in Ms:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            nnet_struct = [x.shape[1], M, 1]
            w_range = {'low': -0.1, 'high': 0.1}
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            # print(str(r) + ":::" + str(M) + ":" + str(E))
            M_lowests[M] += E
    # Eout summed over the R runs for each M
    for k, v in M_lowests.items():
        print(str(k) + ":" + str(v))


# Q12
def Q12(x, y):
    ita = 0.1
    M = 3
    nnet_struct = [x.shape[1], M, 1]
    Rs = {0.001, 0.1}
    R_lowests = {}
    for R in Rs:
        R_lowests[R] = 0
    N = 40
    T = 30000
    for i in range(N):
        for R in Rs:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            w_range = {'low': -1 * R, 'high': R}
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            print(str(R) + ":" + str(E))
            R_lowests[R] += E
    for k, v in R_lowests.items():
        print(str(k) + ":" + str(v))


# Q13
def Q13(x, y):
    M = 3
    nnet_struct = [x.shape[1], M, 1]
    itas = {0.001, 0.01, 0.1}
    ita_lowests = {}
    for ita in itas:
        ita_lowests[ita] = 0
    N = 20
    T = 20000
    for i in range(N):
        for ita in itas:
            sgd_x, sgd_y = pick_SGD_data(x, y, T)
            w_range = {'low': -0.1, 'high': 0.1}
            W = init_W(nnet_struct, w_range)
            for t in range(T):
                neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
                error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
                W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
            E = calculate_E(W, "test.dat")
            print(str(ita) + ":" + str(E))
            ita_lowests[ita] += E
    for k, v in ita_lowests.items():
        print(str(k) + ":" + str(v))


# Q14
def Q14(x, y):
    T = 50000
    ita = 0.01
    E_total = 0
    R = 10
    for i in range(R):
        nnet_struct = [x.shape[1], 8, 3, 1]
        w_range = {'low': -0.1, 'high': 0.1}
        W = init_W(nnet_struct, w_range)
        sgd_x, sgd_y = pick_SGD_data(x, y, T)
        for t in range(T):
            neuron_output = forward_process(sgd_x[t], sgd_y[t], W)
            error_neuronInputScore_gradient = backward_process(sgd_x[t], sgd_y[t], neuron_output, W)
            W = update_W_withGD(sgd_x[t], neuron_output, error_neuronInputScore_gradient, W, ita)
        # evaluate Eout once, after all T SGD updates
        E = calculate_E(W, "test.dat")
        print(E)
        E_total += E
    print(E_total * 1.0 / R)


def main():
    x, y = read_input_data("train.dat")
    # print(x.shape, y.shape)
    # Q11(x, y)
    # Q12(x, y)
    # Q13(x, y)
    Q14(x, y)


if __name__ == '__main__':
    main()
```

Q15~Q18 cover the kNN algorithm. Each question produces its result almost instantly, so I won't record the answers here.

The core of kNN is the KNN function itself:

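1) Given the number of neighbors K, it returns the class the point belongs to. Try to keep the code configurable.

2) numpy has an argsort function that sorts an array's indices by the corresponding values and returns the sorted indices. Used well, it keeps the code very concise.

3) In another language, you would implement a module similar to numpy.argsort, which makes the overall code much cleaner.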
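To make the argsort idea concrete, here is a sketch of a kNN classifier that handles a whole batch of test points at once. It assumes labels in {-1, +1}, as in this homework. The name knn_predict is hypothetical, and this vectorized variant is an illustration, not the post's code. Note that np.argsort(np.array([3.0, 1.0, 2.0])) returns [1, 2, 0], the indices of the smallest, second-smallest, and largest values. Applied row by row to a distance matrix, this gives each test point's neighbors ordered from nearest to farthest.

```python
import numpy as np


def knn_predict(k, x, y, test_x):
    """Classify every row of test_x with a k-NN majority vote.

    x: (N, d) training features, y: (N,) labels in {-1, +1}, test_x: (M, d).
    Returns an (M,) array of -1/+1 predictions."""
    # (M, N) squared Euclidean distances via broadcasting
    dist = np.sum((test_x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    # argsort each row: training indices from nearest to farthest; keep the first k
    nearest = np.argsort(dist, axis=1)[:, :k]
    # sign of the summed neighbor labels; a tie (sum == 0) goes to -1
    votes = np.sum(y[nearest], axis=1)
    return np.where(votes > 0, 1, -1)
```

The broadcasting builds an (M, N, d) intermediate array, which is fine for homework-sized data. For large data sets, a loop over test points (as in the post) or a dedicated nearest-neighbor index uses far less memory.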

The kNN code is as follows:

```python
# encoding=utf8
import numpy as np


# read data from a local file
# return features and labels as numpy arrays
def read_input_data(path):
    x = []
    y = []
    for line in open(path).readlines():
        if line.strip() == '':
            continue
        items = line.strip().split()
        tmp_x = []
        for i in range(0, len(items) - 1):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
        y.append(float(items[-1]))
    return np.array(x), np.array(y)


# KNN (for binary classification, labels in {-1, +1})
# input: all labeled data & one test sample
# return the predicted label
def KNN(k, x, y, test_x):
    # squared Euclidean distance from test_x to every training point
    distance = np.sum((x - test_x) * (x - test_x), axis=1)
    # training indices sorted from nearest to farthest
    order = np.argsort(distance)
    ret = 0
    for i in range(k):
        ret += y[order[i]]
    # majority vote by the sign of the label sum (a tie goes to -1)
    return 1 if ret > 0 else -1


# Q15 calculate Ein
def calculate_Ein(x, y):
    error_count = 0
    k = 5
    for i in range(x.shape[0]):
        # tmp_x = np.vstack((x[:i], x[i + 1:]))
        # tmp_y = np.hstack((y[:i], y[i + 1:]))
        ret = KNN(k, x, y, x[i])
        if y[i] != ret:
            error_count += 1
    return 1.0 * error_count / x.shape[0]


# Q16 calculate Eout
def calculate_Eout(x, y, path):
    test_x, test_y = read_input_data(path)
    error_count = 0
    k = 1
    for i in range(test_x.shape[0]):
        ret = KNN(k, x, y, test_x[i])
        if test_y[i] != ret:
            error_count += 1
    return 1.0 * error_count / test_x.shape[0]


def main():
    x, y = read_input_data("knn_train.dat")
    print(calculate_Ein(x, y))
    print(calculate_Eout(x, y, "knn_test.dat"))


if __name__ == '__main__':
    main()
```

Q19~Q20 cover the k-means algorithm. This code also produces its results quickly, so I won't record the answers.

The k-means implementation is very straightforward:

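1) Randomly pick the initial center of each cluster. The problem picks random points from the original data. Any other initialization method should go in its own module so that the other modules are unaffected.

2) Update each data point's cluster assignment (def update_category(x, K, centers))

3) With the assignments fixed, update the coordinates of each cluster's center (def update_centers(x, y, K))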

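numpy's matrix operations make these modules easy to implement. It is worth building up your own library of matrix-operation code that you can pick up and extend at any time.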
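The sketch below shows how far numpy's matrix operations can take steps 2) and 3). Step 2) becomes one broadcasted distance matrix plus argmin. Step 3) becomes one boolean mask and mean per cluster. All function names are hypothetical. Keeping the previous center when a cluster goes empty is an assumption of this sketch, one reasonable choice among several. The driver follows the homework and initializes with random data points.

```python
import numpy as np


def assign_clusters(x, centers):
    """Step 2): index of the nearest center for every row of x.
    x: (N, d), centers: (K, d); returns an int array of shape (N,)."""
    # (N, K) matrix of squared distances via broadcasting
    dist = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.argmin(dist, axis=1)


def move_centers(x, labels, centers):
    """Step 3): move each center to the mean of its assigned points.
    An empty cluster keeps its previous center (an assumption of this sketch)."""
    new_centers = centers.astype(float)  # astype returns a copy
    for k in range(centers.shape[0]):
        members = x[labels == k]
        if members.shape[0] > 0:
            new_centers[k] = members.mean(axis=0)
    return new_centers


def kmeans_ein(x, centers, labels):
    """Ein: average squared distance from each point to its assigned center."""
    return np.mean(np.sum((x - centers[labels]) ** 2, axis=1))


def run_kmeans(x, K, iters=50, seed=0):
    """Hypothetical driver: K random data points as initial centers,
    then alternate steps 2) and 3) for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(x.shape[0], size=K, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign_clusters(x, centers)
        centers = move_centers(x, labels, centers)
    labels = assign_clusters(x, centers)
    return centers, labels, kmeans_ein(x, centers, labels)
```

For example, `centers, labels, ein = run_kmeans(x, 2)` runs one trial. The homework averages Ein over many random initializations, which means calling it repeatedly with different seeds.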

The code is as follows:

```python
# encoding=utf8
import numpy as np


# read data from a local file (the k-means data has no labels)
# return features as a numpy array
def read_input_data(path):
    x = []
    for line in open(path).readlines():
        if line.strip() == '':
            continue
        items = line.strip().split()
        tmp_x = []
        for i in range(0, len(items)):
            tmp_x.append(float(items[i]))
        x.append(tmp_x)
    return np.array(x)


# input all data and the number of clusters K
# run k-means T times and return the average Ein over the runs
def Kmeans(x, K):
    T = 50
    E_total = 0
    for t in range(T):
        centers = init_centers(x, K)
        y = np.zeros(x.shape[0])
        R = 50
        for r in range(R):
            y = update_category(x, K, centers)
            centers = update_centers(x, y, K)
        E = calculate_Ein(x, y, centers)
        print(E)
        E_total += E
    return E_total * 1.0 / T


# pick K distinct data points at random as the initial centers
def init_centers(x, K):
    order = np.random.permutation(x.shape[0])
    return x[order[:K]]


# assign each point to its nearest center (squared Euclidean distance)
def update_category(x, K, centers):
    y = []
    for i in range(x.shape[0]):
        category = -1
        distance = float("inf")
        for k in range(K):
            d = np.sum((x[i] - centers[k]) * (x[i] - centers[k]), axis=0)
            if d < distance:
                distance = d
                category = k
        y.append(category)
    return np.array(y)


# with the assignments fixed, move each center to the mean of its points
def update_centers(x, y, K):
    centers = []
    for k in range(K):
        # print(np.sum(x[np.where(y == k)], axis=0).shape)
        members = x[np.where(y == k)]
        if members.shape[0] == 0:
            # empty cluster: re-seed it with a random data point instead of dividing by zero
            center = x[np.random.randint(x.shape[0])]
        else:
            center = np.sum(members, axis=0) * 1.0 / members.shape[0]
        centers.append(center)
    return np.array(centers)


# Ein: average squared distance from each point to its assigned center
def calculate_Ein(x, y, centers):
    # print(centers[0].shape)
    error_total = 0
    for i in range(x.shape[0]):
        error_total += np.sum((x[i] - centers[y[i]]) * (x[i] - centers[y[i]]), axis=0)
    return 1.0 * error_total / x.shape[0]


def main():
    x = read_input_data("kmeans_train.dat")
    # print(x.shape)
    print(Kmeans(x, 2))


if __name__ == '__main__':
    main()
```

==========================================================================

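With this homework done, I have finally finished all 32 lectures and 8 coding assignments of "Machine Learning Foundations + Machine Learning Techniques".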

I took away three main things from the course:

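1) Through the coding assignments, I implemented a number of mainstream machine learning algorithms (Perceptron, AdaBoost-stump, Linear Regression, Logistic Regression, Decision Tree, Neural Network, KNN, Kmeans). I used to rely on libraries, and my understanding of each algorithm was never as deep or detailed as it became after implementing it myself.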

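2) Before, my understanding of each algorithm stopped at knowing how to use it, and honestly not even very well. After the course, I have some grasp of the motivation behind each model. Why is the model designed this way? Why is the regularizer designed this way? What are the model's strengths and weaknesses? I also learned some fairly intuitive mathematical derivations behind the models.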

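3) I used to view machine learning algorithms in isolation (this one solves one problem, that one solves another) and never tied them together into a system. The NTU course keeps a very strong systematic perspective throughout. Here are three examples:

　　a. How Linear Network relates to Factorization (Lecture 15)

　　b. How Decision Tree relates to AdaBoost (Lectures 8 and 9)

　　c. How Linear Regression relates to Neural Network (Lecture 12)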

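Before taking this course, I would never have connected the two models in each of these pairs. The course really does provide deep motivation and a very strong overarching storyline.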

Finally, a few personal thoughts on taking open courses:

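1) Watch the lectures once: a quick skim, and you learn very little.

2) Watch the lectures and do the homework: you learn as a practitioner, and you gain far more than from watching alone.

3) Watch the lectures, do the homework, and write blog notes on each lecture: you learn as both a practitioner and a researcher. "The best way to learn is to teach." Writing the blog forces you to clear up the many points that were fuzzy at the time, because otherwise you simply cannot write it.

4) Repeat 3) in a cycle: everyone knows the value of reviewing the old to learn the new. The only question is whether you have the time.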

Sign: that's all for now.....

Reposted from: https://www.cnblogs.com/xbf9xbf/p/4737525.html
