Machine Learning, Deep Learning / Andrew Ng Machine Learning Coursera Lecture Notes

Week 4 Lecture ML : Neural Network

mcdn 2020. 8. 10. 17:11

Non-linear hypothesis

So why do we need yet another learning algorithm? Consider a supervised learning classification problem where you have a training set like this. If you want to apply logistic regression to this problem, one thing you could do is apply logistic regression with a lot of nonlinear features, so that you get a decision boundary that separates the positive and negative examples. This particular method works well when you have only, say, two features, x1 and x2, because you can then include all those polynomial terms of x1 and x2. But many interesting machine learning problems have a lot more features than just two.

And as we saw, we can come up with quite a lot of features, maybe a hundred different features of different houses. For a problem like this, if you were to include all the quadratic terms, that is, all the second-order polynomial terms, there would be a lot of them. There would be terms like x1 squared, x1x2, x1x3, and so on. So including all the quadratic features doesn't seem like a good idea, because that is a lot of features and you might end up overfitting the training set, and it can also be computationally expensive to work with that many features.

Many people wonder why computer vision could be difficult. I mean when you and I look at this picture it is so obvious what this is. You wonder how is it that a learning algorithm could possibly fail to know what this picture is.

Concretely, when we use machine learning to build a car detector, what we do is come up with a labeled training set with, let's say, a few labeled examples of cars and a few labeled examples of things that are not cars. Then we give our training set to the learning algorithm to train a classifier, and then we may test it by showing it a new image and asking, "What is this new thing?" And hopefully it will recognize that that is a car.

Let's pick a couple of pixel locations in our images, so that's the pixel one location and the pixel two location, and let's plot this car at a certain point, depending on the intensities of pixel one and pixel two. And let's do this with a few other images. So let's take a different example of a car and look at the same two pixel locations.

So the dimension of our feature space will be n equals 2500 (for example, with 50 by 50 pixel images), where our feature vector x is a list of all the pixel intensities: the brightness of pixel one, the brightness of pixel two, and so on down to the brightness of the last pixel. In a typical computer representation, each of these may be a value between, say, 0 and 255 if it gives us the grayscale value. So we have n equals 2500, and that's if we were using grayscale images. If we were using RGB images with separate red, green and blue values, we would have n equals 7500.

So, if we were to try to learn a nonlinear hypothesis by including all the quadratic features, that is, all the terms of the form xi times xj, then with the 2500 pixels we would end up with a total of about three million features. And that's just too large to be reasonable; the computation would be very expensive to find and to represent all of these three million features per training example.
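The "three million" figure can be checked with a quick count. The sketch below assumes that "all the quadratic features" means every distinct product xi times xj with i ≤ j (squares included), which gives n(n + 1)/2 terms; the function name is made up for this note.

```python
def count_quadratic_features(n: int) -> int:
    """Number of distinct products x_i * x_j with i <= j (squares included)."""
    return n * (n + 1) // 2


# A handful of input sizes mentioned in the lecture notes.
for n in (2, 100, 2500, 7500):
    print(f"n = {n:>5}: {count_quadratic_features(n):>12,} quadratic features")
```

For n = 2500 this gives 3,126,250 terms, which matches the "about three million" estimate, and the count grows roughly as n squared over two.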

Neural Network

But more recently, Neural Networks have had a major resurgence. One of the reasons for this resurgence is that Neural Networks are a computationally somewhat more expensive algorithm, and so it was only somewhat more recently that computers became fast enough to really run large-scale Neural Networks. Because of that, as well as a few other technical reasons which we'll talk about later, modern Neural Networks today are the state-of-the-art technique for many applications.

This is just a hypothesis but let me share with you some of the evidence for this. This part of the brain, that little red part of the brain, is your auditory cortex and the way you're understanding my voice now is your ear is taking the sound signal and routing the sound signal to your auditory cortex and that's what's allowing you to understand my words.

Neuroscientists have done the following fascinating experiments where you cut the wire from the ears to the auditory cortex and you re-wire, in this case an animal's brain, so that the signal from the eyes to the optic nerve eventually gets routed to the auditory cortex. If you do this, it turns out the auditory cortex will learn to see. And this is in every single sense of the word see as we know it. So, if you do this to the animals, the animals can perform visual discrimination tasks: they can look at images and make appropriate decisions based on the images, and they're doing it with that piece of brain tissue. This and other similar experiments are called neuro-rewiring experiments.

On the upper left is an example of learning to see with your tongue. This is actually a system called BrainPort, undergoing FDA trials now to help blind people see. The way it works is, you strap a grayscale camera to your forehead, facing forward, that takes a low-resolution grayscale image of what's in front of you, and you then run a wire to an array of electrodes that you place on your tongue, so that each pixel gets mapped to a location on your tongue, where maybe a high voltage corresponds to a dark pixel and a low voltage corresponds to a bright pixel. Even as it stands today, with this sort of system you and I would be able to learn to see with our tongues in tens of minutes. Here's a second example, of human echolocation or human sonar. And, as a somewhat bizarre example, if you plug a third eye into a frog, the frog will learn to use that eye as well.

Neural Network Model Representation

The neuron has a number of input wires, and these are called the dendrites. You think of them as input wires, and these receive inputs from other locations. And a neuron also has an output wire called an axon, and this output wire is what it uses to send signals, that is, to send messages, to other neurons. So, at a simplistic level, a neuron is a computational unit that gets a number of inputs through its input wires, does some computation, and then sends outputs via its axon to other nodes or to other neurons in the brain.

So here is one neuron, and if it wants to send a message, what it does is send a little pulse of electricity via its axon to some different neuron. Here, this axon, that is, this output wire, connects to the dendrites of this second neuron over here, which then accepts this incoming message and does some computation. And it may, in turn, decide to send out its own message on its axon to other neurons, and this is the process by which all human thought happens.

In an artificial neural network that we implement on the computer, we're going to use a very simple model of what a neuron does: we're going to model a neuron as just a logistic unit. So, when I draw a yellow circle like that, you should think of that as playing a role analogous to the body of a neuron, and we then feed the neuron a few inputs through its dendrites or input wires. And whenever I draw a diagram like this, what this means is that this represents a computation of h of x equals one over one plus e to the negative theta transpose x, where as usual x is our feature vector and theta is our parameter vector.
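As a small illustration of this single logistic unit, here is a sketch in plain Python. The function names and the example weights are made up for this note; only the formula h(x) = 1 / (1 + e^(-theta^T x)) comes from the lecture.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(theta, x):
    """Compute h(x) = g(theta^T x) for one neuron.

    `x` holds the features x1..xn without the bias; the bias input x0 = 1
    is prepended here, so `theta` must have len(x) + 1 entries.
    """
    inputs = [1.0] + list(x)
    z = sum(t * v for t, v in zip(theta, inputs))
    return sigmoid(z)


# Hypothetical weights and inputs, chosen only for illustration.
theta = [-1.0, 2.0, 0.5, -0.5]
x = [1.0, 2.0, 3.0]
print(logistic_unit(theta, x))
```

Here theta^T x works out to 0.5, so the unit outputs a value a little above 0.6. With all weights set to zero the output would be exactly 0.5.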

This x0 now that's sometimes called the bias unit or the bias neuron, but because x0 is already equal to 1, sometimes, I draw this, sometimes I won't just depending on whatever is more notationally convenient for that example.

What a neural network is, is just a group of these different neurons strung together. Concretely, here we have input units x1, x2, x3, and once again, sometimes you can draw this extra node x0 and sometimes not; I've just drawn it in here. And here we have three neurons, which I have written a(2)1, a(2)2, a(2)3. I'll talk about those indices later.

And then layer 2, in between, is called the hidden layer. The term hidden layer isn't great terminology, but the intuition is that in supervised learning you get to see the inputs and you get to see the correct outputs, whereas the hidden layer holds values you don't get to observe in the training set. It's not x, and it's not y, and so we call those hidden. Later on we'll see neural nets with more than one hidden layer, but in this example we have one input layer, Layer 1, one hidden layer, Layer 2, and one output layer, Layer 3. But basically, anything that isn't an input layer and isn't an output layer is called a hidden layer.

I'm going to use a superscript (j) subscript i to denote the activation of neuron i, or of unit i, in layer j. So concretely, a superscript (2) subscript 1 is the activation of the first unit in layer two, in our hidden layer. And by activation I just mean the value that's computed by and output by a specific neuron. In addition, our neural network is parameterized by these matrices, theta superscript (j), where theta (j) is going to be a matrix of weights controlling the function mapping from one layer to the next, maybe from the first layer to the second layer, or from the second layer to the third layer.

So here are the computations that are represented by this diagram. This first hidden unit here has its value computed as follows: a(2)1 is equal to the sigmoid function, or sigmoid activation function, also called the logistic activation function, applied to this sort of linear combination of its inputs. And then this second hidden unit has its activation value computed as the sigmoid of this. And similarly, the third hidden unit is computed by that formula. So here we have three input units and three hidden units, and so Theta(1), the matrix of parameters governing our mapping from the three input units to the three hidden units, is going to be a 3 by 4 matrix.

To summarize, what we've done is shown how a picture like this over here defines an artificial neural network, which defines a function h that maps from the input values x to, hopefully, some predictions y. And these hypotheses are parameterized by parameters denoted with a capital theta, so that as we vary theta we get different hypotheses, and so different functions mapping, say, from x to y.

Model Representation I

Let's examine how we will represent a hypothesis function using neural networks. At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical inputs (called "spikes") that are channeled to outputs (axons). In our model, our dendrites are like the input features x_1\cdots x_n, and the output is the result of our hypothesis function. In this model our x_0 input node is sometimes called the "bias unit." It is always equal to 1. In neural networks, we use the same logistic function as in classification, \frac{1}{1 + e^{-\theta^Tx}}, yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our "theta" parameters are sometimes called "weights".

Visually, a simplistic representation looks like:

\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \rightarrow [\ \ ] \rightarrow h_\theta(x)

Our input nodes (layer 1), also known as the "input layer", go into another node (layer 2), which finally outputs the hypothesis function, known as the "output layer".

We can have intermediate layers of nodes between the input and output layers called the "hidden layers."

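In this example, we label these intermediate or "hidden" layer nodes a_0^{(2)} \cdots a_n^{(2)} and call them "activation units."

a_i^{(j)} = \text{"activation" of unit } i \text{ in layer } j

\Theta^{(j)} = \text{matrix of weights controlling function mapping from layer } j \text{ to layer } j+1

If we had one hidden layer, it would look like:

\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} \rightarrow \begin{bmatrix} a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \end{bmatrix} \rightarrow h_\theta(x)

The values for each of the "activation" nodes are obtained as follows:

a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)

a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)

a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)

h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})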

This is saying that we compute our activation nodes by using a 3×4 matrix of parameters. We apply each row of the parameters to our inputs to obtain the value for one activation node. Our hypothesis output is the logistic function applied to the sum of the values of our activation nodes, which have been multiplied by yet another parameter matrix \Theta^{(2)} containing the weights for our second layer of nodes.

Each layer gets its own matrix of weights, \Theta^{(j)}.

The dimensions of these matrices of weights are determined as follows:

If a network has s_j units in layer j and s_{j+1} units in layer j+1, then \Theta^{(j)} will be of dimension s_{j+1} \times (s_j + 1).

The +1 comes from the addition in \Theta^{(j)} of the "bias nodes," x_0 and \Theta_0^{(j)}. In other words the output nodes will not include the bias nodes while the inputs will. The following image summarizes our model representation:

Example: If layer 1 has 2 input nodes and layer 2 has 4 activation nodes, the dimension of \Theta^{(1)} is going to be 4×3, where s_j = 2 and s_{j+1} = 4, so s_{j+1} \times (s_j + 1) = 4 \times 3.
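The dimension rule can be written as a tiny helper. This is only a sketch; the function name and the convention of passing layer sizes without bias units are assumptions made for this note.

```python
def theta_shapes(layer_sizes):
    """Return the (rows, cols) shape of each weight matrix Theta^(j).

    `layer_sizes` lists the number of units per layer, NOT counting bias
    units. Theta^(j) maps layer j to layer j+1, so its shape is
    s_{j+1} x (s_j + 1); the +1 column multiplies the bias unit.
    """
    return [(s_next, s_j + 1)
            for s_j, s_next in zip(layer_sizes, layer_sizes[1:])]


print(theta_shapes([2, 4]))     # the example above: [(4, 3)]
print(theta_shapes([3, 3, 1]))  # the 3-3-1 network: [(3, 4), (1, 4)]
```

The second call reproduces the 3×4 matrix used for the hidden layer of the three-input network, plus the single-row matrix that feeds the output unit.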

Neural Network Model Representation II

You may notice that that block of numbers corresponds suspiciously closely to the matrix-vector operation, the matrix-vector multiplication of Theta(1) times the vector x. Using this observation, we're going to be able to vectorize this computation of the neural network. Concretely, let's define the feature vector x as usual to be the vector of x0, x1, x2, x3, where x0 as usual is always equal to 1, and let's define z(2) to be the vector of these z-values, that is, z(2)1, z(2)2, z(2)3.

What we're going to do is add an extra a0 superscript (2) that's equal to one, and after taking this step we now have that a(2) is going to be a four-dimensional feature vector, because we just added this extra a0, which is equal to 1, corresponding to the bias unit in the hidden layer. And finally, to compute the actual output value of our hypothesis, we then simply need to compute z(3). So z(3) is equal to this term here that I'm just underlining. This inner term there is z(3). And z(3) is Theta(2) times a(2), and finally my hypothesis output h of x, which is a(3), is the activation of my one and only unit in the output layer. So that's just a real number. You can write it as a(3) or as a(3)1, and that's g of z(3). This process of computing h of x is also called forward propagation, and it is called that because we start off with the activations of the input units, then we forward-propagate that to the hidden layer and compute the activations of the hidden layer, and then we forward-propagate that and compute the activations of the output layer.

Let's say I cover up the left part of this picture for now. If you look at what's left in this picture, it looks a lot like logistic regression, where what we're doing is using that node, which is just a logistic regression unit, to make a prediction h of x.

Just to say that again, what this neural network is doing is just like logistic regression, except that rather than using the original features x1, x2, x3, it is using these new features a1, a2, a3. Again, we'll put the superscripts there to be consistent with the notation. And the cool thing about this is that the features a1, a2, a3 are themselves learned as functions of the input.

This is an example of a different neural network architecture, and once again you may be able to get this intuition of how the second layer, where here we have three hidden units, computes some complex function of the input layer, and then the third layer can take the second layer's features and compute even more complex features in layer three, so that by the time you get to the output layer, layer four, you can have even more complex features than what you are able to compute in layer three, and so get very interesting nonlinear hypotheses.

Model Representation II

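To re-iterate, the following is an example of a neural network:

a_1^{(2)} = g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)

a_2^{(2)} = g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)

a_3^{(2)} = g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)

h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})

In this section we'll do a vectorized implementation of the above functions. We're going to define a new variable z_k^{(j)} that encompasses the parameters inside our g function. In our previous example, if we replaced the expression inside each g by the variable z, we would get:

a_1^{(2)} = g(z_1^{(2)})

a_2^{(2)} = g(z_2^{(2)})

a_3^{(2)} = g(z_3^{(2)})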

In other words, for layer j=2 and node k, the variable z will be:

z_k^{(2)} = \Theta_{k,0}^{(1)}x_0 + \Theta_{k,1}^{(1)}x_1 + \cdots + \Theta_{k,n}^{(1)}x_n

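The vector representation of x and z^{(j)} is:

x = \begin{bmatrix} x_0 \\ x_1 \\ \cdots \\ x_n \end{bmatrix} \qquad z^{(j)} = \begin{bmatrix} z_1^{(j)} \\ z_2^{(j)} \\ \cdots \\ z_n^{(j)} \end{bmatrix}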

Setting x = a^{(1)}, we can rewrite the equation as:

z^{(j)} = \Theta^{(j-1)}a^{(j-1)}

We are multiplying our matrix \Theta^{(j-1)} with dimensions s_j\times (n+1) (where s_j is the number of our activation nodes) by our vector a^{(j-1)} with height (n+1). This gives us our vector z^{(j)} with height s_j. Now we can get a vector of our activation nodes for layer j as follows:

a^{(j)} = g(z^{(j)})

Where our function g can be applied element-wise to our vector z^{(j)}.

We can then add a bias unit (equal to 1) to layer j after we have computed a^{(j)}. This will be element a_0^{(j)} and will be equal to 1. To compute our final hypothesis, let's first compute another z vector:

z^{(j+1)} = \Theta^{(j)}a^{(j)}

We get this final z vector by multiplying the next theta matrix after \Theta^{(j-1)} with the values of all the activation nodes we just got. This last theta matrix \Theta^{(j)} will have only one row which is multiplied by one column a^{(j)} so that our result is a single number. We then get our final result with:

h_\Theta(x) = a^{(j+1)} = g(z^{(j+1)})

Notice that in this last step, between layer j and layer j+1, we are doing exactly the same thing as we did in logistic regression. Adding all these intermediate layers in neural networks allows us to more elegantly produce interesting and more complex non-linear hypotheses.
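The vectorized steps above (z^{(j)} = \Theta^{(j-1)}a^{(j-1)}, then a^{(j)} = g(z^{(j)}), then adding the bias unit) can be sketched in plain Python. The helper names and the weight values are hypothetical and chosen only for illustration; lists of lists stand in for matrices so the example needs nothing beyond the standard library.

```python
import math


def sigmoid(z):
    """Logistic function g(z), applied to a single number."""
    return 1.0 / (1.0 + math.exp(-z))


def mat_vec(theta, a):
    """Multiply matrix `theta` (list of rows) by vector `a`."""
    return [sum(w * v for w, v in zip(row, a)) for row in theta]


def forward_propagate(thetas, x):
    """Return the output-layer activations h_Theta(x).

    `thetas` is [Theta^(1), Theta^(2), ...]; `x` holds x1..xn without bias.
    """
    a = list(x)                      # a^(1) = x
    for theta in thetas:
        a = [1.0] + a                # add the bias unit a_0 = 1
        z = mat_vec(theta, a)        # z^(j+1) = Theta^(j) a^(j)
        a = [sigmoid(v) for v in z]  # a^(j+1) = g(z^(j+1)), element-wise
    return a


# Hypothetical weights for a 3-3-1 network: Theta^(1) is 3x4, Theta^(2) is 1x4.
theta1 = [[0.1, 0.2, -0.3, 0.4],
          [-0.5, 0.1, 0.2, 0.0],
          [0.3, -0.2, 0.1, 0.1]]
theta2 = [[0.2, -0.4, 0.6, 0.1]]
x = [1.0, 2.0, 3.0]

print(forward_propagate([theta1, theta2], x))
```

Because the loop treats every layer the same way, the same function also handles networks with more than one hidden layer; only the list of theta matrices changes.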

Neural Network examples and Intuitions

This means NOT (x1 XOR x2), and so we're going to have positive examples if either both are true or both are false, in which case y equals 1. And we're going to have y equals 0 if only one of them is true, and we're going to figure out if we can get a neural network to fit this sort of training set.

And if you look in this column, this is exactly the logical AND function. So, this is computing h of x is approximately x1 AND x2. In other words, it outputs one if and only if x1 and x2 are both equal to 1. So, by writing out our little truth table like this, we manage to figure out what's the logical function that our neural network computes.

So we have constructed one of the fundamental operations in computers by using a small neural network rather than using an actual AND gate. Neural networks can also be used to simulate all the other logical gates. The following is an example of the logical operator 'OR', meaning either x_1 is true or x_2 is true, or both:

You find that's g of minus 10 which is approximately 0. g of 10 which is approximately 1 and so on and these are approximately 1 and approximately 1 and these numbers are essentially the logical OR function. So, hopefully with this you now understand how single neurons in a neural network can be used to compute logical functions like AND and OR and so on.
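To make the truth tables concrete, here is a small sketch of single-neuron AND and OR units. The OR weights (-10, 20, 20) are consistent with the g(-10) and g(10) values mentioned above; the AND weights (-30, 20, 20) are the ones usually quoted for this lecture example but are not stated in these notes, so treat them as an assumption. The function names are made up.

```python
import math


def sigmoid(z):
    """Logistic function g(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, x1, x2):
    """One logistic unit: g(w0 * 1 + w1 * x1 + w2 * x2)."""
    return sigmoid(weights[0] + weights[1] * x1 + weights[2] * x2)


AND_WEIGHTS = [-30.0, 20.0, 20.0]  # assumed values, not given in these notes
OR_WEIGHTS = [-10.0, 20.0, 20.0]   # consistent with g(-10) ~ 0 and g(10) ~ 1

print("x1 x2 | AND OR")
for x1 in (0, 1):
    for x2 in (0, 1):
        h_and = neuron(AND_WEIGHTS, x1, x2)
        h_or = neuron(OR_WEIGHTS, x1, x2)
        # round() maps the near-0 / near-1 activations to 0 or 1.
        print(f" {x1}  {x2} |  {round(h_and)}   {round(h_or)}")
```

The large weight magnitudes are what push the sigmoid close to 0 or 1; smaller weights would give the same ordering but softer outputs.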

Neural Network examples and Intuitions II

In the last video we saw how a Neural Network can be used to compute the function x1 AND x2 and the function x1 OR x2 when x1 and x2 are binary, that is, when they take on values 0 or 1. We can also have a network to compute negation, that is, to compute the function NOT x1. Let me just write down the weights associated with this network.

Consider the function (NOT x1) AND (NOT x2). It equals 1 if and only if x1 equals x2 equals 0. Since this is a logical function, NOT x1 means x1 must be 0, and NOT x2 means x2 must be equal to 0 as well. So this logical function is equal to 1 if and only if both x1 and x2 are equal to 0, and hopefully you should be able to figure out how to make a small neural network to compute this logical function as well.

In the video that I'll show you, this area here is the input area that shows a handwritten character shown to the network. This column here shows a visualization of the features computed by the first hidden layer of the network, so this visualization shows the different features, different edges and lines and so on, that are detected. This is a visualization of the next hidden layer. And shown over here is the final answer, the final predicted value for what handwritten digit the neural network thinks it is being shown. So let's take a look at the video.

So I hope you enjoyed the video and that this gave you some intuition about the sorts of pretty complicated functions neural networks can learn. The network takes this image as input, just the raw pixels, and the first hidden layer computes some set of features. The next hidden layer computes even more complex features, and the layer after that even more complex features still. And these features can then be used by essentially the final layer of logistic classifiers to make accurate predictions about the numbers that the network sees.
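One way to build this unit, and to combine it with AND and OR units into the "both true or both false" function (x1 XNOR x2), is sketched below. All weight values here are assumptions: they are the ones commonly quoted for this lecture example, but they are not written out in these notes, and the function names are made up.

```python
import math


def sigmoid(z):
    """Logistic function g(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def unit(weights, inputs):
    """One logistic unit; weights[0] multiplies the bias input 1."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], inputs))
    return sigmoid(z)


# Assumed weights (not stated in these notes).
AND = [-30.0, 20.0, 20.0]        # x1 AND x2
NOT_AND_NOT = [10.0, -20.0, -20.0]  # (NOT x1) AND (NOT x2)
OR = [-10.0, 20.0, 20.0]         # a1 OR a2


def xnor(x1, x2):
    """Two-layer network: the hidden units feed an OR unit."""
    a1 = unit(AND, [x1, x2])          # fires only when both inputs are 1
    a2 = unit(NOT_AND_NOT, [x1, x2])  # fires only when both inputs are 0
    return unit(OR, [a1, a2])


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(unit(NOT_AND_NOT, [x1, x2])), round(xnor(x1, x2)))
```

The hidden layer is what makes this possible: no single logistic unit can separate the XNOR cases, but an OR over two simpler units can.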

Neural Network examples and Intuitions

Answer: 1. z = theta1 * x, a2 = sigmoid(z). Wow.. When multiplying a matrix by a vector like this, the inner dimensions have to match, as in (i x j) times (j x 1). So the order must be theta1 * x, and the x * theta1 option below does not work!
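The dimension rule behind this quiz answer can be checked with a short sketch. The `shape` and `matmul` helpers and the example values are made up for this note; they only illustrate that a 3×4 matrix times a 4×1 column works, while the reverse order does not.

```python
def shape(m):
    """Return (rows, cols) of a matrix stored as a list of rows."""
    return (len(m), len(m[0]))


def matmul(a, b):
    """Matrix product a * b; raises if the inner dimensions differ."""
    (ra, ca), (rb, cb) = shape(a), shape(b)
    if ca != rb:
        raise ValueError(f"inner dimensions differ: {ra}x{ca} times {rb}x{cb}")
    return [[sum(a[i][k] * b[k][j] for k in range(ca)) for j in range(cb)]
            for i in range(ra)]


theta1 = [[0.1] * 4 for _ in range(3)]  # hypothetical 3x4 weight matrix
x = [[1.0], [0.5], [-1.2], [2.0]]       # 4x1 column vector, x0 = 1

print(shape(matmul(theta1, x)))         # (3, 4) times (4, 1) gives (3, 1)

try:
    matmul(x, theta1)                   # (4, 1) times (3, 4): 1 != 3
except ValueError as err:
    print("x * theta1 fails:", err)
```

So theta1 * x yields a 3×1 vector z, one entry per hidden unit, which is exactly what sigmoid is then applied to element-wise.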

Important!!!!! Question 4 is important!!!!!

License: Attribution, NonCommercial, NoDerivatives
