Regression Analysis

SciomageLAB 2024. 10. 2. 21:01

## What is Regression Analysis?

Regression analysis is the statistical process of finding the relationship between input and output. The relationship can be described in a mathematical form like the one below.

If $X$ is the input data, $Y$ is the output, and $\theta$ is a vector of unknown parameters, the mathematical relationship $H$ is

$$Y \approx H(X, \theta)$$

The equation above means we want to find a hypothesis $H$ that maps the input data $X$ to values close to $Y$, so that we can predict the output for new inputs. This equation may contain unknown values, and it may or may not be representable as a linear combination.

This means we need at least as many data points as the length of the parameter vector $\theta$.

Let $n$ be the number of data points and $k$ be the length of the vector $\theta$.
The degrees of freedom is $n-k$. In simple linear regression $k = 2$, so the degrees of freedom is $n-2$.

This idea comes from 'finding the tendency of data'. As an example application, we can use this method to predict a final exam score from accumulated study time.
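Below is a minimal sketch of that example, assuming made-up study-time and score data; it fits $H(x) = \theta_0 + \theta_1 x$ with NumPy's least-squares polynomial fit and reports the degrees of freedom $n-k$ mentioned above.

```python
import numpy as np

# Hypothetical data: accumulated study time (hours) and final exam score.
study_hours = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
scores      = np.array([55.0, 62.0, 70.0, 74.0, 83.0, 90.0])

# Fit H(x) = theta0 + theta1 * x by ordinary least squares.
theta1, theta0 = np.polyfit(study_hours, scores, deg=1)

# Degrees of freedom: n data points minus k = 2 estimated parameters.
n, k = len(scores), 2
print(f"H(x) = {theta0:.2f} + {theta1:.2f} * x, degrees of freedom = {n - k}")
print("Predicted score after 9 hours of study:", theta0 + theta1 * 9.0)
```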

## Why was it named 'regression'?

In the 19th century, Francis Galton, an English scholar, used this method to find the relationship between parents' average height and their children's height. In this research, the heights of descendants of tall ancestors tended to regress down towards a normal average. So the term 'regression' originally referred to this biological regression, but the method was later extended to the general statistical context.

### Underlying assumptions

According to Wikipedia, regression is based on the assumptions below.

  • The independent variables are measured with no error. (Note: if this is not so, modeling may be done instead using errors-in-variables model techniques.)
  • The independent variables (predictors) are linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others.
    • Ex. Weight in pounds and weight in grams carry the same information; one of them is a redundant attribute (see the sketch after this list).
  • The errors are uncorrelated with the dependent variable.
    • Ex. Suppose you regress weekly spending on luxury goods against wealth. More wealth likely means more spending, but rich people may be rich precisely because they tend to be frugal.
  • The sample is representative of the population for the inference/prediction.
  • The error is a random variable with a mean of zero conditional on the independent variables.
  • The variance of the error is the same across all values of the independent variables (homoscedasticity). If not, weighted least squares or other methods might be used instead.
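The linear-independence assumption can be checked numerically. A minimal sketch, assuming a hypothetical design matrix in which one column is weight in pounds and another is the same weight in grams:

```python
import numpy as np

# Hypothetical predictors: weight in pounds and the same weight in grams
# (1 lb = 453.592 g), plus an intercept column of ones.
pounds = np.array([100.0, 150.0, 200.0, 250.0])
grams  = pounds * 453.592
X = np.column_stack([np.ones_like(pounds), pounds, grams])

# The matrix has 3 columns but rank 2: the grams column is a multiple of the
# pounds column, so ordinary least squares has no unique coefficient vector.
print(np.linalg.matrix_rank(X))  # -> 2
```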

## 'Linear' and 'Non-linear'

Regression analysis is divided into several types according to linearity and the number of variables.

In a regression model, 'linear' means the model can be expressed as a linear combination of the variables $X$, in the form below.

$$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_pX_p \quad \text{($\beta$ is constant)}$$

Even if a model is not an $n$-th order polynomial, it is still linearizable if you can change the form of the expression through substitution. The example shown in the figure below is also a linear model.
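Since the original figure is an image, here is a comparable illustration of linearization by substitution (my own example, not necessarily the book's):

$$Y = \beta_0 + \beta_1 \ln X$$

Substituting $Z = \ln X$ gives $Y = \beta_0 + \beta_1 Z$, which has exactly the linear-combination form above, so ordinary linear regression applies directly.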

[Figure: a linearizable model example, from 'Regression Analysis by Example']

## Linear Regression

Many regression methods exist, but in this article I will only cover simple linear regression.

Look at the image below. We want to find an $H(x)$ that represents the tendency of the data.
Let's draw a tendency line manually, by rule of thumb. The manually drawn line differs a lot from the real data values. The red lines represent the errors.

Our goal is to find the $H(x)$ with the lowest total error. We can calculate the sum of the errors and call this sum function 'J'.

  • [Terms]
    • The function 'J' is called the 'cost function' (also called the 'loss function' in mathematics).
    • In regression, the error is also called the 'residual'.
    • The vector $\theta$ is called the regression coefficients.

$$J(\theta_0, \theta_1)=\sum_{i=1}^m\text{Error}=\sum_{i=1}^m(H(x_i)-y_i)^2 \quad \text{($m$ is the total number of data points)}$$

To prevent the size of the data set from affecting the model, let's normalize the function:

$$J(\theta_0, \theta_1)=\frac{1}{m}\sum_{i=1}^m\text{Error}=\frac{1}{m}\sum_{i=1}^m(H(x_i)-y_i)^2$$

Finding the lowest total error means finding the minimum of the function J. We can locate the minimum by differentiating J with respect to $\theta_0$ and $\theta_1$.
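A minimal sketch of this cost function, reusing the hypothetical study-time data from above:

```python
import numpy as np

def cost_J(theta0, theta1, x, y):
    """Normalized sum of squared errors for the line H(x) = theta0 + theta1 * x."""
    m = len(y)                          # total number of data points
    errors = (theta0 + theta1 * x) - y  # H(x_i) - y_i
    return np.sum(errors ** 2) / m

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([55.0, 62.0, 70.0, 74.0, 83.0, 90.0])

print(cost_J(50.0, 3.0, x, y))  # one candidate line
print(cost_J(48.0, 3.5, x, y))  # a better candidate gives a smaller J
```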

The function J is a function of two variables, so its graph is a curved surface like the image below. We need to find the lowest (most concave) spot of the graph.

[Figure: 3D surface plot of the cost function J, from a Google image search]

## Gradient Descent

Let's suppose the cost function has only one parameter. Then the cost function J will look like the graph below.

The yellow point is the initial value of $\theta_1$. If we update $\theta_1$ iteratively as below, we move toward the point where the gradient becomes zero, because at each step we subtract a value proportional to the current gradient from the previous position.

Furthermore, by adjusting the value of $\alpha$, we can control the stride of the descent. If $\alpha$ is too small, it will take a long time; if $\alpha$ is too large, the update will overshoot and may ascend instead.

$$\theta_1 := \theta_1 - \alpha\frac{d}{d\theta_1}J(\theta_1)$$

On the same principle, apply the gradient descent algorithm to the two-variable function J:

$$\theta_0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1) \\ \theta_1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1)$$

After enough iterations, when we reach the minimum of the function J, we have also found the values of $\theta_0$ and $\theta_1$, and therefore the line $H(x)$ that represents the data tendency.
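A minimal sketch of these simultaneous updates, again with the hypothetical study-time data; it uses the $\frac{1}{2m}$-scaled cost introduced just below, so the gradients carry no extra factor of 2:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.02, iterations=5000):
    """Minimize J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x_i - y_i)^2)."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        residuals = (theta0 + theta1 * x) - y  # H(x_i) - y_i
        grad0 = np.sum(residuals) / m          # dJ/dtheta0
        grad1 = np.sum(residuals * x) / m      # dJ/dtheta1
        theta0 -= alpha * grad0                # simultaneous update of both parameters
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
y = np.array([55.0, 62.0, 70.0, 74.0, 83.0, 90.0])

theta0, theta1 = gradient_descent(x, y)
print(f"H(x) = {theta0:.2f} + {theta1:.2f} * x")
```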

Remember that the function J contains a squared term, and we differentiate J to find its minimum. The differentiation keeps producing a factor of 2, so let's modify J slightly by multiplying it by 1/2.

$$J(\theta_0, \theta_1)=\frac{1}{2m}\sum_{i=1}^m\text{Error}=\frac{1}{2m}\sum_{i=1}^m(H(x_i)-y_i)^2 \quad \text{($m$ is the total number of data points)}$$
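Differentiating this modified J (a standard computation, spelled out here for completeness) shows the factor of 2 cancel, giving exactly the gradients used in the gradient descent sketch above:

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^m\bigl(H(x_i)-y_i\bigr), \qquad \frac{\partial J}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^m\bigl(H(x_i)-y_i\bigr)\,x_i$$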

## Sum of Squared Errors

The expression below is part of the cost function J. It is called the 'Sum of Squared Errors'.

$$SSE = \sum_{i=1}^m(H(x_i)-y_i)^2$$

The reasons we use the sum of squared errors, rather than simply subtracting the ground truth from the estimated value, are as follows:

  • Every error contributes a positive value, so the errors accumulate instead of canceling out (see the sketch after this list).
  • A quadratic function is continuous and easy to differentiate.
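A tiny numeric illustration of the first point, with hypothetical residuals:

```python
import numpy as np

# Hypothetical residuals H(x_i) - y_i for one candidate line.
errors = np.array([3.0, -2.0, 4.0, -5.0])

print(np.sum(errors))       # 0.0  -> signed errors cancel out and hide the misfit
print(np.sum(errors ** 2))  # 54.0 -> squared errors accumulate the misfit
```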

## Pros and Cons

### Pros

  • Easy to implement, easy to add features, and flexible

### Cons

  • Hard to express complicated models
  • The algorithm assumes the data are normally distributed, which is often not true in reality

## References

https://namu.wiki/w/회귀 분석
https://ko.wikipedia.org/wiki/회귀_분석
Samprit Chatterjee, Ali S. Hadi, Regression Analysis by Example
https://en.wikipedia.org/wiki/Regression_analysis
https://www.quora.com/What-are-the-advantages-and-disadvantages-of-linear-regression
https://brunch.co.kr/@gimmesilver/18
