## Derivation of simple linear regression formula

Finally, I have a time to post another mathematics blog post after absence for almost two years. This time I will discuss formula of simple linear regression. Suppose we have a set of data as follow :

$(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$.

We are going to fit those points using a linear equation $Y = \beta_0 + \beta_1 X$. This classical problem is known as a simple linear regression and is usually taught in elementary statistics class around the world. However, due to the rise of computer, students are only given the formula to compute the best estimation of $\beta_0$ and $\beta_1$ without the knowledge on how to derive them. These formula are:

$\beta_1 = \displaystyle{\frac{\sum_{i=1}^n (x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}}$, and

$\beta_0 = \displaystyle{\bar{y}-\beta_1 \bar{x}}$

(1)

where $\bar{x}, \bar{y}$ are the average of $x_i$ and $y_i$ respectively. In this post, I am going to derive these formula using four methods. The goal is always the same, this note will serve as a reminder for me when I forget how to derive these formula. If you learn statistics, these are many statistical assumptions such that this model and this formula work, but we are not going to discuss that here.

Calculus

First, let us take a look at the following figure

We see that there are five points that are fitted using a line. To derive $\beta_0$ and $\beta_1$ using a Calculus, we have to define the so-called error function, $E(\beta_0, \beta_1)$. The error function is defined as the sum of square error of each data. The formula is given as follow,

$E(\beta_0,\beta_1)=\displaystyle{\sum_{i=1}^n (y_i - \hat{y}_i)^2=\sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2}$,

where $\hat{y}_i$ is the prediction of the $y$ coordinate using the fitted line. If we take a look at the figure above, the error function is the sum of squares of all the red lines. It is obvious that whenever $\beta_0$ and $\beta_1$ change, the error function changes. Therefore we have to find $\beta_0$ and $\beta_1$ such that the error function is minimal. This is a minimization problem and in Calculus we already know about how to minimize function of two variables. First we have to find the first partial derivative of $E$ with respect to $\beta_0$ and $\beta_1$ and equate them to zero, i.e.

$\frac{\partial E}{\partial \beta_0 }=0$ and $\frac{\partial E}{\partial \beta_1 }=0$.

And we have to find $\beta_o$ and $\beta_1$ that satistfy the above equations. Second, we have to check whether or not $\beta_o$ and $\beta_1$ is the minimum or the maximum point or it could be a saddle point. However, our error function $E$ is a convex function, therefore the solution of the first derivative equation always yields a minimum point. Let us find out what is the expression of the first derivative of $E$,

$\frac{\partial E}{\partial \beta_0} = \sum_{i=1}^n (-2)(y_i-(\beta_0+\beta_1 x_i)) =0$ and $\frac{\partial E}{\partial \beta_1} = \sum_{i=1}^n (-2x_i)(y_i-(\beta_0+\beta_1 x_i)) =0$.

Tidying up the above two equations we will get :

$\sum_{i=1}^n y_i = \beta_0 n + \beta_1 \sum_{i=1}^n x_i$, and $\sum_{i=1}^n x_iy_i = \beta_0 \sum_{i=1}^n x_i + \beta_1 \sum_{i=1}^n x_i^2$,

which are a system of linear equations with two variables. We can solve the above linear equations using Cramer’s rule or any other methods, and we should get the following solution :

$\beta_0 = \frac{(\sum_{i=1}^n y_i)(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)(\sum_{i=1}^n x_iy_i)}{n(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)^2}$ and

$\beta_1 = \frac{n(\sum_{i=1}^n x_iy_i)-(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)^2}$.

With a little bit of algebra manipulation, the above form is the same as the one in (1). And this is the end of the Calculus part.

Statistics

In this approach, we assume that one is familiar with the concept of mathematical statistics such as random variables, expectations, variances and co-variances. Suppose $X$ and $Y$ are random variables with mean $\mu_X$ and $\mu_Y$ respectively. Random variable $X$ and $Y$ also have variances $\sigma_X^2$ and $\sigma_Y^2$ and they also have co-variance $Cov(X, Y)$.

We want to find $\beta_0$ and $\beta_1$ such that random variable $Y$ and $\beta_0 + \beta_1 X$ is as close as possible. How are going to measure “as close as possible?” Let us define another random variable $Z=(Y-(\beta_0 + \beta_1 X))^2$. We also need to define a function $g$ as a function of $\beta_0, \beta_1$ as follow :

$g(\beta_0, \beta_1) = E(Z) = E[(Y-\beta_0 - \beta_1 X)^2]$.

Therefore, our problem will be similar to our approach in the first section. We need to find $\beta_0$ and $\beta_1$ such that $g$ is minimum or expectation of $Z$ is minimum. Expanding the equation above, we obtain:

$g(\beta_0, \beta_1)=E(Y^2)-2\beta_0E(Y)-2\beta_1E(XY)+ \beta_0^2+2\beta_0\beta_1E(X)+\beta_1^2E(X^2)$.

The above equation is quadratic in $\beta_0$ and $\beta_1$ so the zeros of the first derivatives of the above expression is the minimum point. The first derivatives with respect to $\beta_0$ and $\beta_1$ are :

$\frac{\partial g}{\partial \beta_0} = -2E(Y)+2\beta_0+2\beta_1E(X)=0$,

$\frac{\partial g}{\partial \beta_1} = -2E(XY)+2\beta_0E(X)+2\beta_1E(X^2)=0$,

which we can rewrite as:

$E(Y)=\beta_0+\beta_1E(X)$,

$E(XY)=\beta_0E(X)+\beta_1E(X^2)$,

Multiplying the first equation by $E(X)$ and subtracting it to the second equation, we obtain:

$\beta_1 = \frac{E(XY)-E(X)E(Y)}{E(X^2)-E(X)^2} = \frac{Cov(X,Y)}{\sigma_X^2}$,

and

$\beta_0 = E(Y) - \beta_1 E(X)=\mu_Y - \beta_1 \mu_X$.

The above solution is in the form of random variables. However, what we have is data which is in the form $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$. So, instead of $\mu_X$ we have $\bar{x}$, and instead of $\mu_Y$ we have $\bar{y}$. The same also applies on variances and co-variance, so in the end, the above solution is the same as equation (1). This is the end of the statistics approach.

Linear Algebra

This time, we are going to derive the formula using linear algebra approach. Personally, I think this approach is the most elegant of all other approaches. However, I need to say that one must know a little bit about linear algebra first before reading this, otherwise it would not make sense. So, using a set of data $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$, we want to find $\beta_0$ and $\beta_1$. Plugging all the data into the linear equation $y = \beta_0 + \beta_1 x$, we will obtain:

$\beta_0 + \beta_1 x_1 = y_1$,
$\beta_0 + \beta_1 x_2 = y_2$,
$\vdots$
$\beta_0 + \beta_1 x_n = y_n$.

The above systems can be rewritten in matrix form, as follow:

$\left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right) \left( \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right)= \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right)$,

or

$Ax =w$,

where $A$ is a $n \times 2$ matrix, $x = (\beta_0,\beta_1)^T$ and $w$ is a vector containing $y_1, \dots, y_n$. In linear algebra, the equation above is known as the over-determined linear equation. It may have a solution but mostly it does not have one. One method to solve such system is using the theory of projection. Suppose we have a linear equation $Ax=w$. According to the theory of linear algebra, this equation has a solution only if $w$ is inside the column space of $A$. In the above case, where we want to find $\beta_0$ and $\beta_1$, our $w$ is clearly not in the column space of $A$.

Therefore, to find the solution of the linear equation, we must bring $w$ to the column space of matrix $A$. We are going to project $w$ to the column space of $A$. We can do many different projections but according to the theory of linear algebra, we must do an orthogonal projections such that our estimation of $\beta_0$ and $\beta_1$ will bring $Ax$ “closest” to $w$. Let us now discuss the projection theory first.

Consider the following figure.

Suppose we want to project $w$ perpendicularly to the vector space $V$. Suppose $w'$ is the result of the orthogonal projection and $V$ is the column space of $A$. Then the linear equation $Ax=w'$ has a solution since $w'$ is inside the column space of $A$. Consider $V^{\perp}$ which is the orthogonal complement of $V$. Clearly, vector $w-w'$ is inside this space. From the theory of linear algebra, we know that if $V$ is the column space of $A$, $V^{\perp}$ is the null space of $A^T$. Therefore we have $A^T (w-w')=0$, or $A^tw'=A^Tw$. Since we know that $w' = Ax$ then we have $A^TA x = A^T w$. If the matrix $A^TA$ is invertible (mostly it is in our case) we can find the solution of our problem which is $x = (A^TA)^{-1}A^T w$.

That’s it, we have done it. If we substitute back every variable to our original variable, we will have our solution, i.e.

$A = \left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right), x= \left( \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right), w = \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right)$.

With a little bit of algebra (again), we can easily show that the formula $x=(A^TA)^{-1}A^Tw$ is the same as the equation (1).

Numerical approach

Our last approach is using the theory of numerical methods. Since this approach is numerical, we are not going to get the solution as in the equation (1). This approach will only try to approximate it. This approach will use some of the result obtained in the Calculus approach, which is the error function i.e.,

$E(\beta_0,\beta_1)=\displaystyle{\sum_{i=1}^n (y_i - \hat{y}_i)^2=\sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2}$.

However, we are not going to label it as an error function, instead we are going to label it as a cost function. We say this function as a cost function because every time we pick the wrong $\beta_0$ and $\beta_1$ there is a cost that we have to pay. So we have to find the right $\beta_0$ and $\beta_1$ such that the cost is minimum.

There are many numerical methods to solve such a problem, but at this time we are going to discuss only one methods, i.e. gradient descent. The basic idea is we start with arbitrary values of $\beta_0$ and $\beta_1$ and we are going to continually change the value of $\beta_0$ and $\beta_1$ in such a way so that the cost function decreases. Hopefully, we end up at the value where the cost function is minimum.

The following is the general procedure of the gradient descent method :

repeat until convergence {

$\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} E(\beta_0,\beta_1)$     (for j=0 and j=1)

}

The above procedure will be executed continually until the value of $\beta_0$ and $\beta_1$ convergent, which means that we are getting close to the value where the cost function is minimum. The symbol $\alpha$ is called the learning rate. It always takes a positive value and if this variable is too large, $\beta_j$ will change drastically and we sometimes cannot get the result that we want. But if $\alpha$ is too small, our computation will be very slow, although it is guaranteed that we will get the minimum cost function.

Finally, we need to know some Calculus to derive the partial derivative of the cost function with respect to $\beta_0$ and $\beta_1$. For the readers who know Calculus, they can derive themselves, and we will obtain :

$\frac{\partial }{\partial \beta_0}E=-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))$,

$\frac{\partial }{\partial \beta_1}E=-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))x_i$.

Let’s plug this into the gradient descent algorithm and we will have:

repeat until convergence {

$\beta_0 := \beta_0 - \alpha (-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i)))$

$\beta_1 := \beta_1 - \alpha (-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))x_i)$

}

Let’s take an example. Here is a simple python code to illustrate the idea.

import math
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 2.5, 4, 4.5])
b0,b1 = 0.0, 0.1
val_x = np.linspace(1.0,5.0,100)
val_y = b0+b1*val_x
plt.plot(x,y,'r.')
plt.plot(val_x,val_y,'b-')
plt.grid(True)
plt.show()


Running the above code, we will get the following picture.

The red dots are the data that we are going to fit. The blue line is just one example of $\beta_0$ and $\beta_1$ that of course, is a poor fit. We are going to fix it using gradient descent method.

def costfunction(b0,b1,x,y):
sum = 0
n = len(x)
for i in range(0,n):
sum = sum + (y[i]-(b0+b1*x[i]))**2
return sum
n = len(x)
sum1,sum2 = 0,0
for i in range(0,n):
sum1 = sum1 + (y[i]-(b0+b1*x[i]))
sum2 = sum2 + (y[i]-(b0+b1*x[i]))*x[i]
return [(-2)*sum1, (-2)*sum2]
b0,b1 = 0.0, 0.1
maxiter = 300
alpha = 0.005
c = np.zeros(maxiter)
c[0] = costfunction(b0,b1,x,y)
for i in range(1,maxiter):
temp1 = b0 - alpha*v[0]
temp2 = b1 - alpha*v[1]
b0 = temp1
b1 = temp2
c[i] = costfunction(b0,b1,x,y)
print("$\beta_0$ = ",b0,"$\beta_1$ = ",b1)
val_x = np.linspace(1.0,5.0,100)
val_y = b0+b1*val_x
plt.plot(x,y,'r.')
plt.plot(val_x,val_y,'b-')
plt.grid(True)
plt.show()


Running the second code above, we would get the desired result.  From the figure above, we see that our fit is close enough with our data.

The theoretical value of $\beta_0$ and $\beta_1$ are 0.1 and 0.9 respectively, but in the code above we will obtain only approximation value that was obtained by performing 300 iterations.

## Heidelberg Laurate Forum 2016

I went to the 4th Heidelberg Laureate Forum and it was amazing. I encourage my students to apply and I am sure they will have a wow experience about how mathematics really is. Here is some of my thought about HLF 2016.

1. What an amazing tour, at a wonderful place.
I had an amazing experience, meeting with a lot of laureates, a lot of young and energetic researchers. We discuss a lot of things with one common thing which is mathematics. The meeting place could not have been better and Heidelberg is just perfect.

2. Mathematics meets his little own brother, computer science.
In this event, we did not only talk about math, we also talk about computer science which is so much younger than mathematics.

3.  Frontier between mathematics and computer science is not a fixed boundary, but it is actually an intersection.
Sir Michael Atiyah said this during the workshop. Read many Atiyah quotes here.

4.  Fermat last theorem
We saw the boy who is fascinated by Fermat’s last theorem. He solved it in 1994 and won the Abel’s prize.

5.  Euler proof of infinite prime
I like this proof of infinite prime and he solved it in the eighteenth century.

6. Deep learning perhaps should’t be named deep learning
Noel Sharkey mentioned that machine learning is nothing like learning, instead, it is a statistical parameter optimization.

7. Sometimes when you want to make things tidy, you find a new knowledge. This is what happens in tensor analysis. Vladimir Voevodsky had an idea to help verifying mathematical profs by computers and preserve intimate connection between maths and human intuition. He called this unimath.

8.  Something wrong in computer science education. Leslie Lamport cricitized computer science education as some of computer scientist did not understand the abstraction he made during the talk. He also said “Not every programming languages are as bad as C, but all prevent you from getting to a superior level of abstraction, he proposed Pluscal.”

9. I think Barbara Liskov is the only woman participating as the laureate in this forum. Wondering how many women have won Turing, Abel or Fields medal.

10. The last talk I remember was from Heisuke Hironaka, Fields Medalist in 1970. He gave many advises and one of those are  “Time is expensive, use it well”, ” You learn from your mates more than from your teacher” and” I want to write a book dedicated to my wife ” which was sweet.

I think that is all I can write. I really enjoy my experience in this forum. And now let’s end our journey at Heidelberg castle.

## Taylor method with automatic differentiation – 2

In the previous post, we have talked about Taylor method to solve ordinary differential equation (ODE) numerically. All the derivatives in the Taylor method are computed using automatic differentiation. However, the drawback of this method is when we change our ODE, we have to write another code. In this post, we are going to present a way such that when we want to compute another ODE, we don’t need to change the majority of our code.

Continuing our analysis in the last post, we are going to need operations on polynomials. We are going to define operations such as:

1. Addition between a constant and a polynomial
2. Multiplication between  a constant and a polynomial
4. Multiplication between polynomials
5. Dividing a constant by a polynomial
6. Dividing a polynomial by a polynomial
7. Logarithm of a polynomial
8. Exponentiation of a polynomial
9. Power of a polynomial

Operation 1 and 2 are trivial. Suppose $x(h)$ is a polynomial, then $x(h)=\sum x_i h^i$ for some real constant $x_i$. Suppose $\alpha$ is a real constant, then the addition between $\alpha$ and $x(h)$ is a polynomial $y(h)$ where $y_0 = x_0 + \alpha$ and $y_i = x_i$ for all $i \geq 1$. The multiplication between $\alpha$ and $x(h)$ is a polynomial $z(h)$ where $z_i = \alpha x_i$ for all $i$.

Operation 3 and 4 are not difficult. Suppose $x(h), y(h)$ are polynomials. Then the addition between $x(h)$ and $y(h)$ is a polynomial $z(h)$ where $z_i = x_i + y_i$ for all $i$. The multiplication between $x(h)$ and $y(h)$ is a polynomial $w(h)$ where $w_i = \sum_{j=0}^{i} x_j y_{i-j}$.

Operation 5 and 6 are a little bit tricky. Suppose $x(h)$ is a polynomial and $\alpha$ is a constant. We want to compute $y(h)$ where $y(h) = \alpha / x(h)$. Notice that $x(h)y(h)=\alpha$. Therefore, we are going to take advantage of this equality to find the coefficient of $y(h)$. Thus we shall have $y_0 = \alpha / x_0$, and $y_i = - (\sum_{j=1}^{i} x_j y_{i-j})/x_0$. For operation 6, let’s suppose $z(h) = y(h) / x(h)$. As before, notice that $x(h)z(h) = y(h)$, thus we can compute the coefficient of $z(h)$ using this equality. Thus, $z_0 = y_0 / x_0$ and $z_i = (y_i - \sum_{j=1}^{i} x_j z_{i-j} )/x_0$.

Now, let’s consider the next operation, which is a logarithm of a polynomial. Suppose $x(h)$ is a polynomial and we are going to find $y(h)=\log(x(h))$. Let’s differentiate both side with respect to $h$, then we have $y'(h) = x'(h) / x(h)$. Thus by exploiting a division operation above, we have $y_0 = \log x_0$ and $y_i = (ix_i - (\sum_{j=1}^{i-1} x_j ((i-j)y_{i-j})))/(ix_0)$.

Now, let’s consider the exponential operator, which is an exponentiation of a polynomial. Suppose $x(h)$ is a polynomial and we are going to find $y(h)=\exp(x(h))$. Let’s take logarithm on both side and then differentiate it, we then have $y'(h)/y(h) = x'(h)$, or $y'(h)=y(h)x'(h)$. Therefore, by exploiting a multiplication operation, we shall have $y_0 = \exp x_0$ and $y_i = (\sum_{j=0}^{i-1} y_j ((i-j)x_{i-j}))/i$.

For the last operation of this post, we shall consider the power of a polynomial. Suppose $x(h)$ is a polynomial, we are going to find $y(h)=(x(h))^\alpha$ for any real number $\alpha$. First, we need to take logarithm on both sides and then differentiate the result with respect to $h$, $y'(h)/y(h) = \alpha x'(h)/x(h)$ or $y'(h) = \alpha x'(h)y(h)/x(h)$. By exploiting multiplication and division, we will get the coefficient of $y(h)$.

To facilitate these operation, we build a Python module that will handle these operations. We will also write another code to integrate an ODE. This integration code will work in general without any dependencies on the system or on the order of the system.

def integrate(F,y_init,t_init = 0.0 ,tStop = 1.0 ,h = 0.05 ):
dim = len(y_init)
y_sol = []
for i in range(0,dim):
y_sol.append([])
y_sol[i].append(y_init[i])
t_sol = []
t_sol.append(t_init)
t = t_init

while t &lt; tStop:
y = []
for i in range(0,dim):
y.append([])
y[i].append(y_sol[i][-1])

for k in range(0,20):
dy = F(y)
for i in range(0,dim):
y[i].append(dy[i][k]/float(k+1))
n = len(y[0])
for i in range(0,dim):
temp = 0.0
for j in range(0,n):
temp = temp + y[i][j]*h**j
y_sol[i].append(temp)
t_sol.append(t+h)
t = t + h

return np.array(t_sol) ,np.transpose(np.array(y_sol))


All the operations above and the last code will be written in a python file so that if we need to use this Taylor method, we can just import it. Let’s take an example to integrate one ODE. We will create another code that specifies the ODE and also imports all the operations above and then numerically integrate it.

from mytaylor import *
import matplotlib.pyplot as plt
import numpy as np
import math as m
def fun(y):
z = subtractpoli(y[0],multiplypoli(y[0],y[0]))
return [z]
y_init = [0.5]
t_init = 0
h = 0.1
tStop = 5.0
#integrating ODE
T, Y = integrate(fun,y_init,t_init,tStop,h)
#real solution computation
real_sol = []
for t in T:
temp = m.exp(t)/(m.exp(t)+1.0)
real_sol.append(temp)
#plotting the error versus time
real_sol = np.array(real_sol)
error = Y[:,0]-real_sol
plt.plot(T[:],error[:])
plt.show()


In the above code, we try to numerically integrate $y' = y(1-y)$ with initial condition $y(0)=0.5$. This is the same system we try to integrate in the previous post. I hope you can see the difference between the last code and the code we have shown in the previous post. We really don’t need to consider about Taylor method at all, we just need to write our system in the above code if we want to compute the numerical solution of our code. Notice that we have written all the operations and the integrating code in a file named mytaylor.py which is imported in the first line of the above code. In the fifth line, we can write any ODE we want to integrate, however, the way we write our ODE will be slightly different. Due to polynomial operations that we have defined in this post, $x+y$ cannot be written as just $x+y$ but addpoly($x,y$). Once we define our ODE, we can just integrate it like in line 15.

Another thing we would like to do is to extend the operation of polynomial. We have not included trigonometry operations on polynomials and et cetera. We would like also to test this Taylor method to a stiff system. Please find the following two files (mytaylor.py and test_case.py) for interested readers if you would like to test yourselves.

Categories: python, Teachings

## Taylor method with automatic differentiation

June 14, 2016 1 comment

I teach numerical analysis course and in the one of the chapters, I teach my students about the numerical solution of ordinary differential equations (ODEs). One of numerical solution methods I teach is the so-called Taylor method. This method works by applying the Taylor series to the solution of the differential equation. Suppose we have one-dimensional differential equation as follow:

$\frac{dy}{dt} = f(y)$ and $y(t) = y_0$.

Let’s assume that we only consider an autonomous ODE first.  To find a solution at the next step, $y(t+h)$, we need to apply Taylor expansion, as follow:

$y(t+h) = y(t) + y'(t) h + y''(t)\frac{h^2}{2} +y'''(t)\frac{h^3}{6}+\dots$.

Therefore, depending on how accurate we want our solution is, all we need is to compute the derivatives of $y(t)$. Unfortunately, computing all the derivatives is not an easy task for a computer. We have to derive all the derivatives by ourselves and then input them to the computer.

In this note, I will introduce an automatic differentiation as a tool to compute the derivatives of $y(t+h)$ so that we can compute $y(t+h)$ as accurately as possible. Let us consider the series expansion of $y(t+h)$ as above, but I will use constant $a_i$ to replace the $i-$th derivative of $y(t)$,

$y(t+h) = a_0 + a_1 h + a_2 h^2 + \dots$.

Let us derive the above series with respect to $h$.

$y'(t+h) = a_1 + 2 a_2 h + 3a_3 h^2 + \dots$.

Notice that the derivative at the left hand side of the above equation is actually equal to $f(y(t+h))$ since we know that $y(t+h)$ satisfies the ordinary differential equation. Then we have,

$f(y(t+h)) = a_1 + 2 a_2 h + 3a_3 h^2 + \dots$.

Since $y(t+h)$ is a series of $h$ then $f(y(t+h))$ is also a series of $h$. Then both sides of the above equation are series of $h$, thus we can equate each of the coefficient of $h^j$ to get $a_1$$a_2$ and so on. The most difficult task in this problem is to find the series of $f(y(t+h))$ since the function $f(y)$ can be anything.

Let’s take an example. Suppose we want to integrate the following differential equation,

$\frac{dy}{dt} = y(1-y)$ and $y(0)=0.5$.

Let us choose $h=0.1$ and compute $y(0.1)$. As before, we assume that $y(0.1)$ is a series of $h$ as follows,

$y(0.1) = a_0 + a_1 h + a_2 h^2 + \dots$.

Differentiating both sides of the above equation, we get

$y'(0.1) = a_1 + 2 a_2 h + 3 a_3 h^2 + \dots$.

The right hand side of the above is equal to

$y'(0.1) = y(0.1)(1-y(0.1)) = (a_0 + a_1 h + \dots)(1 - (a_0 + a_1 h + \dots)) = b_0 + b_1 h + b_2 h^2 + \dots$.

Therefore, we get,

$a_1 = b_0$$a_2 = b_1/2$,$a_3 = b_2/3$,  and so on.

The real question is how to find $b_0$, $b_1$ and so on. If the function $f(y)$ only involves operations such as addition, subtraction and multiplication, it would be easy to find the $b$‘s. But if the function $f(y)$ has operations such as division, exponential, logarithm and trigonometric functions then we have to think a bit harder on how to find $b$‘s. We go back to the previous example. Since the operations in this example are only multiplication and subtraction, we can easily compute $b_0$, $b_1$ and so on. Using algebraic modification, we get the following,

$b_0 = a_0 \times (1-a_0)$, thus $a_1 =a_0 \times (1-a_0)$
$b_1 = (a_0 \times -a_1 ) + (a_1 \times (1-a_0))$, thus $a_2 =((a_0 \times -a_1 ) + (a_1 \times (1-a_0)))/2$
$b_2 = (a_0 \times -a_2 ) + (a_1 \times (-a_1))+(a_2 \times (1-a_0))$, thus $a_3 =((a_0 \times -a_2 ) + (a_1 \times (-a_1))+(a_2 \times (1-a_0)))/3$

and so on.

The more $a$‘s we compute, the more accurate our solution is. Using Taylor method this way, we can easily apply this method in a program. It should be clear why this method is called automatic differentiation because we don’t need to manually derive all the derivatives.

Next, I tried to numerically integrate the previous example. Below is the graphic of the error of Taylor method with automatic differentiation versus time. As we can see from the figure, that the error of our numerical integrator is so small and it is almost exact. The number of $a$ I used in this graph is 10.

I use python to numerically integrate this example. You can find the code to produce the above figure in the end of this note. One of the drawbacks of this method is when you change your ODE, you have to change your code as well. In other numerical integrators such as Runge Kutta, we don’t need to change the whole code, we just need to change the ODE and we should get the solution. Together with my student, I am trying to tackle this problem and I hope it will be posted in here.

import math as m
import numpy as np
import pylab as plt

#preliminary constants
order = 10
h = 0.1
t = 0.0
t_end = 5
y0 = 0.5
#defining solution
sol = []
sol.append(y0)
time = []
time.append(t)
#start of the program
while t<t_end:
a = []
a.append(sol[-1])
for k in range(1,order):
if k == 1:
a.append(a[k-1]*(1-a[k-1]))
else:
sum = 0.0
for j in range(0,k-1):
sum = sum + a[j]*(-a[k-1-j])
a.append((sum+a[k-1]*(1.0-a[0]))/k)
sum = 0.0
for i in range(0,order):
sum = sum + a[i]*h**i
sol.append(sum)
t = t + h
time.append(t)
#end of the program

#real solution computation
real_sol = []
for t in time:
temp = m.exp(t)/(m.exp(t)+1.0)
real_sol.append(temp)

#plotting the error versus time
sol = np.array(sol)
real_sol = np.array(real_sol)
time = np.array(time)
error = sol-real_sol
plt.plot(time[:],error[:])
plt.show()

Categories: Justincase, python

## How to fold a paper into thirds, a real analysis perspective

Folding a paper into third is useful when you want to mail someone using an envelope. I search around the internet and there are so many ways to do this. So far, I didn’t see the method I am about to explain. I found this method by accident while I am teaching Real Analysis this semester. I am going to prove that this method actually works. Other methods that I found always use Geometry in the proof but I am going to use Real Analysis instead.

The procedure

Please have a look at the following figure.

The above figure is the first step of our method. Take a sheet of paper and let’s call the left side, side A and the right side, side B. First, you need to fold the paper side A as in the second image of the figure above. It doesn’t have to divide the paper equally, you can just fold it anywhere. Once you do that, you undo the fold and you will have a fold mark, which we call it $x_0$. Next, you need to fold again, but now you need to fold the right side (side B) along the fold mark $x_0$ as in the second row of the above figure. Now you will have two fold marks, the latter is called $y_0$. Now take a look at the following figure.

Now that we have two fold marks ($x_0$ and $y_0$), we need to fold side A along the fold mark $y_0$, as in the first row of the above figure. Undo the fold, you will have the third fold mark (we call it $x_1$). Next, we fold the side B until it touch the fold mark $x_1$ and we will have four fold marks, $x_0, x_1$ and $y_0, y_1$. Continue the step over and over again and at one stage, at the n-step, $x_n$ will be close to one third of the paper and $y_n$ will be very close to two third of the paper.

Simulation

Let’s take an example. Suppose in the beginning, let’s fold side A such that we divide the paper equally. Assume the length of a paper is 1. This means $x_0=0.5$. If we simulate this case, we will get the following table.

 n 1 2 3 4 5 $x_n$ 0.5 0.375 0.3475 0.3359375 0.333984375 $y_n$ 0.75 0.6875 0.671875 0.66796875 0.666992188 6 7 8 0.333496094 0.333374023 0.333343506 0.666748047 0.666687012 0.666671753

As we see, that it took approximately 4-8 steps until $x_n$ is closed to one third of a paper or $y_n$ is closed enough to two third of the paper. However, what happens if we choose different initial condition. We simulate this using a number of initial condition. Suppose we use an A4 paper which has length of 29.7 cm. We choose a number of different initial conditions. We can see the results in the following figures. The x-axis from the above figure is the initial condition and the y-axis is the number of iteration needed until it is closed enough to one third of paper. Since we use an A4 paper, the x-axis ranges from 0 to 29.7 cm.  The tolerance we use in the above figure is 1 mm. In conclusion, we only need to do only maximum four steps until the last fold mark is only 0.1 cm away from one third of the A4 paper.

We relax the tolerance in the above figure. We use 0.5 cm as the tolerance and the number of iterations is slightly improved. It can be seen that if our first trial is close enough to one third of A4 paper, we only need to do one step.

Mathematical proof

Finally, it comes to prove that this methods should work. mathematically not numerically. First of all, we need to mathematically model the above procedures. Let’s assume that the length of the paper we use is 1. Let’s use the same notation as used in the above procedures. Given $x_0$ we will get $y_0 = (1+x_0)/2$. Recursively, we could compute $x_n$ and $y_n$ as follow:

$x_n = y_{n-1}/2$ and $y_n=(1+x_n)/2$, as n=1,2, … .

Even though there are two sequences $(x_n)$ and $(y_n)$, we only need to consider $(x_n)$. We could simplify the sequence $(x_n)$ as follow.

$x_0$ is given and $x_n=y_{n-1}/2=((1+x_{n-1})/2)/2=(1+x_{n-1})/4$ for $n=1,2,\dots$.

Lemma 1
When $0 (as $1/3), the sequence $(x_n)$ is bounded above (bounded below).

Lemma 2
When $0 (as $1/3), the sequence $(x_n)$ is monotonically increasing (decreasing).

Theorem 3
Given $0, the sequence $(x_n)$ converges to 1/3.

Corollary 4
Given $0, the sequence $(y_n)$ converges to 2/3.

The proofs of the above results are given to my students as exercises.

This is the end of this note and the following is the code to produce the above figures.

import math
import numpy as np
import pylab as plt

def f(x):
return (29.7+x)/2.0

def g(y):
return y/2.0

def number_of_iter(x,tol,max_iter):
iter = 0
er = abs(x-29.7/3)
if er <= tol: return 1
while er > tol and iter < max_iter:
iter = iter + 1
y = f(x)
x = g(y)
er = min(abs(29.7/3-x),abs(2*29.7/3-y))

return iter

#main code
x = np.arange(0,29.7,0.1)
y = []
tol = 0.5
max_iter = 100

for t in x:
result = number_of_iter(t,tol,max_iter)
y.append(result)

y = np.array(y)
ax = plt.gca()
ax.set_ylim([0,5])
plt.plot(x,y)
plt.show()

Categories: python, Teachings

## My first try creating a bash script

I am a ubuntu user, but sometimes I also use windows, mainly because my computer in the office is using windows. So, I use both ubuntu and windows. Even, my laptop has both windows and ubuntu. I tried to get rid of windows once, and installed wine to use windows application in my ubuntu. But it runs very slowly and it’s killing me and in the end, I came back to reinstall windows. I mainly use ubuntu for my research. The softwares I use for my research are auto, python, dstool, latex, xfig which run smoothly in ubuntu, even though python and latex can also be installed and run smoothly on windows machine. On the other hand, I use windows to do some regular activities such as browsing the internet, watching movies, checking my email, creating an office documents etc. All of which can be done in ubuntu as well. But there are activities that I must use windows, I sometimes need to use matlab and sometimes I like to play a game that only runs in windows. These two things are the main reason I still come back to use windows. Recently, I learn python so I am now trying to less use matlab.

That is just a background and my main point here is about bash scripting. After a few years using ubuntu, I have not created any bash script. Today, finally I learn to create one script. I created a script to automate my boring routine. When I write a paper, I need some illustrations. I mostly use xfig to create some mathematical images and to be able to use $\LaTeX$ in the figure I need to convert it to an eps file. The produced eps file will be then converted to a pdf file as it is perfectly compatible to my pdflatex command. But before that, I need to crop the resulted pdf file in order to remove white space around the image. Suppose the name of my xfig file is spam.fig. I then write a series of command.

figtex2eps spam.fig
ps2pdf spam.eps
pdfcrop spam.pdf


I want to write a script that do all the above automatically. Thus I created the following script.

#!/bin/bash
# Converts a .fig-file (xfig file) to a .eps-file by using a built-in function figtex2eps
# and then convert it to a .pdf file by using a built-in function ps2pdf
# and finally convert it to a cropped pdf file by using a built-in function pdfcrop
#
# ivanky saputra https://ivanky.wordpress.com
#
# credit to :
# $/home/abel/c/berland/etc/cvsrepository/figtex2eps/prog/figtex2eps$
# $ps2pdf in ubuntu$
# $pdfcrop in ubuntu$
echo "We are going to change the following $1.fig to a cropped pdf file" function quit { exit } if [$1 == ""];
then
echo "no files given";
quit
else
echo "Processing $1.fig............"; figtex2eps$1.fig
ps2pdf $1.eps pdfcrop$1.pdf
echo "Done";
fi


As someone has said that it is better to share your code and data openly as we as human are idiots and will make mistakes, please give me any suggestion to improve mine.

## How to compute distance between a point and a line

This semester (Even Semester, 2015-2016) I am teaching a new course, that is called Analytic Geometry or Geometry Analytic, both are the same, I think. Don’t get me wrong, the course is not new, it is just I would be teaching this course the first time. When they told me that I was going to be the lecturer of this course, they did not give me a standard, they did not give me a textbook they normally use, or a list of topics that I must cover, perhaps because this course is not a compulsory course. So, I have a freedom, I can choose topics that I want to teach in Analytic Geometry. The first thing I did, was to browse the internet any textbook about Analytic Geometry or any lecture note. In the end, I pick a textbook (Indonesian book) and I choose some topics of my interest to be the material of this course.

Okay, enough for the background, I am sure you don’t want to hear anymore on that. Let’s get back to main point of this note. Long story short, a straight line became one of the topics in my course, and I was interested on how the formula to compute distance between a point and a line is derived.In this case, I only consider two dimensional case and by distance, I mean the shortest distance between such a point and a certain line, which also means the perpendicular distance. If you don’t remember what is the formula, let me recall you.

Consider a straight line $l$, given by the following equation:

$l : ax + by + c=0$.

Suppose we have a point $P(x_p,y_p)$, then the distance between the point $P$ and the straight line $l$ is

$d = \frac{|ax_p+by_p+c|}{\sqrt{a^2+b^2}}$.

There are other proofs on how to derive the above formula, but I really really like the proof I am going to show you below.  I will divide this note into two section. The first section will talk about a straight line and how to get an equation of a straight line. The latter section will derive the formula.

1. A straight line equation

In high school, I think you must know what conditions we need to have in order to obtain a straight line equation. If we have two points in $XY-$coordinate we can compute the line equation using the following:

$\frac{y-y_1}{y_2-y_1}=\frac{x-x_1}{x_2-x_1}$.

If we have a gradient and a point, we can also compute the line equation using the following:

$(y-y_1)=m(x-x_1)$.

Of course, in a problem, we won’t get this information so easily. We have work a bit more to get either two points or one point and a gradient and then we are able to obtain the equation of the straight line.

Before I teach this course, I can only conclude that if you want to know the equation of a straight line you need to know either:
(i) two points, or
(ii) a point and gradient of the line.
But, actually there is a third condition, and if we know this condition, we can also compute the equation of a straight line. In the third condition, a line is assumed to be the tangent line of a certain circle. Thus, the characteristics we need to know to form a line are the radius of the circle and the angle between the radius and the positif $x-$axis. See the figure below.

In the figure, we can see that a line can be determined uniquely if the radius of the circle $n$ and the angle $\alpha$ are known. In the first section of this note, we shall derive how to define a line equation given these two conditions.
Suppose we have a straight line, $n$ are $\alpha$ are given. See the figure below. Consider the point $P(x,y)$. We are going to find the relationship of $x$ and $y$

The radius $n$, which is $|ON|$,  can be computed by adding $|OM|$ and $|MN|$. We shall consider $|OM|$ first. Consider triangle $OMQ$ which is a right triangle at $M$. We have the relationship:

$|OM|=|OQ|\cos \alpha=x \cos \alpha$.

Consider another triangle, $PM'Q$, which is also a right triangle at $M'$. We have the following relationship

$|PM'|=|PQ|\sin \alpha=y \sin \alpha$ as it can be computed that the angle $\angle PQM$ is also $\alpha$.

Therefore, we have $n=x \cos \alpha + y \sin \alpha$, which is the equation of straight line given that $n$ and $\alpha$ is known.

Before we discuss how to derive the formula to compute the distance between a point and a line, I would like to discuss how to find $n$ and $\alpha$ if we know the general equation of a straight line $ax+by+c=0$. Consider a line equation$ax+by+c=0$, then we move $c$ to the right hand side and multiple both sides with a non zero constant $k$$kax+kby=-kc$. We shall choose $k$ such that $ka=\cos \alpha$ and $kb=\sin \alpha$. Therefore, we have

$k = \pm \frac{1}{\sqrt{a^2+b^2}}$ and

our equation of line becomes:

$\pm(\frac{a}{\sqrt{a^2+b^2}}x+\frac{b}{\sqrt{a^2+b^2}}y)=\pm(-\frac{c}{\sqrt{a^2+b^2}})$.

Here, we have that the right hand side of the above equation is $n=\pm(-\frac{c}{\sqrt{a^2+b^2}})$, and we choose the positive value to get $n$. The angle $\alpha$ can also be computed once we determine $n$.

2. Distance between a point and a line

To compute a perpendicular distance between a point and a line, we shall use the above result. Consider a straight line $l:x \cos \alpha+y \sin \alpha = n_1$ and a point $P$ in the following figure.

We want to compute $d$. In order to do that, we need to make another line that is parallel to the line $l$ and passing through point $P$. This line, because it is parallel to $l$ and, has the following equation:

$x \cos \alpha+y \sin \alpha = n_2$

Thus the distance between the point $P$ and the line $l$ can be easily computed by considering the absolute difference between $n_1$ and $n_2$ as follow:

$d = |n_2-n_1|=|x_p \cos \alpha + y_p \sin \alpha - n_1|$

As we know from the first section that we can substitute the latter expression into:

$d = |x_p \frac{a}{\sqrt{a^2+b^2}}+ y_p\frac{b}{\sqrt{a^2+b^2}}-(-\frac{c}{\sqrt{a^2+b^2}} )|$

or

$d=\frac{|ax_p+by_p+c|}{\sqrt{a^2+b^2}}$

which is the same as the formula we mentioned in the beginning of our note.

Categories: Teachings Tags: , ,