Canadian Team Mathematics Contest 2019

March 28, 2019

On 11 April 2019, our department will hold a mathematics competition called the CTMC (Canadian Team Mathematics Contest). The contest is organized by the University of Waterloo, which holds it every year.

If your school is interested in applying for this competition, please click the following link:

Registration (registration is now closed)

This is a team contest: each team consists of six students from grade 9 up to grade 12, and there are three rounds. The first is the team round, in which each team writes one set of questions together. The second is the individual round, in which every student writes a set of questions alone; each individual score is added to the team score obtained in the first round. The third round, the relay round, is the most exciting. A student answers a question and passes the answer to a teammate, who needs it to answer the next question, and so on. So if the first student gives a wrong answer, the final answer will not be correct.

UPH together with other universities in Indonesia will host this event.

Please find more information about this event at the link below:

CEMC University of Waterloo

Past materials can be downloaded below.
Sample materials : SampleCTMCMaterials

2018 past contest :
Individual : 2018CTMCIndividualProblems
Relay : 2018CTMCRelayProblems
Team : 2018CTMCTeamProblems
Answer Key : 2018CTMCAnswerKeys
Solutions : 2018CTMCSoln

2017 past contest :
Individual : 2017CTMCIndividualProblems
Relay : 2017CTMCRelayProblems
Team : 2017CTMCTeamProblems
Answer Key : 2017CTMCAnswerKeys
Solutions : 2017CTMCSoln

Here are the contest instructions (in Indonesian). The instructions in English can be found at the following link:


Derivation of simple linear regression formula

May 20, 2018

Finally, I have time to post another mathematics blog post after an absence of almost two years. This time I will discuss the formula for simple linear regression. Suppose we have a set of data as follows:

(x_1,y_1),  (x_2,y_2), \ldots, (x_n,y_n).

We are going to fit those points using a linear equation Y = \beta_0 + \beta_1 X. This classical problem is known as simple linear regression and is usually taught in elementary statistics classes around the world. However, due to the rise of computers, students are often given only the formulas for the best estimates of \beta_0 and \beta_1, without learning how to derive them. These formulas are:

\beta_1 = \displaystyle{\frac{\sum_{i=1}^n (x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}}, and

\beta_0 = \displaystyle{\bar{y}-\beta_1 \bar{x}}. \qquad (1)


where \bar{x} and \bar{y} are the averages of the x_i and y_i respectively. In this post, I am going to derive these formulas using four methods. The goal is always the same: this note will serve as a reminder for me whenever I forget how to derive them. If you have studied statistics, you know there are many statistical assumptions under which this model and these formulas work, but we are not going to discuss those here.
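Before the derivations, the formulas in (1) can be checked directly with NumPy. The data below are hypothetical, chosen only for illustration (they are the same five points used in the numerical example at the end of this post):

```python
import numpy as np

# Sample data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 2.5, 4.0, 4.5])

xbar, ybar = x.mean(), y.mean()
# Formula (1)
beta1 = np.sum((x - xbar)*(y - ybar)) / np.sum((x - xbar)**2)
beta0 = ybar - beta1*xbar
```

For this data the result is \beta_1 = 0.9 and \beta_0 = 0.1, up to floating-point error.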


Calculus

First, let us take a look at the following figure.

We see that there are five points fitted by a line. To derive \beta_0 and \beta_1 using calculus, we have to define the so-called error function, E(\beta_0, \beta_1), which is the sum of the squared errors of the data points. The formula is as follows,

E(\beta_0,\beta_1)=\displaystyle{\sum_{i=1}^n (y_i - \hat{y}_i)^2=\sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2},

where \hat{y}_i is the prediction of the y-coordinate using the fitted line. In the figure above, the error function is the sum of the squares of the lengths of all the red lines. Clearly, whenever \beta_0 and \beta_1 change, the error function changes, so we have to find the \beta_0 and \beta_1 for which the error function is minimal. This is a minimization problem, and from calculus we know how to minimize a function of two variables. First, we find the partial derivatives of E with respect to \beta_0 and \beta_1 and set them to zero, i.e.

\frac{\partial E}{\partial \beta_0 }=0 and \frac{\partial E}{\partial \beta_1 }=0.

We then find the \beta_0 and \beta_1 that satisfy the above equations. Second, we would have to check whether this critical point is a minimum, a maximum, or a saddle point. However, our error function E is convex, so the solution of the first-derivative equations always yields a minimum. Let us work out the first derivatives of E,

\frac{\partial E}{\partial \beta_0} = \sum_{i=1}^n (-2)(y_i-(\beta_0+\beta_1 x_i)) =0 and \frac{\partial E}{\partial \beta_1} = \sum_{i=1}^n (-2x_i)(y_i-(\beta_0+\beta_1 x_i)) =0.

Tidying up these two equations, we get:

\sum_{i=1}^n y_i = \beta_0 n + \beta_1 \sum_{i=1}^n x_i, and \sum_{i=1}^n x_iy_i = \beta_0 \sum_{i=1}^n x_i + \beta_1 \sum_{i=1}^n x_i^2,

which is a system of two linear equations in two variables. We can solve it using Cramer's rule or any other method, and we get the following solution:

\beta_0 = \frac{(\sum_{i=1}^n y_i)(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)(\sum_{i=1}^n x_iy_i)}{n(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)^2} and

\beta_1 = \frac{n(\sum_{i=1}^n x_iy_i)-(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)^2}.

With a little algebraic manipulation, one can show that this form is the same as (1). This is the end of the calculus part.
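The Cramer's rule solution above is easy to check numerically. A sketch with hypothetical data (the same five points used in the gradient descent example later in this post):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 2.5, 4.0, 4.5])
n = len(x)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x**2).sum(), (x*y).sum()

# Cramer's rule on the two normal equations
det = n*Sxx - Sx**2
beta0 = (Sy*Sxx - Sx*Sxy)/det
beta1 = (n*Sxy - Sx*Sy)/det
```

The denominator det is the determinant of the coefficient matrix of the linear system, and the two numerators are the determinants with the right-hand side substituted into the corresponding column.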


Statistics

In this approach, we assume familiarity with concepts from mathematical statistics such as random variables, expectations, variances, and covariances. Suppose X and Y are random variables with means \mu_X and \mu_Y respectively, variances \sigma_X^2 and \sigma_Y^2, and covariance Cov(X, Y).

We want to find \beta_0 and \beta_1 such that the random variables Y and \beta_0 + \beta_1 X are as close as possible. How are we going to measure "as close as possible"? Let us define another random variable Z=(Y-(\beta_0 + \beta_1 X))^2, and a function g of \beta_0 and \beta_1 as follows:

g(\beta_0, \beta_1) = E(Z) = E[(Y-\beta_0 - \beta_1 X)^2].

The problem is now similar to the one in the first section: we need to find the \beta_0 and \beta_1 that minimize g, i.e. the expectation of Z. Expanding the equation above, we obtain:

g(\beta_0, \beta_1)=E(Y^2)-2\beta_0E(Y)-2\beta_1E(XY)+ \beta_0^2+2\beta_0\beta_1E(X)+\beta_1^2E(X^2).

The above expression is quadratic in \beta_0 and \beta_1, so the zero of its first derivatives is the minimum point. The first derivatives with respect to \beta_0 and \beta_1 are:

\frac{\partial g}{\partial \beta_0} = -2E(Y)+2\beta_0+2\beta_1E(X)=0,

\frac{\partial g}{\partial \beta_1} = -2E(XY)+2\beta_0E(X)+2\beta_1E(X^2)=0,

which we can rewrite as:

E(Y) = \beta_0 + \beta_1 E(X), and E(XY) = \beta_0 E(X) + \beta_1 E(X^2).

Multiplying the first equation by E(X) and subtracting it from the second, we obtain:

\beta_1 = \frac{E(XY)-E(X)E(Y)}{E(X^2)-E(X)^2} = \frac{Cov(X,Y)}{\sigma_X^2},


\beta_0 = E(Y) - \beta_1 E(X)=\mu_Y - \beta_1 \mu_X.

The above solution is in terms of random variables. However, what we have is data of the form (x_1,y_1),  (x_2,y_2), \ldots, (x_n,y_n). So instead of \mu_X we use \bar{x}, and instead of \mu_Y we use \bar{y}; the same applies to the variances and covariance. In the end, the solution above is the same as equation (1). This is the end of the statistics approach.
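The statistical form of the solution, \beta_1 = Cov(X,Y)/\sigma_X^2, translates directly into code once the expectations are replaced by sample averages. A minimal sketch with the same hypothetical data as before:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 2.5, 4.0, 4.5])

# Treat the data points as the distribution: 1/n normalization (bias=True)
cov_xy = np.cov(x, y, bias=True)[0, 1]   # Cov(X, Y)
var_x = np.var(x)                        # sigma_X^2 (np.var also uses 1/n)
beta1 = cov_xy / var_x
beta0 = y.mean() - beta1*x.mean()
```

Note that the 1/n (biased) and 1/(n-1) (unbiased) normalizations give the same \beta_1, since the factor cancels in the ratio; what matters is using the same convention in both the covariance and the variance.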

Linear Algebra

This time, we are going to derive the formulas using a linear algebra approach. Personally, I think this is the most elegant of the four, but one must know a little linear algebra first, otherwise it will not make sense. Using the data (x_1,y_1),  (x_2,y_2), \ldots, (x_n,y_n), we want to find \beta_0 and \beta_1. Plugging all the data into the linear equation y = \beta_0 + \beta_1 x, we obtain:

\beta_0 + \beta_1 x_1 = y_1,
\beta_0 + \beta_1 x_2 = y_2,
\vdots
\beta_0 + \beta_1 x_n = y_n.

The above system can be rewritten in matrix form as follows:

\left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right) \left( \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right)= \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right),


Ax =w,

where A is an n \times 2 matrix, x = (\beta_0,\beta_1)^T, and w is the vector containing y_1, \dots, y_n. In linear algebra, such a system is called overdetermined. It may have a solution, but usually it does not. One way to handle such a system uses the theory of projections. The equation Ax=w has a solution only if w lies in the column space of A, and in our case, where we want to find \beta_0 and \beta_1, our w is generally not in the column space of A.

Therefore, to solve the linear equation, we must bring w into the column space of A by projecting it there. Many projections are possible, but the orthogonal projection is the one that makes Ax "closest" to w, and hence gives the best estimates of \beta_0 and \beta_1. Let us discuss the projection theory first.

Consider the following figure.

Suppose we want to project w orthogonally onto the vector space V, where V is the column space of A, and let w' be the result of this projection. Then the linear equation Ax=w' has a solution, since w' is inside the column space of A. Consider V^{\perp}, the orthogonal complement of V; clearly the vector w-w' lies in this space. From linear algebra, we know that if V is the column space of A, then V^{\perp} is the null space of A^T. Therefore A^T (w-w')=0, or A^Tw'=A^Tw. Since w' = Ax, we have A^TA x = A^T w. If the matrix A^TA is invertible (which it usually is in our case), the solution of our problem is x = (A^TA)^{-1}A^T w.

That's it, we are done. Substituting our original variables back, we have:

A = \left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right), x= \left( \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right), w = \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right).

With a little algebra (again), we can show that the formula x=(A^TA)^{-1}A^Tw is the same as equation (1).
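The formula x=(A^TA)^{-1}A^Tw is also easy to verify numerically; a sketch with the same hypothetical data as before:

```python
import numpy as np

x_data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 2.0, 2.5, 4.0, 4.5])

# The n x 2 matrix A: a column of ones and a column of the x-data
A = np.column_stack([np.ones_like(x_data), x_data])
# Solve (A^T A) x = A^T w instead of forming the inverse explicitly
sol = np.linalg.solve(A.T @ A, A.T @ w)
beta0, beta1 = sol
```

Solving the system (A^TA)x = A^Tw is numerically preferable to computing the inverse (A^TA)^{-1} explicitly, although for a 2 \times 2 matrix the difference is negligible.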

Numerical approach

Our last approach uses numerical methods. Since this approach is numerical, we are not going to get the exact solution (1); we will only approximate it. We reuse a result from the calculus approach, namely the error function,

E(\beta_0,\beta_1)=\displaystyle{\sum_{i=1}^n (y_i - \hat{y}_i)^2=\sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2}.

However, we are not going to call it an error function; instead, we will call it a cost function, because every time we pick the wrong \beta_0 and \beta_1 there is a cost we have to pay. So we have to find the \beta_0 and \beta_1 for which the cost is minimal.

There are many numerical methods for such a problem, but here we discuss only one: gradient descent. The basic idea is to start with arbitrary values of \beta_0 and \beta_1 and repeatedly change them so that the cost function decreases. Hopefully, we end up at the values where the cost function is minimal.

The following is the general procedure of the gradient descent method :

repeat until convergence {

\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} E(\beta_0,\beta_1)     (for j=0 and j=1)

}


The above update is executed repeatedly until the values of \beta_0 and \beta_1 converge, which means we are close to the minimizer of the cost function. The symbol \alpha is called the learning rate. It is always positive; if it is too large, \beta_j changes drastically and we may not get the result we want, while if \alpha is too small, the computation is very slow, although convergence to the minimum cost is then guaranteed.

Finally, we need some calculus to derive the partial derivatives of the cost function with respect to \beta_0 and \beta_1. Readers who know calculus can derive them themselves; we obtain:

\frac{\partial }{\partial \beta_0}E=-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i)),

\frac{\partial }{\partial \beta_1}E=-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))x_i.

Plugging these into the gradient descent algorithm, we have:

repeat until convergence {

\beta_0 := \beta_0 - \alpha (-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i)))

\beta_1 := \beta_1 - \alpha (-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))x_i)

}


Let's take an example. Here is a simple Python code to illustrate the idea.

import math
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 2.5, 4, 4.5])
b0, b1 = 0.0, 0.1
val_x = np.linspace(1.0, 5.0, 100)
val_y = b0 + b1*val_x
plt.plot(x, y, 'ro')         # the data
plt.plot(val_x, val_y, 'b')  # the initial guess for the fitted line
plt.show()

Running the above code, we will get the following picture.

The red dots are the data that we are going to fit. The blue line is just one example of \beta_0 and \beta_1 that, of course, is a poor fit. We are going to fix it using the gradient descent method.

def costfunction(b0, b1, x, y):
    total = 0.0
    n = len(x)
    for i in range(0, n):
        total = total + (y[i] - (b0 + b1*x[i]))**2
    return total

def grad(b0, b1, x, y):
    n = len(x)
    sum1, sum2 = 0.0, 0.0
    for i in range(0, n):
        sum1 = sum1 + (y[i] - (b0 + b1*x[i]))
        sum2 = sum2 + (y[i] - (b0 + b1*x[i]))*x[i]
    return [(-2)*sum1, (-2)*sum2]

b0, b1 = 0.0, 0.1
maxiter = 300
alpha = 0.005
c = np.zeros(maxiter)
c[0] = costfunction(b0, b1, x, y)
for i in range(1, maxiter):
    v = grad(b0, b1, x, y)
    temp1 = b0 - alpha*v[0]
    temp2 = b1 - alpha*v[1]
    b0 = temp1
    b1 = temp2
    c[i] = costfunction(b0, b1, x, y)
print("beta_0 =", b0, "beta_1 =", b1)
val_x = np.linspace(1.0, 5.0, 100)
val_y = b0 + b1*val_x
plt.plot(x, y, 'ro')
plt.plot(val_x, val_y, 'b')
plt.show()

Running the second code above, we get the desired result. From the figure above, we see that the fit is now close to our data.

The theoretical values of \beta_0 and \beta_1 are 0.1 and 0.9 respectively; the code above only gives approximate values, obtained after 300 iterations.

Heidelberg Laurate Forum 2016

October 8, 2016

I went to the 4th Heidelberg Laureate Forum and it was amazing. I encourage my students to apply, and I am sure they will have a wow experience of what mathematics really is. Here are some of my thoughts about HLF 2016.

1. What an amazing tour, at a wonderful place.
I had an amazing experience, meeting a lot of laureates and a lot of young, energetic researchers. We discussed many things with one thing in common: mathematics. The meeting place could not have been better; Heidelberg is just perfect.

2. Mathematics meets its own little brother, computer science.
In this event, we did not only talk about math; we also talked about computer science, which is so much younger than mathematics.

3. The frontier between mathematics and computer science is not a fixed boundary; it is actually an intersection.
Sir Michael Atiyah said this during the workshop. You can read many Atiyah quotes here.

4. Fermat's Last Theorem
We saw Andrew Wiles, the boy who was fascinated by Fermat's Last Theorem. He proved it in 1994 and went on to win the Abel Prize.

5. Euler's proof of the infinitude of primes
I like this proof that there are infinitely many primes, which Euler gave in the eighteenth century.

6. Deep learning perhaps shouldn't be named deep learning.
Noel Sharkey mentioned that machine learning is nothing like learning; instead, it is statistical parameter optimization.

7. Sometimes when you want to make things tidy, you discover new knowledge. This is what happened in tensor analysis. Vladimir Voevodsky had an idea to help verify mathematical proofs by computer while preserving the intimate connection between mathematics and human intuition. He called this UniMath.

8. Something is wrong in computer science education.
Leslie Lamport criticized computer science education, as some computer scientists did not understand the abstraction he made during the talk. He also said, "Not every programming language is as bad as C, but all prevent you from getting to a superior level of abstraction." He proposed PlusCal.

9. I think Barbara Liskov was the only woman participating as a laureate in this forum. I wonder how many women have won the Turing Award, the Abel Prize, or the Fields Medal.

10. The last talk I remember was from Heisuke Hironaka, Fields Medalist in 1970. He gave a lot of advice, including "Time is expensive, use it well", "You learn from your mates more than from your teacher", and "I want to write a book dedicated to my wife", which was sweet.

I think that is all I can write. I really enjoyed my experience at this forum. And now, let's end our journey at Heidelberg Castle.


Taylor method with automatic differentiation – 2

July 20, 2016

In the previous post, we talked about the Taylor method for solving an ordinary differential equation (ODE) numerically, with all the derivatives computed using automatic differentiation. The drawback of that method is that when we change our ODE, we have to write new code. In this post, we present a way to compute another ODE without changing the majority of our code.

Continuing our analysis from the last post, we are going to need the following operations on polynomials:

  1. Addition between a constant and a polynomial
  2. Multiplication between  a constant and a polynomial
  3. Addition between polynomials
  4. Multiplication between polynomials
  5. Dividing a constant by a polynomial
  6. Dividing a polynomial by a polynomial
  7. Logarithm of a polynomial
  8. Exponentiation of a polynomial
  9. Power of a polynomial

Operations 1 and 2 are trivial. Suppose x(h)=\sum x_i h^i is a polynomial with real coefficients x_i and \alpha is a real constant. Then the addition of \alpha and x(h) is a polynomial y(h) where y_0 = x_0 + \alpha and y_i = x_i for all i \geq 1, and the multiplication of \alpha and x(h) is a polynomial z(h) where z_i = \alpha x_i for all i.

Operations 3 and 4 are not difficult. Suppose x(h) and y(h) are polynomials. Their sum is a polynomial z(h) where z_i = x_i + y_i for all i, and their product is a polynomial w(h) where w_i = \sum_{j=0}^{i} x_j y_{i-j}.

Operations 5 and 6 are a little tricky. Suppose x(h) is a polynomial and \alpha is a constant, and we want to compute y(h) = \alpha / x(h). Notice that x(h)y(h)=\alpha, and we can take advantage of this equality to find the coefficients of y(h): y_0 = \alpha / x_0 and y_i = - (\sum_{j=1}^{i} x_j y_{i-j})/x_0. For operation 6, suppose z(h) = y(h) / x(h). As before, x(h)z(h) = y(h), so we can compute the coefficients of z(h) from this equality: z_0 = y_0 / x_0 and z_i = (y_i - \sum_{j=1}^{i} x_j z_{i-j})/x_0.

Now, let's consider the next operation, the logarithm of a polynomial. Suppose x(h) is a polynomial and we want y(h)=\log(x(h)). Differentiating both sides with respect to h gives y'(h) = x'(h)/x(h). Thus, exploiting the division operation above, we have y_0 = \log x_0 and y_i = (ix_i - \sum_{j=1}^{i-1} x_j (i-j)y_{i-j})/(ix_0).

Now, let's consider the exponential of a polynomial. Suppose x(h) is a polynomial and we want y(h)=\exp(x(h)). Taking the logarithm of both sides and then differentiating, we have y'(h)/y(h) = x'(h), or y'(h)=y(h)x'(h). Therefore, exploiting the multiplication operation, we have y_0 = \exp x_0 and y_i = (\sum_{j=0}^{i-1} y_j (i-j)x_{i-j})/i.

For the last operation of this post, we consider the power of a polynomial: given x(h) and any real number \alpha, find y(h)=(x(h))^\alpha. Taking the logarithm of both sides and differentiating the result with respect to h gives y'(h)/y(h) = \alpha x'(h)/x(h), or y'(h) = \alpha x'(h)y(h)/x(h). By exploiting multiplication and division, we get the coefficients of y(h).
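The recurrences above can be collected into a small Python module. As a sketch, here are some of them; the names subtractpoli and multiplypoli are the ones used in the example script later in this post, while addpoli, dividepoli, and exppoli are my own naming guesses for the remaining operations. Each polynomial is represented as a list of coefficients [x_0, x_1, ...]:

```python
import math

def addpoli(x, y):
    # Operation 3: z_i = x_i + y_i
    return [a + b for a, b in zip(x, y)]

def subtractpoli(x, y):
    return [a - b for a, b in zip(x, y)]

def multiplypoli(x, y):
    # Operation 4: w_i = sum_{j=0}^{i} x_j y_{i-j}, truncated to the shorter input
    n = min(len(x), len(y))
    return [sum(x[j]*y[i - j] for j in range(i + 1)) for i in range(n)]

def dividepoli(y, x):
    # Operation 6: z_0 = y_0/x_0, z_i = (y_i - sum_{j=1}^{i} x_j z_{i-j})/x_0
    n = min(len(x), len(y))
    z = [y[0]/x[0]]
    for i in range(1, n):
        z.append((y[i] - sum(x[j]*z[i - j] for j in range(1, i + 1)))/x[0])
    return z

def exppoli(x):
    # Operation 8: y_0 = exp(x_0), y_i = (sum_{j=0}^{i-1} y_j (i-j) x_{i-j})/i
    y = [math.exp(x[0])]
    for i in range(1, len(x)):
        y.append(sum(y[j]*(i - j)*x[i - j] for j in range(i))/i)
    return y
```

For instance, dividepoli([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]) returns the coefficients of 1/(1+h) truncated to order 2, namely [1.0, -1.0, 1.0].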

To facilitate these operations, we build a Python module to handle them. We also write generic code to integrate an ODE; this integration code works without any dependence on the particular system or its dimension.

import numpy as np

def integrate(F, y_init, t_init=0.0, tStop=1.0, h=0.05):
    dim = len(y_init)
    y_sol = []
    for i in range(0, dim):
        y_sol.append([y_init[i]])        # solution history per component
    t_sol = [t_init]
    t = t_init

    while t < tStop:
        y = []
        for i in range(0, dim):
            y.append([y_sol[i][-1]])     # a_0 is the current value
        for k in range(0, 20):
            dy = F(y)                    # b-coefficients of each component
            for i in range(0, dim):
                y[i].append(dy[i][k]/(k + 1.0))  # a_{k+1} = b_k/(k+1)
        n = len(y[0])
        for i in range(0, dim):
            temp = 0.0
            for j in range(0, n):
                temp = temp + y[i][j]*h**j
            y_sol[i].append(temp)        # evaluate the truncated series at h
        t = t + h
        t_sol.append(t)

    return np.array(t_sol), np.transpose(np.array(y_sol))

All the operations above and the integration code are written in one Python file, so that whenever we need this Taylor method, we can just import it. Let's take an example of integrating one ODE. We create another script that specifies the ODE, imports all the operations above, and then numerically integrates it.

from mytaylor import *
import matplotlib.pyplot as plt
import numpy as np
import math as m

#define your ODE here
def fun(y):
    z = subtractpoli(y[0], multiplypoli(y[0], y[0]))
    return [z]

#define your computation parameters here
y_init = [0.5]
t_init = 0
h = 0.1
tStop = 5.0

#integrating the ODE
T, Y = integrate(fun, y_init, t_init, tStop, h)

#real solution computation
real_sol = []
for t in T:
    temp = m.exp(t)/(m.exp(t) + 1.0)
    real_sol.append(temp)

#plotting the error versus time
real_sol = np.array(real_sol)
error = Y[:, 0] - real_sol
plt.plot(T, error)
plt.show()

In the above code, we numerically integrate y' = y(1-y) with initial condition y(0)=0.5, the same system we integrated in the previous post. I hope you can see the difference between this code and the code shown in the previous post: we don't need to think about the Taylor method at all; we just write our system in the code above to compute its numerical solution. Notice that all the operations and the integration code are in the file mytaylor.py, which is imported in the first line. In the function fun, we can write any ODE we want to integrate; however, the way we write the ODE is slightly different. Due to the polynomial operations defined in this post, x+y cannot be written as just x+y but as addpoli(x,y). Once we define our ODE, we integrate it with the call to integrate.

Another thing we would like to do is to extend the polynomial operations; we have not included trigonometric operations on polynomials, among others. We would also like to test this Taylor method on a stiff system. Please find the two files ( and ) for interested readers who would like to test things themselves.

Taylor method with automatic differentiation

June 14, 2016

I teach a numerical analysis course, and in one of the chapters I teach my students about the numerical solution of ordinary differential equations (ODEs). One of the methods I teach is the so-called Taylor method, which works by applying the Taylor series to the solution of the differential equation. Suppose we have a one-dimensional differential equation as follows:

\frac{dy}{dt} = f(y) and y(t) = y_0.

Let's assume we only consider autonomous ODEs for now. To find the solution at the next step, y(t+h), we apply the Taylor expansion:

y(t+h) = y(t) + y'(t) h + y''(t)\frac{h^2}{2} +y'''(t)\frac{h^3}{6}+\dots.

Therefore, depending on how accurate we want our solution to be, all we need is to compute the derivatives of y(t). Unfortunately, computing all the derivatives is not an easy task: we would have to derive them all by hand and then type them into the computer.

In this note, I will introduce automatic differentiation as a tool to compute the derivatives of y(t+h), so that we can compute y(t+h) as accurately as we like. Consider the series expansion of y(t+h) as above, but with constants a_i replacing the terms involving the i-th derivative of y(t),

y(t+h) = a_0 + a_1 h + a_2 h^2 + \dots.

Let us differentiate the above series with respect to h.

y'(t+h) = a_1 + 2 a_2 h + 3a_3 h^2 + \dots.

Notice that the left-hand side of the above equation is actually equal to f(y(t+h)), since y(t+h) satisfies the differential equation. Then we have,

f(y(t+h)) = a_1 + 2 a_2 h + 3a_3 h^2 + \dots.

Since y(t+h) is a series in h, f(y(t+h)) is also a series in h. Both sides of the above equation are then series in h, so we can equate the coefficients of each h^j to get a_1, a_2, and so on. The most difficult task here is to find the series of f(y(t+h)), since the function f(y) can be anything.

Let’s take an example. Suppose we want to integrate the following differential equation,

\frac{dy}{dt} = y(1-y) and y(0)=0.5.

Let us choose h=0.1 and compute y(0.1). As before, we assume that y(0.1) is a series of h as follows,

y(0.1) = a_0 + a_1 h + a_2 h^2 + \dots.

Differentiating both sides of the above equation, we get

y'(0.1) = a_1 + 2 a_2 h + 3 a_3 h^2 + \dots.

By the differential equation, this derivative is also equal to

y'(0.1) = y(0.1)(1-y(0.1)) = (a_0 + a_1 h + \dots)(1  - (a_0 + a_1 h + \dots)) = b_0 + b_1 h + b_2 h^2 + \dots.

Therefore, we get,

a_1 = b_0, a_2 = b_1/2, a_3 = b_2/3, and so on.

The real question is how to find b_0, b_1, and so on. If the function f(y) only involves addition, subtraction, and multiplication, it is easy to find the b's. But if f(y) involves division, exponentials, logarithms, or trigonometric functions, then we have to think a bit harder about how to find the b's. Going back to our example, since the only operations are multiplication and subtraction, we can easily compute b_0, b_1, and so on. A little algebra gives the following,

b_0 = a_0 \times (1-a_0), thus a_1 =a_0 \times (1-a_0)
b_1 = (a_0 \times -a_1 ) + (a_1 \times (1-a_0)), thus a_2 =((a_0 \times -a_1 ) + (a_1 \times (1-a_0)))/2
b_2 = (a_0 \times -a_2 ) + (a_1 \times (-a_1))+(a_2 \times (1-a_0)), thus a_3 =((a_0 \times -a_2 ) + (a_1 \times (-a_1))+(a_2 \times (1-a_0)))/3

and so on.
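The pattern of these b's can be checked with a few lines of Python. This sketch computes a_0 through a_5 from the recurrence above (b_{k-1} is the coefficient of h^{k-1} in the product y(1-y)) and compares the truncated series at h=0.1 with the exact solution y(t) = e^t/(e^t+1):

```python
import math

order = 6
a = [0.5]  # a_0 = y(0)
for k in range(1, order):
    # b_{k-1} = a_{k-1}(1 - a_0) + sum_{j=0}^{k-2} a_j (-a_{k-1-j})
    b = a[k-1]*(1.0 - a[0]) + sum(a[j]*(-a[k-1-j]) for j in range(k - 1))
    a.append(b/k)  # a_k = b_{k-1}/k

# Evaluate the truncated series at h = 0.1 and compare with the exact solution
h = 0.1
approx = sum(a[i]*h**i for i in range(order))
exact = math.exp(h)/(math.exp(h) + 1.0)
```

With only six coefficients, the truncated series already agrees with the exact value to roughly ten decimal places, which previews the small errors shown in the figure below.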

The more a's we compute, the more accurate our solution is. Using the Taylor method this way, we can easily implement it in a program. It should also be clear why this method is called automatic differentiation: we never need to derive the derivatives manually.

Next, I numerically integrated the previous example. Below is a graph of the error of the Taylor method with automatic differentiation versus time. As we can see from the figure, the error of our numerical integrator is so small that the solution is almost exact. The number of coefficients a_i I used in this graph is 10.

I use Python to numerically integrate this example; you can find the code that produces the figure above at the end of this note. One drawback of this method is that when you change your ODE, you have to change the code as well. With other numerical integrators such as Runge-Kutta, we don't need to change the whole code; we just change the ODE and we get the solution. Together with my student, I am trying to tackle this problem, and I hope the result will be posted here.

import math as m
import numpy as np
import pylab as plt

#preliminary constants
order = 10
h = 0.1
t = 0.0
t_end = 5
y0 = 0.5
#defining solution
sol = []
time = []
#start of the program
while t < t_end:
    a = [y0]                   # a_0 is the current value of y
    for k in range(1, order):
        # b_{k-1}: coefficient of h^(k-1) in y(1-y)
        b = a[k-1]*(1.0 - a[0])
        for j in range(0, k-1):
            b = b + a[j]*(-a[k-1-j])
        a.append(b/k)          # a_k = b_{k-1}/k
    # evaluate the truncated series at h
    sum = 0.0
    for i in range(0, order):
        sum = sum + a[i]*h**i
    t = t + h
    sol.append(sum)
    time.append(t)
    y0 = sum                   # restart the series from the new value
#end of the program

#real solution computation
real_sol = []
for t in time:
    temp = m.exp(t)/(m.exp(t)+1.0)
    real_sol.append(temp)

#plotting the error versus time
sol = np.array(sol)
real_sol = np.array(real_sol)
time = np.array(time)
error = sol - real_sol
plt.plot(time, error)
plt.show()

How to fold a paper into thirds, a real analysis perspective

February 19, 2016

Folding a paper into thirds is useful when you want to mail someone using an envelope. I searched around the internet and there are many ways to do this, but so far I have not seen the method I am about to explain. I found it by accident while teaching Real Analysis this semester, and I am going to prove that it actually works. Other methods that I found always use geometry in the proof, but I am going to use real analysis instead.

The procedure

Please have a look at the following figure.

The above figure shows the first step of our method. Take a sheet of paper and call the left side, side A, and the right side, side B. First, fold side A as in the second image of the figure above. The fold does not have to divide the paper equally; you can fold it anywhere. Once you do that, undo the fold and you will have a fold mark, which we call x_0. Next, fold again, but this time fold the right side (side B) onto the fold mark x_0, as in the second row of the figure above. You will now have two fold marks; the latter is called y_0. Now take a look at the following figure.

Now that we have two fold marks (x_0 and y_0), fold side A onto the fold mark y_0, as in the first row of the figure above. Undo the fold and you will have a third fold mark (call it x_1). Next, fold side B until it touches the fold mark x_1, and you will have four fold marks, x_0, x_1 and y_0, y_1. Continue this step over and over; after n steps, x_n will be very close to one third of the paper and y_n will be very close to two thirds of the paper.


Let's take an example. Suppose that in the beginning we fold side A so that the paper is divided equally. Assuming the length of the paper is 1, this means x_0=0.5. If we simulate this case, we get the following table.

n    x_n            y_n
1    0.5            0.75
2    0.375          0.6875
3    0.34375        0.671875
4    0.3359375      0.66796875
5    0.333984375    0.666992188
6    0.333496094    0.666748047
7    0.333374023    0.666687012
8    0.333343506    0.666671753

As we can see, it takes approximately 4 to 8 steps until x_n is close to one third of the paper, or y_n close enough to two thirds. But what happens if we choose a different initial condition? We simulate this using a number of initial conditions. Suppose we use an A4 paper, which has a length of 29.7 cm. The results are shown in the following figure: the x-axis is the initial condition, ranging from 0 to 29.7 cm, and the y-axis is the number of iterations needed until we are close enough to one third of the paper. The tolerance used here is 1 mm. In conclusion, we need at most four steps until the last fold mark is only 0.1 cm away from one third of the A4 paper.

In the next figure, we relax the tolerance to 0.5 cm, and the number of iterations improves slightly. It can be seen that if our first fold is close enough to one third of the A4 paper, we only need one step.

Mathematical proof

Finally, it is time to prove that this method works mathematically, not just numerically. First of all, we need to model the above procedure mathematically. Assume that the length of the paper is 1 and use the same notation as in the procedure above. Given x_0, we get y_0 = (1+x_0)/2. Recursively, we can compute x_n and y_n as follows:

x_n = y_{n-1}/2 and y_n = (1+x_n)/2, for n = 1, 2, … .

Even though there are two sequences (x_n) and (y_n), we only need to consider (x_n). We can simplify the sequence (x_n) as follows.

x_0 is given and x_n=y_{n-1}/2=((1+x_{n-1})/2)/2=(1+x_{n-1})/4 for n=1,2,\dots.
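Subtracting 1/3 from both sides of this recursion gives a hint (not a full proof, which remains an exercise below) of why the convergence is so fast:

x_n - 1/3 = (1+x_{n-1})/4 - 1/3 = (x_{n-1} - 1/3)/4,

so the distance from x_n to 1/3 shrinks by a factor of 4 at every step, consistent with the rapid convergence seen in the table above.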

Lemma 1
When 0<x_0<1/3 (respectively, 1/3<x_0<1), the sequence (x_n) is bounded above (respectively, bounded below).

Lemma 2
When 0<x_0<1/3 (respectively, 1/3<x_0<1), the sequence (x_n) is monotonically increasing (respectively, decreasing).

Theorem 3
Given 0<x_0<1, the sequence (x_n) converges to 1/3.

Corollary 4
Given 0<x_0<1, the sequence (y_n) converges to 2/3.

The proofs of the above results are given to my students as exercises.
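Although the proofs are left as exercises, the claim of Theorem 3 is easy to check numerically. This short sketch iterates the simplified recursion from several starting points in (0, 1):

```python
# Iterate x_n = (1 + x_{n-1})/4 from several initial values in (0, 1);
# by Theorem 3 every run should approach 1/3.
finals = []
for x0 in [0.01, 0.2, 0.5, 0.9, 0.999]:
    x = x0
    for _ in range(30):
        x = (1 + x) / 4.0
    finals.append(x)
    print(x0, x)
```

Whatever the starting value, thirty iterations bring x within floating-point rounding of 1/3.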

This is the end of this note and the following is the code to produce the above figures.

import numpy as np
import pylab as plt

def f(x):
    # fold side B along the mark at x: y = (29.7 + x)/2
    return (29.7 + x)/2.0

def g(y):
    # fold side A along the mark at y: x = y/2
    return y/2.0

def number_of_iter(x, tol, max_iter):
    iter = 0
    er = abs(x - 29.7/3)
    if er <= tol:
        return 1
    while er > tol and iter < max_iter:
        iter = iter + 1
        y = f(x)
        x = g(y)
        er = min(abs(29.7/3 - x), abs(2*29.7/3 - y))
    return iter

# main code
x = np.arange(0, 29.7, 0.1)
y = []
tol = 0.5
max_iter = 100

for t in x:
    y.append(number_of_iter(t, tol, max_iter))
y = np.array(y)

plt.plot(x, y, '.')
plt.xlabel('initial condition (cm)')
plt.ylabel('number of iterations')
plt.show()

My first try creating a bash script

January 30, 2016 2 comments

I am an Ubuntu user, but sometimes I also use Windows, mainly because my computer at the office runs Windows. My laptop even has both Windows and Ubuntu. I once tried to get rid of Windows and installed Wine to run Windows applications on Ubuntu, but it ran very slowly, and in the end I came back and reinstalled Windows. I mainly use Ubuntu for my research. The software I use for my research (auto, python, dstool, latex, xfig) runs smoothly on Ubuntu, although python and latex can also be installed and run smoothly on a Windows machine. On the other hand, I use Windows for regular activities such as browsing the internet, watching movies, checking my email, creating office documents and so on, all of which can be done on Ubuntu as well. But there are things for which I must use Windows: I sometimes need Matlab, and I like to play a game that only runs on Windows. These two things are the main reasons I still come back to Windows. Recently I have been learning Python, so I am now trying to use Matlab less.

That is just background; my main point here is bash scripting. After a few years of using Ubuntu, I had not created a single bash script. Today I finally learned to create one, a script to automate a boring routine of mine. When I write a paper, I need illustrations. I mostly use xfig to create mathematical images, and to be able to use \LaTeX in a figure I need to convert it to an eps file. The resulting eps file is then converted to a pdf file, which is perfectly compatible with my pdflatex command. Before using it, I need to crop the resulting pdf file to remove the white space around the image. Suppose the name of my xfig file is spam.fig. I then run the following series of commands.

figtex2eps spam.fig
ps2pdf spam.eps
pdfcrop spam.pdf

I wanted a script that does all of the above automatically, so I created the following script.

#!/bin/bash
# Converts a .fig file (xfig file) to an .eps file using figtex2eps,
# then converts it to a .pdf file using ps2pdf,
# and finally produces a cropped pdf file using pdfcrop
# ivanky saputra
# credit to:
# figtex2eps (/home/abel/c/berland/etc/cvsrepository/figtex2eps/prog/figtex2eps)
# ps2pdf in ubuntu
# pdfcrop in ubuntu
if [ "$1" == "" ]; then
    echo "no file given"
    exit 1
fi
echo "We are going to change the following $1.fig to a cropped pdf file"
echo "Processing $1.fig............"
figtex2eps "$1.fig"
ps2pdf "$1.eps"
pdfcrop "$1.pdf"
echo "Done"

As someone once said, it is better to share your code and data openly, because we humans are idiots and will make mistakes. Please give me any suggestions to improve mine.