Archive for the ‘Justincase’ Category

Derivation of simple linear regression formula

May 20, 2018 Leave a comment

Finally, I have a time to post another mathematics blog post after absence for almost two years. This time I will discuss formula of simple linear regression. Suppose we have a set of data as follow :

(x_1,y_1),  (x_2,y_2), \ldots, (x_n,y_n).

We are going to fit those points using a linear equation Y = \beta_0 + \beta_1 X. This classical problem is known as a simple linear regression and is usually taught in elementary statistics class around the world. However, due to the rise of computer, students are only given the formula to compute the best estimation of \beta_0 and \beta_1 without the knowledge on how to derive them. These formula are:

\beta_1 = \displaystyle{\frac{\sum_{i=1}^n (x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}} , and

\beta_0 = \displaystyle{\bar{y}-\beta_1 \bar{x}}


where \bar{x}, \bar{y} are the average of x_i and y_i respectively. In this post, I am going to derive these formula using four methods. The goal is always the same, this note will serve as a reminder for me when I forget how to derive these formula. If you learn statistics, these are many statistical assumptions such that this model and this formula work, but we are not going to discuss that here.


First, let us take a look at the following figure

We see that there are five points that are fitted using a line. To derive \beta_0 and \beta_1 using a Calculus, we have to define the so-called error function, E(\beta_0, \beta_1). The error function is defined as the sum of square error of each data. The formula is given as follow,

E(\beta_0,\beta_1)=\displaystyle{\sum_{i=1}^n (y_i - \hat{y}_i)^2=\sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2},

where \hat{y}_i is the prediction of the y coordinate using the fitted line. If we take a look at the figure above, the error function is the sum of squares of all the red lines. It is obvious that whenever \beta_0 and \beta_1 change, the error function changes. Therefore we have to find \beta_0 and \beta_1 such that the error function is minimal. This is a minimization problem and in Calculus we already know about how to minimize function of two variables. First we have to find the first partial derivative of E with respect to \beta_0 and \beta_1 and equate them to zero, i.e.

\frac{\partial E}{\partial \beta_0 }=0 and \frac{\partial E}{\partial \beta_1 }=0.

And we have to find \beta_o and \beta_1 that satistfy the above equations. Second, we have to check whether or not \beta_o and \beta_1 is the minimum or the maximum point or it could be a saddle point. However, our error function E is a convex function, therefore the solution of the first derivative equation always yields a minimum point. Let us find out what is the expression of the first derivative of E,

\frac{\partial E}{\partial \beta_0} = \sum_{i=1}^n (-2)(y_i-(\beta_0+\beta_1 x_i)) =0 and \frac{\partial E}{\partial \beta_1} = \sum_{i=1}^n (-2x_i)(y_i-(\beta_0+\beta_1 x_i)) =0.

Tidying up the above two equations we will get :

\sum_{i=1}^n y_i = \beta_0 n + \beta_1 \sum_{i=1}^n x_i, and \sum_{i=1}^n x_iy_i = \beta_0 \sum_{i=1}^n x_i + \beta_1 \sum_{i=1}^n x_i^2,

which are a system of linear equations with two variables. We can solve the above linear equations using Cramer’s rule or any other methods, and we should get the following solution :

\beta_0 = \frac{(\sum_{i=1}^n y_i)(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)(\sum_{i=1}^n x_iy_i)}{n(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)^2} and

\beta_1 = \frac{n(\sum_{i=1}^n x_iy_i)-(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n(\sum_{i=1}^n x_i^2)-(\sum_{i=1}^n x_i)^2}.

With a little bit of algebra manipulation, the above form is the same as the one in (1). And this is the end of the Calculus part.


In this approach, we assume that one is familiar with the concept of mathematical statistics such as random variables, expectations, variances and co-variances. Suppose X and Y are random variables with mean \mu_X and \mu_Y respectively. Random variable X and Y also have variances \sigma_X^2 and \sigma_Y^2 and they also have co-variance Cov(X, Y).

We want to find \beta_0 and \beta_1 such that random variable Y and \beta_0 + \beta_1 X is as close as possible. How are going to measure “as close as possible?” Let us define another random variable Z=(Y-(\beta_0 + \beta_1 X))^2. We also need to define a function g as a function of \beta_0, \beta_1 as follow :

g(\beta_0, \beta_1) = E(Z) = E[(Y-\beta_0 - \beta_1 X)^2].

Therefore, our problem will be similar to our approach in the first section. We need to find \beta_0 and \beta_1 such that g is minimum or expectation of Z is minimum. Expanding the equation above, we obtain:

g(\beta_0, \beta_1)=E(Y^2)-2\beta_0E(Y)-2\beta_1E(XY)+ \beta_0^2+2\beta_0\beta_1E(X)+\beta_1^2E(X^2).

The above equation is quadratic in \beta_0 and \beta_1 so the zeros of the first derivatives of the above expression is the minimum point. The first derivatives with respect to \beta_0 and \beta_1 are :

\frac{\partial g}{\partial \beta_0} = -2E(Y)+2\beta_0+2\beta_1E(X)=0,

\frac{\partial g}{\partial \beta_1} = -2E(XY)+2\beta_0E(X)+2\beta_1E(X^2)=0,

which we can rewrite as:



Multiplying the first equation by E(X) and subtracting it to the second equation, we obtain:

\beta_1 = \frac{E(XY)-E(X)E(Y)}{E(X^2)-E(X)^2} = \frac{Cov(X,Y)}{\sigma_X^2},


\beta_0 = E(Y) - \beta_1 E(X)=\mu_Y - \beta_1 \mu_X.

The above solution is in the form of random variables. However, what we have is data which is in the form (x_1,y_1),  (x_2,y_2), \ldots, (x_n,y_n). So, instead of \mu_X we have \bar{x}, and instead of \mu_Y we have \bar{y}. The same also applies on variances and co-variance, so in the end, the above solution is the same as equation (1). This is the end of the statistics approach.

Linear Algebra

This time, we are going to derive the formula using linear algebra approach. Personally, I think this approach is the most elegant of all other approaches. However, I need to say that one must know a little bit about linear algebra first before reading this, otherwise it would not make sense. So, using a set of data (x_1,y_1),  (x_2,y_2), \ldots, (x_n,y_n), we want to find \beta_0 and \beta_1. Plugging all the data into the linear equation y = \beta_0 + \beta_1 x, we will obtain:

\beta_0 + \beta_1 x_1 = y_1,
\beta_0 + \beta_1 x_2 = y_2,
\beta_0 + \beta_1 x_n = y_n.

The above systems can be rewritten in matrix form, as follow:

\left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right) \left( \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right)= \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right),


Ax =w,

where A is a n \times 2 matrix, x = (\beta_0,\beta_1)^T and w is a vector containing y_1, \dots, y_n. In linear algebra, the equation above is known as the over-determined linear equation. It may have a solution but mostly it does not have one. One method to solve such system is using the theory of projection. Suppose we have a linear equation Ax=w. According to the theory of linear algebra, this equation has a solution only if w is inside the column space of A. In the above case, where we want to find \beta_0 and \beta_1, our w is clearly not in the column space of A.

Therefore, to find the solution of the linear equation, we must bring w to the column space of matrix A. We are going to project w to the column space of A. We can do many different projections but according to the theory of linear algebra, we must do an orthogonal projections such that our estimation of \beta_0 and \beta_1 will bring Ax “closest” to w. Let us now discuss the projection theory first.

Consider the following figure.

Suppose we want to project w perpendicularly to the vector space V. Suppose w' is the result of the orthogonal projection and V is the column space of A. Then the linear equation Ax=w' has a solution since w' is inside the column space of A. Consider V^{\perp} which is the orthogonal complement of V. Clearly, vector w-w' is inside this space. From the theory of linear algebra, we know that if V is the column space of A, V^{\perp} is the null space of A^T. Therefore we have A^T (w-w')=0, or A^tw'=A^Tw. Since we know that w' = Ax then we have A^TA x = A^T w. If the matrix A^TA is invertible (mostly it is in our case) we can find the solution of our problem which is x = (A^TA)^{-1}A^T w.

That’s it, we have done it. If we substitute back every variable to our original variable, we will have our solution, i.e.

A = \left( \begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array}\right), x= \left( \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right), w = \left( \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array}\right).

With a little bit of algebra (again), we can easily show that the formula x=(A^TA)^{-1}A^Tw is the same as the equation (1).

Numerical approach

Our last approach is using the theory of numerical methods. Since this approach is numerical, we are not going to get the solution as in the equation (1). This approach will only try to approximate it. This approach will use some of the result obtained in the Calculus approach, which is the error function i.e.,

E(\beta_0,\beta_1)=\displaystyle{\sum_{i=1}^n (y_i - \hat{y}_i)^2=\sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2}.

However, we are not going to label it as an error function, instead we are going to label it as a cost function. We say this function as a cost function because every time we pick the wrong \beta_0 and \beta_1 there is a cost that we have to pay. So we have to find the right \beta_0 and \beta_1 such that the cost is minimum.

There are many numerical methods to solve such a problem, but at this time we are going to discuss only one methods, i.e. gradient descent. The basic idea is we start with arbitrary values of \beta_0 and \beta_1 and we are going to continually change the value of \beta_0 and \beta_1 in such a way so that the cost function decreases. Hopefully, we end up at the value where the cost function is minimum.

The following is the general procedure of the gradient descent method :

repeat until convergence {

\beta_j := \beta_j - \alpha \frac{\partial}{\partial \beta_j} E(\beta_0,\beta_1)     (for j=0 and j=1)


The above procedure will be executed continually until the value of \beta_0 and \beta_1 convergent, which means that we are getting close to the value where the cost function is minimum. The symbol \alpha is called the learning rate. It always takes a positive value and if this variable is too large, \beta_j will change drastically and we sometimes cannot get the result that we want. But if \alpha is too small, our computation will be very slow, although it is guaranteed that we will get the minimum cost function.

Finally, we need to know some Calculus to derive the partial derivative of the cost function with respect to \beta_0 and \beta_1. For the readers who know Calculus, they can derive themselves, and we will obtain :

\frac{\partial }{\partial \beta_0}E=-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i)),

\frac{\partial }{\partial \beta_1}E=-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))x_i.

Let’s plug this into the gradient descent algorithm and we will have:

repeat until convergence {

\beta_0 := \beta_0 - \alpha (-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i)))

\beta_1 := \beta_1 - \alpha (-2\sum_{i=1}^n(y_i - (\beta_0 + \beta_1 x_i))x_i)


Let’s take an example. Here is a simple python code to illustrate the idea.

import math
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 2.5, 4, 4.5])
b0,b1 = 0.0, 0.1
val_x = np.linspace(1.0,5.0,100)
val_y = b0+b1*val_x

Running the above code, we will get the following picture.

The red dots are the data that we are going to fit. The blue line is just one example of \beta_0 and \beta_1 that of course, is a poor fit. We are going to fix it using gradient descent method.

def costfunction(b0,b1,x,y):
    sum = 0
    n = len(x)
    for i in range(0,n):
        sum = sum + (y[i]-(b0+b1*x[i]))**2
return sum
def grad(b0,b1,x,y):
    n = len(x)
    sum1,sum2 = 0,0
    for i in range(0,n):
        sum1 = sum1 + (y[i]-(b0+b1*x[i]))
        sum2 = sum2 + (y[i]-(b0+b1*x[i]))*x[i]
return [(-2)*sum1, (-2)*sum2]
b0,b1 = 0.0, 0.1
maxiter = 300
alpha = 0.005
c = np.zeros(maxiter)
c[0] = costfunction(b0,b1,x,y)
for i in range(1,maxiter):
    v = grad(b0,b1,x,y)
    temp1 = b0 - alpha*v[0]
    temp2 = b1 - alpha*v[1]
    b0 = temp1
    b1 = temp2
    c[i] = costfunction(b0,b1,x,y)
print("$\beta_0$ = ",b0,"$\beta_1$ = ",b1)
val_x = np.linspace(1.0,5.0,100)
val_y = b0+b1*val_x

Running the second code above, we would get the desired result.  From the figure above, we see that our fit is close enough with our data. 

The theoretical value of \beta_0 and \beta_1 are 0.1 and 0.9 respectively, but in the code above we will obtain only approximation value that was obtained by performing 300 iterations.


Taylor method with automatic differentiation

June 14, 2016 1 comment

I teach numerical analysis course and in the one of the chapters, I teach my students about the numerical solution of ordinary differential equations (ODEs). One of numerical solution methods I teach is the so-called Taylor method. This method works by applying the Taylor series to the solution of the differential equation. Suppose we have one-dimensional differential equation as follow:

\frac{dy}{dt} = f(y) and y(t) = y_0.

Let’s assume that we only consider an autonomous ODE first.  To find a solution at the next step, y(t+h), we need to apply Taylor expansion, as follow:

y(t+h) = y(t) + y'(t) h + y''(t)\frac{h^2}{2} +y'''(t)\frac{h^3}{6}+\dots.

Therefore, depending on how accurate we want our solution is, all we need is to compute the derivatives of y(t). Unfortunately, computing all the derivatives is not an easy task for a computer. We have to derive all the derivatives by ourselves and then input them to the computer.

In this note, I will introduce an automatic differentiation as a tool to compute the derivatives of y(t+h) so that we can compute y(t+h) as accurately as possible. Let us consider the series expansion of y(t+h) as above, but I will use constant a_i to replace the i-th derivative of y(t),

y(t+h) = a_0 + a_1 h + a_2 h^2 + \dots.

Let us derive the above series with respect to h.

y'(t+h) = a_1 + 2 a_2 h + 3a_3 h^2 + \dots.

Notice that the derivative at the left hand side of the above equation is actually equal to f(y(t+h)) since we know that y(t+h) satisfies the ordinary differential equation. Then we have,

f(y(t+h)) = a_1 + 2 a_2 h + 3a_3 h^2 + \dots.

Since y(t+h) is a series of h then f(y(t+h)) is also a series of h. Then both sides of the above equation are series of h, thus we can equate each of the coefficient of h^j to get a_1a_2 and so on. The most difficult task in this problem is to find the series of f(y(t+h)) since the function f(y) can be anything.

Let’s take an example. Suppose we want to integrate the following differential equation,

\frac{dy}{dt} = y(1-y) and y(0)=0.5.

Let us choose h=0.1 and compute y(0.1). As before, we assume that y(0.1) is a series of h as follows,

y(0.1) = a_0 + a_1 h + a_2 h^2 + \dots.

Differentiating both sides of the above equation, we get

y'(0.1) = a_1 + 2 a_2 h + 3 a_3 h^2 + \dots.

The right hand side of the above is equal to

y'(0.1) = y(0.1)(1-y(0.1)) = (a_0 + a_1 h + \dots)(1  - (a_0 + a_1 h + \dots)) = b_0 + b_1 h + b_2 h^2 + \dots.

Therefore, we get,

a_1 = b_0a_2 = b_1/2,a_3 = b_2/3,  and so on.

The real question is how to find b_0, b_1 and so on. If the function f(y) only involves operations such as addition, subtraction and multiplication, it would be easy to find the b‘s. But if the function f(y) has operations such as division, exponential, logarithm and trigonometric functions then we have to think a bit harder on how to find b‘s. We go back to the previous example. Since the operations in this example are only multiplication and subtraction, we can easily compute b_0, b_1 and so on. Using algebraic modification, we get the following,

b_0 = a_0 \times (1-a_0), thus a_1 =a_0 \times (1-a_0)
b_1 = (a_0 \times -a_1 ) + (a_1 \times (1-a_0)), thus a_2 =((a_0 \times -a_1 ) + (a_1 \times (1-a_0)))/2
b_2 = (a_0 \times -a_2 ) + (a_1 \times (-a_1))+(a_2 \times (1-a_0)), thus a_3 =((a_0 \times -a_2 ) + (a_1 \times (-a_1))+(a_2 \times (1-a_0)))/3

and so on.

The more a‘s we compute, the more accurate our solution is. Using Taylor method this way, we can easily apply this method in a program. It should be clear why this method is called automatic differentiation because we don’t need to manually derive all the derivatives.

Next, I tried to numerically integrate the previous example. Below is the graphic of the error of Taylor method with automatic differentiation versus time. As we can see from the figure, that the error of our numerical integrator is so small and it is almost exact. The number of a I used in this graph is 10. error_plot

I use python to numerically integrate this example. You can find the code to produce the above figure in the end of this note. One of the drawbacks of this method is when you change your ODE, you have to change your code as well. In other numerical integrators such as Runge Kutta, we don’t need to change the whole code, we just need to change the ODE and we should get the solution. Together with my student, I am trying to tackle this problem and I hope it will be posted in here.

import math as m
import numpy as np
import pylab as plt

#preliminary constants
order = 10
h = 0.1
t = 0.0
t_end = 5
y0 = 0.5
#defining solution
sol = []
time = []
#start of the program
while t<t_end:
	a = []
	for k in range(1,order):
		if k == 1:
			sum = 0.0
			for j in range(0,k-1):
				sum = sum + a[j]*(-a[k-1-j])
	sum = 0.0
	for i in range(0,order):
		sum = sum + a[i]*h**i
	t = t + h
#end of the program

#real solution computation
real_sol = []
for t in time:
	temp = m.exp(t)/(m.exp(t)+1.0)

#plotting the error versus time
sol = np.array(sol)
real_sol = np.array(real_sol)
time = np.array(time)
error = sol-real_sol

My first try creating a bash script

January 30, 2016 2 comments

I am a ubuntu user, but sometimes I also use windows, mainly because my computer in the office is using windows. So, I use both ubuntu and windows. Even, my laptop has both windows and ubuntu. I tried to get rid of windows once, and installed wine to use windows application in my ubuntu. But it runs very slowly and it’s killing me and in the end, I came back to reinstall windows. I mainly use ubuntu for my research. The softwares I use for my research are auto, python, dstool, latex, xfig which run smoothly in ubuntu, even though python and latex can also be installed and run smoothly on windows machine. On the other hand, I use windows to do some regular activities such as browsing the internet, watching movies, checking my email, creating an office documents etc. All of which can be done in ubuntu as well. But there are activities that I must use windows, I sometimes need to use matlab and sometimes I like to play a game that only runs in windows. These two things are the main reason I still come back to use windows. Recently, I learn python so I am now trying to less use matlab.

That is just a background and my main point here is about bash scripting. After a few years using ubuntu, I have not created any bash script. Today, finally I learn to create one script. I created a script to automate my boring routine. When I write a paper, I need some illustrations. I mostly use xfig to create some mathematical images and to be able to use \LaTeX in the figure I need to convert it to an eps file. The produced eps file will be then converted to a pdf file as it is perfectly compatible to my pdflatex command. But before that, I need to crop the resulted pdf file in order to remove white space around the image. Suppose the name of my xfig file is spam.fig. I then write a series of command.

figtex2eps spam.fig
ps2pdf spam.eps
pdfcrop spam.pdf

I want to write a script that do all the above automatically. Thus I created the following script.

# Converts a .fig-file (xfig file) to a .eps-file by using a built-in function figtex2eps
# and then convert it to a .pdf file by using a built-in function ps2pdf
# and finally convert it to a cropped pdf file by using a built-in function pdfcrop
# ivanky saputra
# credit to :
# $ /home/abel/c/berland/etc/cvsrepository/figtex2eps/prog/figtex2eps $
# $ ps2pdf in ubuntu $
# $ pdfcrop in ubuntu $
echo "We are going to change the following $1.fig to a cropped pdf file"
function quit {
if [$1 == ""];
    echo "no files given";
    echo "Processing $1.fig............";
    figtex2eps $1.fig
    ps2pdf $1.eps
    pdfcrop $1.pdf
    echo "Done";

As someone has said that it is better to share your code and data openly as we as human are idiots and will make mistakes, please give me any suggestion to improve mine.

My first Coursera course

December 3, 2014 Leave a comment

In the middle of this year, I received a Fulbright Visiting Scholar that allowed me to go to Athens, OH, US. I spent four months in Ohio University starting in September, which means that I took a leave in the university I am working. One of my activities while I was in Ohio University was I took a course from Coursera. At that time I was eager to learn and to know python programming language therefore I decided to take the course, An Introduction to Interactive Programming in Python, offered by Rice University taught by the following professors: Joe Warren, Scott Rixner, John Griener, and Stephen Wong.

My first coursera was nothing but really, really good. The class could not definitely be better. The intructors have put so much effort in preparing the class and at the same time they were really enjoyable and fun. I really learned a lot from this class and I recommend everyone to take this class. After nine weeks, I finished the course and proudly said that I got 96.9% with distinction mark.

But what I wanted to tell here is the content of the course. From knowing nothing of Python I now know that the word ‘python’ is originally from the Month Python’s Flying Circus not the reptilia. I also know how useful this new programming language is. Python can be used to build an interactive game, to conduct some scientific computations, to analyse data science and many more. There are already many communities that use Python and they are (still) growing. One more good thing is the fact that it is free. Python works on either Windows, Linux or Mac and the installation on each device is not that difficult. I am using both Windows and Ubuntu. For Windows, I use mainly Python(x,y) and for Ubuntu, I don’t have to do anything as it is already inside. However, in this course, I did not have to use all of these as one of the instructors has built a nice Python web application that is called Codeskulptor. It will allows us to make a python scripts on your web browser so actually you don’t have to install python in your computer.

From the course, I learned to make interactive games. Even though the games I made were not perfect but I think it is playable and fun enough. I learned to make games like rock-paper-scissors-spock-lizzard, number guessing game, stopwatch game, pong, memory, and asteroids. The following links are the python scripts for all of the above games I made during the course. The link will bring you to a codeskulptor website, in which I had already written a python script. You just need to press the play button on the top left of your screen.

1. RPSLS  This game is called rock-paper-scissors-lizzard-spock game. It is an extention of rock-paper-scissors and first appeared in The Big Bang Theory series.

2. Number guessing game This game, as you might guess, is a number guessing game. The computer hide a number between a small number and a big number and you have finite tries to guess what number the computer is hiding.

3. Stopwatch This game will train your reflexes. It will show a stopwatch and you have to press the pause button every whole second.

4. Pong Oh, please tell me you know this game.

5. Memory There are 16 cards facing down, and all of them come in pair. Figure out the pair with the smallest number of tries.

6. Blackjack The blackjack game, yes, it is the blackjack game that you always know.

7. Asteroids Asteroids is an arcade space shooter game and I think I played this game when I was a kid. I played this game using an Atari device.

Other than those games, I made two another python scripts just for fun. The first is a fan and the second is the illustration of Hilbert’s curve. Hope you like it.

Finding a Hopf bifurcation from a cube equation

September 27, 2014 1 comment

Let me first state what I found out this week. Forgive me if this result is trivial for you, but this result indeed helped me in my research.

Lemma 1
Suppose we have a cube equation x^3 + ax^2 + bx + c=0. The roots of this equation are a couple of purely imaginary roots and a single real root if and only if the following conditions are satisfied b>0 and c=ab.

The proof is very easy and I am not going to prove this. Apparently, the conditions b>0 and c=ab are a necessary and sufficient condition for the cube equations to have a single real root and a couple of purely imaginary roots.

Okay, now let us move to the beginning why I bother with cube equations in the first place.

At the moment I am studying a mathematical model on HIV-1 virus with cell mediated immunity. I just started so I still don’t know all the representation of this model in the real life. But here is the model.

\dot{x} = \lambda - dx - \beta x v
\dot{y} = \beta x v - ay - p y z
\dot{v} = k y- u v
\dot{w} = c y w - c q y w - b w
\dot{z} = c q y w - h z

Yes, you are right, it is a five dimensional model. The variables x, y, v, w, z are the state variables that are representing the density of uninfected cells, the infected cells, the density of virus, the CTL response cells and the CTL effector cells respectively. Why it is five dimensional, what are the parameters representing and other modeling questions, I still don’t know the answer. At the moment I am interested to find an equilibrium in this system that can undergo a Hopf bifurcation. An equilibrium can have a Hopf if the Jacobian of the system evaluated at this equilibrium has a purely imaginary eigenvalue. Actually this is the first time I handle a five dimensional system so it will be interesting.

Now we know why I need Lemma 1.

In my case, the situation is greatly simplified as all parameters are numbers except two which are d and b. From the paper that I read, I found that there are three equilibria in the system, so called E_0, E_1, E_2. At the moment I am interested in the equilibrium E_1 because I suspected that this equilibrium can have a Hopf bifurcation and because the coordinate of this equilibrium has the following conditon w=z=0. Let us have a look at the Jacobian matrix evaluated at E_1=(x_1,y_1,v_1,0,0).

J = \left( \begin{array}{ccccc} -d-\beta v_1 & 0 & -\beta x_1 & 0 & 0\\\beta v_1 & -a & \beta x_1 & 0 & -p y_1 \\ 0 & k & -u & 0 & 0 \\ 0 & 0 & 0 & cy_1-c q y_1 - b & 0 \\ 0 & 0 & 0 & c q y_1 & -h \end{array} \right)

As I said before, all the parameters are numbers except d and b. Also E_1 will be an expression that depends on d and b. As a result the matrix above will be depending on d,b.

We are interested in what value of d,b such that the matrix above has a purely imaginary eigenvalues. Even though the above matrix is not a block matrix but when we compute the eigenvalues, we can really separate the matrix above. We can obtain the eigenvalues of the above matrix by computing the eigenvalues of the following two matrices:

A = \left( \begin{array}{ccccc} -d-\beta v_1 & 0 & -\beta x_1 \\ \beta v_1 & -a & \beta x_1 \\ 0 & k & -u \\ \end{array} \right) and

B = \left( \begin{array}{ccccc} - cy_1-c q y_1 - b & 0 \\ c q y_1 & -h \end{array} \right).

The eigenvalues of matrix B is easy and it is real. So the problem really is in the computing the eigenvalue of matrix A. Consider the matrix A in a simpler notation.

A = \left( \begin{array}{ccccc} a & b & c \\ d & e & f \\ g & h & i \\ \end{array} \right).

The characteristic polynomial of the above matrix is

p(\lambda) = \lambda^3 - (a+d+g) \lambda^2 + (bd-ae+fh-ei+cg-ai) \lambda + \det(A)

Therefore to find the value d,b such that we have a Hopf bifurcation, we only need to solve the following conditions:
1. make sure that (bd-ae+fh-ei+cg-ai) is positive and
2. solve - (a+d+g) \times (bd-ae+fh-ei+cg-ai) = \det(A).

I created this note (which is a part of our paper) so that I won’t forget what I had done. We don’t usually show this kind of computation but for me this little computation will be useful. Even though I uses software to compute and investigate the existence of Hopf bifurcation but it does not show the exact value of the parameter.  Using algebraic approach I found the exact value.

Reference of the model
Yu, Huang, and Jiang, Dynamics of an HIV-1 infection model with cell mediated immunity, Communications in Nonlinear Science and Numerical Simulation, 2014, vol. 19.

Additional note: (Two dimensional case) Suppose the Jacobian matrix evaluated at the equilibrium is given below

B = \left( \begin{array}{ccccc} p & q \\ r & s \end{array} \right).

Then the equilibrium has a Hopf bifurcation if the following conditions are satisfied:
1. \det(B) > 0 and
2. trace(B) = 0.



Random Numbers in Matlab

May 23, 2014 3 comments

In the past two years, I have been supervising bachelor degree students for their final projects. Unfortunately, most of the projects were related to time series, forecasting, stochastics process, financial mathematics and many topics related to application to statistics in finance. To be honest, this is not my strong suit. As a result, we struggled when reading and using many statistical techniques especially the one in time series. With this notes, I hope to understand and to keep remember those techniques. Moreover, as I am teaching Time Series Analysis this semester, I believe this post will be very useful for my students.

In this post, I concentrate on how to generate random numbers with certain distributions in Matlab and its application in my time series lecture. I know that we can easily google on how to do these, but as always, it would be very nice to have this ready in my blog. When I forget to do this, I don’t have to google it again and save myself a few minutes.

Generating random numbers and introduction to subplot, histogram, state commands in matlab

To generate random numbers in Matlab with certain distributions, I  need to type

x1 = normrnd(mean,std,[m,n]); %normal distribution
x2 = binornd(N,p,[m,n]); %binomial distribution
x3 = exprnd(lambda,[m,n]); %exponential distribution
x4 = gamrnd(a,b,[m,n]); %Gamma distribution
x5 = rand(m,n); %uniform distribution
x6 = poissrnd(lambda,[m,n]); %Poisson distribution

The commands above returns an m x n matrix containing pseudorandom values drawn from normal, binomial, exponential, gamma, uniform and Poisson distributions respectively. Some usefull commands related to random number generator in Matlab are

subplot(2,2,1) %top left figure
hist(x1) %histogram of normal random numbers
subplot(2,2,2) %top right figure
hist(x2) %histogram of binomial random numbers
subplot(2,2,3) %bottom left figure
hist(x3) %histogram of exponential random numbers
subplot(2,2,4) %bottom right figure
hist(x4) %histogram of Gamma random numbers

The above commands return histograms of random numbers generated by the previous commands.  The output of those commands is the following figure.


Another useful command regarding random number generators is the ‘state’ option. This is very usefull if we want to repeat our computation that involves random number. The following code will help us understand how to use the state command.

theta = -0.8;
phi = 0.9;
mu = 0;
sigma = 0.1;
%-----generating random numbers-------------------------
e = normrnd(mu,sigma,[100 1]);
%-----generating MA(1) and AR(1) process----------------
y(1) = 0;
z(1) = 1;
for i = 2:length(e)
y(i) = e(i) - theta*e(i-1);
z(i) = phi*z(i-1) + e(i);
%-----plotting MA(1) and AR(1) process----------------

Using the above command, especially when we write the ‘ state’ option in line 8, it allows us to repeat our computation later. We can show the result of our computation tomorrow exactly the same as our computation we conduct today, even though our code involves a random number.  The result of the above command is plotted in the following figure.


Plotting the auto correlation function and introduction to bar command

In time series analysis, when we have a time series, it is common to plot the sample auto correlation function (ACF) to be compared to time series models we have in the textbooks. The thing is today, our students now rely heavily on statistical softwares to plot the ACF so that they forget how to plot it in the beginning.  Recall that the sample ACF of the observed series Y_1, Y_2, \dots, Y_n is defined as

(1)——–\displaystyle{r_k = \frac{\sum_{t=k+1}^{n}(Y_t - \bar{Y})(Y_{t-k}-\bar{Y})}{\sum_{t=1}^{n}(Y_t - \bar{Y})^2}}

The following code shows a MATLAB code that will plot the ACF of both time series generated by the code above.

sumY = 0;
sumZ = 0;
for i = 1:length(y)
sumY = sumY + y(i);
sumZ = sumZ + z(i);
ybar = sumY/length(y);
zbar = sumZ/length(z);
sum2Y = 0;
sum2Z = 0;
for i = 1:length(y)
sum2Y = sum2Y + (y(i)-ybar)^2;
sum2Z = sum2Z + (z(i)-zbar)^2;
for k = 1:length(y)
sum3Y = 0;
sum3Z = 0;
for t = k+1 : length(y)
sum3Y = sum3Y + (y(t)-ybar)*(y(t-k)-ybar);
sum3Z = sum3Z + (z(t)-zbar)*(z(t-k)-zbar);
ry(k) = sum3Y/sum2Y;
rz(k) = sum3Z/sum2Z;

The last two commands above will plot the ACF of time series y and z in the bar style as it is always for the ACF, as shown in the following figure.




It doesn’t show, really, the supposed to be acf of MA(1) (above) and AR(1) (below) process. Perhaps this is I have not learned  the sample acf measurement (1).

How to export/import data between Matlab and Excell

Finally, we are going to write a code in Matlab on how to export or import data to/from Excell in Matlab. Again, this is to save myself a few minutes rather than googling it around the internet.

%--------------write data to excell-----------------------
y = transpose(y);
z = transpose(z);
filename = 'ma_ar_data.xlsx';
%-------------read data from excell----------------------
filename = 'ma_ar_data.xlsx';
sheet = 1;
xlRange = 'A1:A100';
dataY = xlsread(filename, sheet, xlRange);

The transpose command above puts the variable y generated before to be a column vector.


In this post, I have created various codes so that I am able to remember

  • generating random numbers
  • using subplot
  • using histogram and bar
  • exporting/importing data to/from excell
  • plotting the ACF


Liapunov exponent computation

October 4, 2013 Leave a comment

Liapunov exponent method is one of the clever ways to investigate a certain dynamical system to be chaotic. In this post, I shall try to compute Liapunov exponent of one famous discrete dynamical system, i.e. the logistic map. After this, I shall try to compute the Liapunov exponent of a continuous dynamical system.

Actually, I write this post because I want to know how to numerically compute the Liapunov exponent. I often read a book or a website such as wolfram.alpha that shows the Liapunov exponent but it didn’t explain how to get this numerically. After I am able to post this post, and when I forget how to compute this exponent, I’d know where to look.

Suppose we have an initial condition x_0 and a nearby initial point x_0 + \delta_0, where \delta_0 is an initial difference between initial points, which by the way will be extremely small. These two nearby points will be iterated by some map. Let \delta_n be the difference between these two points after n iterations. Consider the following equation

(1) …  |\delta_n| =|\delta_0 |e^{n \lambda}.

The coefficient \lambda is called the Liapunov exponent. We want to compute this coefficient. Note that \delta_n = f^n(x_0)-f^n(x_0+\delta_0), where f is the given map. Taking logarithm of (1), we get

(2) … \lambda = \frac{1}{n} \ln \left|\frac{\delta_n}{\delta_0}\right|=\frac{1}{n} \ln \left|\frac{f^n(x_0)-f^n(x_0+\delta_0)}{\delta_0}\right|=\frac{1}{n} \ln \left|{(f^n)'(x_0)}\right|

as \delta_0 \to 0. The term inside the logarithm can be simplified by using the chain rule: (f^n)'(x_0) = f'(x_{n-1}) f'(x_{n-2}) \dots f'(x_0). Thus (2) becomes

(3) … \lambda = \frac{1}{n} \sum_{i=0}^{n-1} \ln |f'(x_i)|.

When n \rightarrow \infty, \lambda is defined as the Liapunov exponent. When this exponent is positive, it is a sign of a chaotic attractor.

Let us consider the logistic map that was in my previous post here. Let us fix the parameter value r. Starting at a random initial point, we will iterate the initial point until it reaches the steady state (which will be a fixed point, periodic or aperiodic), say 300 iterates. Now we compute a large number iterates, say 10000 and compute (3) starting from the 301st point. Repeat this procedure for the next r. Then we plot the Liapunov exponent with respect to r. We shall find a figure as follows.


As we can see from the above figure. The map is chaotic in some values of parameter between 3.5 to 4, in which the Liapunov exponent is positive.

The code to produce the above figure is the following

tic; %to compute how long is the computation
clear;clc; %clear screem and clear all the previous variables
a = 3:0.001:4; %parameter of the logistic map
N = 300; %number of iterations until it is stable
N1 = 10000; %number of iterations to compute the Liapunov exponent
y = zeros(length(a),1); %declares variable for Liapunov exponents
for i=1:length(a)
x(1) = rand; %random initial conditions%
for j = 2:N
x(j) = a(i)*x(j-1)*(1-x(j-1));
sum = 0;
for j = 1:N1
x(N+j) = a(i)*x(N+j-1)*(1-x(N+j-1));
sum = sum + log(abs(a(i)-2*a(i)*x(N+j)));
y(i) = sum/10000;
fprintf('%d %d\n',i,length(a));
plot(a,0); hold on;
plot(a,y); %plot Liapunov exponent against parameter
toc; %display how long computation is