AaronMora_Code: Estimating a simple regression on TensorFlow

Note: I crated this post in Rstudio using reticulate to run Python code, to do so install this package and then create a python session. However, if you are using an python IDE then there is no need for the following chunk of code.

library(reticulate)
use_python("C:/ProgramData/Anaconda3", required =TRUE) #If you want to use your base anaconda enviroment
py_config()

python:         C:/ProgramData/Anaconda3/python.exe
libpython:      C:/ProgramData/Anaconda3/python38.dll
pythonhome:     C:/ProgramData/Anaconda3
version:        3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:/ProgramData/Anaconda3/Lib/site-packages/numpy
numpy_version:  1.20.3

NOTE: Python version was forced by RETICULATE_PYTHON

import numpy as np
import pandas as pd
import tensorflow as tf

Data Generating Process

Here we are going to implement the following data generating process:

\[y_i=\beta_0+\beta_1x_{i1}+\beta_2 x_{i2}+\beta_3 x_{i3}+\varepsilon_i\]

where \(\varepsilon_i\sim N(0,\sigma^2)\), and the true parameters are given by

\[\beta^*=(\beta_0,\beta_1,\beta_2,\beta_3)=(1,2,3,4)\] \[\sigma^*=1.5\]

And the covariates are generated as follow: \[\begin{align} x_{1i} &\sim N(0,1) \\ x_{2i} &\sim Unif[-1,1] \\ x_{3i} &\sim Exp(1) \end{align}\]

def dgp(N):
    """Function to generate a sample of size N for the previous DGP
    Parameters:
    ----------
    N:  Integer,
       Sample Size

    Output:
    ----------
    Y:  Nx1 Array,
       Array containing the dependent variables
    X:  Nx4 Array,
       Array containing the independent variables
    """
    #Generate the Error Term
    E=np.random.normal(0,1.5,size=(N,1))
    #Generate the Covariates
    X0=np.ones((N,1))
    X1=np.random.randn(N,1)
    X2=np.random.uniform(-1,1,size=(N,1))
    X3=np.random.exponential(1, size=(N,1))
    #Construct Y
    Y=X0+2*X1+3*X2+4*X3+E
    #Return
    return Y, np.hstack((X0,X1,X2,X3))

#Generate a sample of size N=10000
N=10000
np.random.seed(145)
Y,X=dgp(N)
print(Y[:4])

[[ 0.53471528]
 [ 4.94094422]
 [ 5.61397342]
 [14.92933068]]

print("Shape of the Dep variable Y: ",Y.shape)

Shape of the Dep variable Y:  (10000, 1)

print("Shape of the Ind variable X: ",X.shape)

# Do a Train/Test Split

Shape of the Ind variable X:  (10000, 4)

Split=int(np.floor(0.7*N))
X_train=X[:Split,:]; X_test=X[Split:,:]
Y_train=Y[:Split].reshape((-1,1)); Y_test=Y[Split:].reshape((-1,1))

Estimation with Analytical OLS

In a sample with \(N\) individuals, we can estimate \(\beta\) using least squares which solution yields:

\[\hat{\beta}=(X'X)^{-1}X'Y\]

where \(Y\) is a \(N\times 1\) vector of \(\{y_i\}\) stacked, and similarly \(X\) is a \(N\times4\) matrix of stacked covariates for observation \(i\). For implementing the normal equation we only need Numpy functions.

def basic_ols(X,Y):
    """
    Function that implements the normal equation for an OLS estimator
    Parameters:
    ----------
    Y:  Nx1 Array,
       Array containing the dependent variables
    X:  Nx4 Array,
       Array containing the independent variables

    Output:
    ----------
    beta:   4x1 array,
            OLS estimator
    """
    beta=np.linalg.solve(np.dot(X.T,X),np.dot(X.T,Y))
    return beta

#Estimate beta with the normal equation
beta_numpy=basic_ols(X_train,Y_train)
print("The estimated values for beta are: \n", beta_numpy)

The estimated values for beta are: 
 [[0.99527478]
 [2.01702781]
 [3.04739935]
 [3.99854234]]

Estimation with Sklearn

We can compare our explicit implementation with the one a standard library function would produce, for doing so we will use LinearRegression from the Sklearn package.

#Import the function and create an instance of the model 
from sklearn.linear_model import LinearRegression
lin_reg=LinearRegression(fit_intercept=True)

#Apply the model to our data
reg=lin_reg.fit(X_train, Y_train)
beta_sklearn=np.append(reg.intercept_,reg.coef_[0,1:]).reshape((-1,1))

print("Beta estimates using sklearn: \n", beta_sklearn)

Beta estimates using sklearn: 
 [[0.99527478]
 [2.01702781]
 [3.04739935]
 [3.99854234]]

Estimation with TensorFlow

Now lets obtain estimates for \(\beta\) using tensorflow, for doing so we have to create a model that maps the dependencies between data and parameters into a computational graph.

# Reset the current tensorflow computational graph
tf.compat.v1.reset_default_graph()
tf.random.set_seed(145)

Replicate analytical solution with Tensor Flow

In the following we implement the normal equation solution using tensorflow syntax. Notice that we have done the optimization ourselves and we know that the normal equation provides the maximizing parameters for least squares, so tensorflow is not doing any optimization.

#We need to define our objects into TensorFlow to build "A GRAPH"
x=tf.constant(X_train,dtype=tf.float32,name="x")
y=tf.constant(Y_train,dtype=tf.float32,name="y")
xT=tf.transpose(x)
#The following defines the relationship between data and beta but it is just a relationship stated in TF
beta=tf.matmul(tf.linalg.inv(tf.matmul(xT,x)),tf.matmul(xT,y))
print("Beta estimates using TensorFlow: \n")

Beta estimates using TensorFlow:

tf.print(beta)

Using TensorFlow 1.0 syntax higlights the role of relationship versus evaluation

with tf.compat.v1.Session() as sess:
    beta_tf=beta.numpy() #This converts the relationship into a numpy array
print("Beta estimates using TensorFlow: \n", beta_tf)

Beta estimates using TensorFlow: 
 [[0.99528027]
 [2.0170271 ]
 [3.0473988 ]
 [3.998537  ]]

The previous implementation looks more like a hassle cause we have to define a relationship and the evaluate it, instead of just evaluating the normal equation, but this syntaxis is very useful to later save time in the optimization step. However this feature was simplified in TensorFlow 2.0 where execution is done eagerly (like Python normally does).

Tensor Flow Training

The following describes the strategy for training the model to obtain parameter estimates:

Define a computational graph (that is a relationship) from features/covariates to ouput/predicted values for \(Y\). This is done defining a class referring to a keras model, in our case we pass as a input the number of features (covariates).
After defining the main relationship between inputs and output we would define the loss function and gradients necessary for optimization.
We also need to define an instance of the model, define the optimizer we will use and its related tunning parameters.
Finally we iteratively train the model for a number of iterations. Notice that in this simple case we do not use epoch’s or batches since the context is very simple.

tf.keras.backend.set_floatx('float64')
#Declare the main relationship/layer from features to predicted Y
class LinReg_TF(tf.keras.Model):
    def __init__(self,n_features=2):  #n_features=2 is just a default value, updatable on call
        super(LinReg_TF, self).__init__()
        #Create some initial weights for the model 
        w_init=tf.random_normal_initializer()
        self.W=tf.Variable(initial_value=w_init(shape=(n_features, 1),
                                                  dtype="float64"),trainable=True)
    def call(self, inputs):
        return tf.matmul(inputs, self.W)  #Returns the fitted values for given inputs

#Declare the loss function
def loss(model, inputs, targets):
  error=model(inputs)-targets
  return tf.reduce_mean(tf.square(error))
#Declare the Gradients for Optimization
def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return tape.gradient(loss_value, [model.W])


#Define an instance of the model and train it
n_features = X_train.shape[1]
model=LinReg_TF(n_features)
#Optimize using Stochastic Gradient Descent
optimizer=tf.keras.optimizers.SGD(learning_rate=0.01)

#Train the Model 
print("Initial loss: {:.3f}".format(loss(model, X_train, Y_train)))

Initial loss: 49.799

iterations=100
for i in range(iterations):
   grads = grad(model, X_train, Y_train)
   optimizer.apply_gradients(zip(grads, [model.W]))
   if i % 20 == 0:
     print("Loss at step {:03d}: {:.3f}".format(i, loss(model, X_train, Y_train)))

<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>
Loss at step 000: 45.485
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=2>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=3>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=4>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=5>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=6>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=7>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=8>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=9>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=10>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=11>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=12>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=13>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=14>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=15>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=16>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=17>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=18>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=19>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=20>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=21>
Loss at step 020: 10.788
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=22>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=23>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=24>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=25>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=26>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=27>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=28>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=29>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=30>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=31>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=32>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=33>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=34>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=35>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=36>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=37>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=38>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=39>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=40>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=41>
Loss at step 040: 5.547
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=42>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=43>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=44>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=45>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=46>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=47>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=48>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=49>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=50>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=51>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=52>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=53>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=54>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=55>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=56>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=57>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=58>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=59>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=60>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=61>
Loss at step 060: 4.203
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=62>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=63>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=64>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=65>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=66>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=67>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=68>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=69>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=70>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=71>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=72>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=73>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=74>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=75>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=76>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=77>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=78>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=79>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=80>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=81>
Loss at step 080: 3.590
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=82>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=83>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=84>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=85>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=86>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=87>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=88>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=89>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=90>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=91>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=92>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=93>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=94>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=95>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=96>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=97>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=98>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=99>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=100>

print("Final loss: {:.3f}".format(loss(model, X_train, Y_train)))

Final loss: 3.232

print("Beta = {}".format(model.W.numpy()))

Beta = [[1.44044647]
 [1.78671072]
 [1.46901318]
 [3.68254832]]

Notice that we have obtained decent value of the parameters, close to the true values (we can increase the number of iteration to obtain better results). They are marginally inferior to the ones obtained with the analytic solution but remember that we have given to tensorflow only a main relationship, a loss function, and an optimizer, and yet has come up with values. For more complex models where this is no analylic solution at hand this strategy will pay off.

Estimating \(\sigma^2\) along with \(\beta\).

Despite that having estimates for \(\beta\) is enough to come up with an estimator for \(\sigma^2\) using the residuals from the fitted values, in the following we will estimate all 5 parameters all together in a MLE fashion.

#Step 1: Define the class, in this case we modified the call to return the loss
import tensorflow_probability as tfp
class LReg_MLE_TF(tf.keras.Model):
    def __init__(self,n_features=2):  #n_features=2 is just a default value, updatable on call
        super(LReg_MLE_TF, self).__init__()
        #Create some initial weights for the model 
        w_init=tf.random_normal_initializer()
        self.W=tf.Variable(initial_value=w_init(shape=(n_features+1, 1), #add one coef for sigma
                                                  dtype="float32"),trainable=True) 
        
    def call(self, inputs, output):
        predicted = tf.matmul(inputs, self.W[:-1,:])
        gauss = tfp.distributions.Normal(loc=0.0, scale=1.0)
        sigma = 0.0001 + tf.square(self.W[-1]) 
        pi = tf.constant(np.pi)
        log_LL=tf.math.log(0.00001+(1/(tf.sqrt(2*pi)*sigma))*gauss.prob((output-predicted)/sigma)) 
        return -tf.reduce_mean(log_LL)
#Step 2: define the loss and the gradients
#Declare the loss function
def loss_MLE(model, inputs, targets):
  return model(inputs,targets) 
#Declare the Gradients for Optimization
def grad_MLE(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss_MLE(model, inputs, targets)
  return tape.gradient(loss_value, [model.W])

#Step 3: instance of the model and optimizer
#Define an instance of the model and train it
model_MLE=LReg_MLE_TF(n_features)
#Optimize using Adam optimizer
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01)

#Step 4: Train the Model 
print("Initial loss (MLE): {:.3f}".format(loss_MLE(model_MLE, X_train, Y_train)))
iterations=1000
for i in range(iterations):
  grads = grad_MLE(model_MLE, X_train, Y_train)
  optimizer.apply_gradients(zip(grads, [model_MLE.W]))
  if i % 100 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss_MLE(model_MLE, X_train, Y_train)))

print("Final loss: {:.3f}".format(loss_MLE(model_MLE, X_train, Y_train)))
print("Param = {}".format(model_MLE.W.numpy()))

The las entry of Param now contains \(\sqrt{\hat{\sigma}}\), given the transformation that simplifies the initialization of the weights, but notice that when taking squares is much closer to the true parameter \(\sigma^*=1.5\).

print("sigma = {}".format(model_MLE.W.numpy()[-1]**2))