For those that know very well regressions but not so much about Tensor Flow, it is always a good idea to learn in a familiar enviroment.
Note: I crated this post in Rstudio using reticulate
to run Python code, to do so install this package and then create a python session. However, if you are using an python IDE then there is no need for the following chunk of code.
library(reticulate)
use_python("C:/ProgramData/Anaconda3", required =TRUE) #If you want to use your base anaconda enviroment
py_config()
python: C:/ProgramData/Anaconda3/python.exe
libpython: C:/ProgramData/Anaconda3/python38.dll
pythonhome: C:/ProgramData/Anaconda3
version: 3.8.5 (default, Sep 3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]
Architecture: 64bit
numpy: C:/ProgramData/Anaconda3/Lib/site-packages/numpy
numpy_version: 1.20.3
NOTE: Python version was forced by RETICULATE_PYTHON
import numpy as np
import pandas as pd
import tensorflow as tf
Here we are going to implement the following data generating process:
\[y_i=\beta_0+\beta_1x_{i1}+\beta_2 x_{i2}+\beta_3 x_{i3}+\varepsilon_i\]
where \(\varepsilon_i\sim N(0,\sigma^2)\), and the true parameters are given by
\[\beta^*=(\beta_0,\beta_1,\beta_2,\beta_3)=(1,2,3,4)\] \[\sigma^*=1.5\]
And the covariates are generated as follow: \[\begin{align} x_{1i} &\sim N(0,1) \\ x_{2i} &\sim Unif[-1,1] \\ x_{3i} &\sim Exp(1) \end{align}\]
def dgp(N):
"""Function to generate a sample of size N for the previous DGP
Parameters:
----------
N: Integer,
Sample Size
Output:
----------
Y: Nx1 Array,
Array containing the dependent variables
X: Nx4 Array,
Array containing the independent variables
"""
#Generate the Error Term
=np.random.normal(0,1.5,size=(N,1))
E#Generate the Covariates
=np.ones((N,1))
X0=np.random.randn(N,1)
X1=np.random.uniform(-1,1,size=(N,1))
X2=np.random.exponential(1, size=(N,1))
X3#Construct Y
=X0+2*X1+3*X2+4*X3+E
Y#Return
return Y, np.hstack((X0,X1,X2,X3))
#Generate a sample of size N=10000
=10000
N145)
np.random.seed(=dgp(N)
Y,Xprint(Y[:4])
[[ 0.53471528]
[ 4.94094422]
[ 5.61397342]
[14.92933068]]
print("Shape of the Dep variable Y: ",Y.shape)
Shape of the Dep variable Y: (10000, 1)
print("Shape of the Ind variable X: ",X.shape)
# Do a Train/Test Split
Shape of the Ind variable X: (10000, 4)
=int(np.floor(0.7*N))
Split=X[:Split,:]; X_test=X[Split:,:]
X_train=Y[:Split].reshape((-1,1)); Y_test=Y[Split:].reshape((-1,1)) Y_train
In a sample with \(N\) individuals, we can estimate \(\beta\) using least squares which solution yields:
\[\hat{\beta}=(X'X)^{-1}X'Y\]
where \(Y\) is a \(N\times 1\) vector of \(\{y_i\}\) stacked, and similarly \(X\) is a \(N\times4\) matrix of stacked covariates for observation \(i\). For implementing the normal equation we only need Numpy
functions.
def basic_ols(X,Y):
"""
Function that implements the normal equation for an OLS estimator
Parameters:
----------
Y: Nx1 Array,
Array containing the dependent variables
X: Nx4 Array,
Array containing the independent variables
Output:
----------
beta: 4x1 array,
OLS estimator
"""
=np.linalg.solve(np.dot(X.T,X),np.dot(X.T,Y))
betareturn beta
#Estimate beta with the normal equation
=basic_ols(X_train,Y_train)
beta_numpyprint("The estimated values for beta are: \n", beta_numpy)
The estimated values for beta are:
[[0.99527478]
[2.01702781]
[3.04739935]
[3.99854234]]
We can compare our explicit implementation with the one a standard library function would produce, for doing so we will use LinearRegression
from the Sklearn
package.
#Import the function and create an instance of the model
from sklearn.linear_model import LinearRegression
=LinearRegression(fit_intercept=True)
lin_reg
#Apply the model to our data
=lin_reg.fit(X_train, Y_train)
reg=np.append(reg.intercept_,reg.coef_[0,1:]).reshape((-1,1))
beta_sklearn
print("Beta estimates using sklearn: \n", beta_sklearn)
Beta estimates using sklearn:
[[0.99527478]
[2.01702781]
[3.04739935]
[3.99854234]]
Now lets obtain estimates for \(\beta\) using tensorflow
, for doing so we have to create a model that maps the dependencies between data and parameters into a computational graph.
# Reset the current tensorflow computational graph
tf.compat.v1.reset_default_graph()145) tf.random.set_seed(
Replicate analytical solution with Tensor Flow
In the following we implement the normal equation solution using tensorflow syntax. Notice that we have done the optimization ourselves and we know that the normal equation provides the maximizing parameters for least squares, so tensorflow is not doing any optimization.
#We need to define our objects into TensorFlow to build "A GRAPH"
=tf.constant(X_train,dtype=tf.float32,name="x")
x=tf.constant(Y_train,dtype=tf.float32,name="y")
y=tf.transpose(x)
xT#The following defines the relationship between data and beta but it is just a relationship stated in TF
=tf.matmul(tf.linalg.inv(tf.matmul(xT,x)),tf.matmul(xT,y))
betaprint("Beta estimates using TensorFlow: \n")
Beta estimates using TensorFlow:
print(beta) tf.
Using TensorFlow 1.0 syntax higlights the role of relationship versus evaluation
with tf.compat.v1.Session() as sess:
=beta.numpy() #This converts the relationship into a numpy array
beta_tfprint("Beta estimates using TensorFlow: \n", beta_tf)
Beta estimates using TensorFlow:
[[0.99528027]
[2.0170271 ]
[3.0473988 ]
[3.998537 ]]
The previous implementation looks more like a hassle cause we have to define a relationship and the evaluate it, instead of just evaluating the normal equation, but this syntaxis is very useful to later save time in the optimization step. However this feature was simplified in TensorFlow 2.0
where execution is done eagerly (like Python normally does).
The following describes the strategy for training the model to obtain parameter estimates:
class
referring to a keras
model, in our case we pass as a input the number of features (covariates).'float64')
tf.keras.backend.set_floatx(#Declare the main relationship/layer from features to predicted Y
class LinReg_TF(tf.keras.Model):
def __init__(self,n_features=2): #n_features=2 is just a default value, updatable on call
super(LinReg_TF, self).__init__()
#Create some initial weights for the model
=tf.random_normal_initializer()
w_initself.W=tf.Variable(initial_value=w_init(shape=(n_features, 1),
="float64"),trainable=True)
dtypedef call(self, inputs):
return tf.matmul(inputs, self.W) #Returns the fitted values for given inputs
#Declare the loss function
def loss(model, inputs, targets):
=model(inputs)-targets
errorreturn tf.reduce_mean(tf.square(error))
#Declare the Gradients for Optimization
def grad(model, inputs, targets):
with tf.GradientTape() as tape:
= loss(model, inputs, targets)
loss_value return tape.gradient(loss_value, [model.W])
#Define an instance of the model and train it
= X_train.shape[1]
n_features =LinReg_TF(n_features)
model#Optimize using Stochastic Gradient Descent
=tf.keras.optimizers.SGD(learning_rate=0.01)
optimizer
#Train the Model
print("Initial loss: {:.3f}".format(loss(model, X_train, Y_train)))
Initial loss: 49.799
=100
iterationsfor i in range(iterations):
= grad(model, X_train, Y_train)
grads zip(grads, [model.W]))
optimizer.apply_gradients(if i % 20 == 0:
print("Loss at step {:03d}: {:.3f}".format(i, loss(model, X_train, Y_train)))
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>
Loss at step 000: 45.485
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=2>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=3>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=4>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=5>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=6>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=7>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=8>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=9>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=10>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=11>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=12>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=13>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=14>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=15>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=16>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=17>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=18>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=19>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=20>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=21>
Loss at step 020: 10.788
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=22>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=23>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=24>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=25>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=26>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=27>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=28>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=29>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=30>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=31>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=32>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=33>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=34>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=35>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=36>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=37>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=38>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=39>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=40>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=41>
Loss at step 040: 5.547
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=42>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=43>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=44>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=45>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=46>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=47>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=48>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=49>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=50>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=51>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=52>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=53>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=54>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=55>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=56>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=57>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=58>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=59>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=60>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=61>
Loss at step 060: 4.203
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=62>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=63>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=64>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=65>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=66>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=67>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=68>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=69>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=70>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=71>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=72>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=73>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=74>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=75>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=76>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=77>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=78>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=79>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=80>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=81>
Loss at step 080: 3.590
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=82>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=83>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=84>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=85>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=86>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=87>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=88>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=89>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=90>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=91>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=92>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=93>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=94>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=95>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=96>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=97>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=98>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=99>
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=100>
print("Final loss: {:.3f}".format(loss(model, X_train, Y_train)))
Final loss: 3.232
print("Beta = {}".format(model.W.numpy()))
Beta = [[1.44044647]
[1.78671072]
[1.46901318]
[3.68254832]]
Notice that we have obtained decent value of the parameters, close to the true values (we can increase the number of iteration to obtain better results). They are marginally inferior to the ones obtained with the analytic solution but remember that we have given to tensorflow only a main relationship, a loss function, and an optimizer, and yet has come up with values. For more complex models where this is no analylic solution at hand this strategy will pay off.
Despite that having estimates for \(\beta\) is enough to come up with an estimator for \(\sigma^2\) using the residuals from the fitted values, in the following we will estimate all 5 parameters all together in a MLE fashion.
#Step 1: Define the class, in this case we modified the call to return the loss
import tensorflow_probability as tfp
class LReg_MLE_TF(tf.keras.Model):
def __init__(self,n_features=2): #n_features=2 is just a default value, updatable on call
super(LReg_MLE_TF, self).__init__()
#Create some initial weights for the model
=tf.random_normal_initializer()
w_initself.W=tf.Variable(initial_value=w_init(shape=(n_features+1, 1), #add one coef for sigma
="float32"),trainable=True)
dtype
def call(self, inputs, output):
= tf.matmul(inputs, self.W[:-1,:])
predicted = tfp.distributions.Normal(loc=0.0, scale=1.0)
gauss = 0.0001 + tf.square(self.W[-1])
sigma = tf.constant(np.pi)
pi =tf.math.log(0.00001+(1/(tf.sqrt(2*pi)*sigma))*gauss.prob((output-predicted)/sigma))
log_LLreturn -tf.reduce_mean(log_LL)
#Step 2: define the loss and the gradients
#Declare the loss function
def loss_MLE(model, inputs, targets):
return model(inputs,targets)
#Declare the Gradients for Optimization
def grad_MLE(model, inputs, targets):
with tf.GradientTape() as tape:
= loss_MLE(model, inputs, targets)
loss_value return tape.gradient(loss_value, [model.W])
#Step 3: instance of the model and optimizer
#Define an instance of the model and train it
=LReg_MLE_TF(n_features)
model_MLE#Optimize using Adam optimizer
=tf.keras.optimizers.Adam(learning_rate=0.01)
optimizer
#Step 4: Train the Model
print("Initial loss (MLE): {:.3f}".format(loss_MLE(model_MLE, X_train, Y_train)))
=1000
iterationsfor i in range(iterations):
= grad_MLE(model_MLE, X_train, Y_train)
grads zip(grads, [model_MLE.W]))
optimizer.apply_gradients(if i % 100 == 0:
print("Loss at step {:03d}: {:.3f}".format(i, loss_MLE(model_MLE, X_train, Y_train)))
print("Final loss: {:.3f}".format(loss_MLE(model_MLE, X_train, Y_train)))
print("Param = {}".format(model_MLE.W.numpy()))
The las entry of Param now contains \(\sqrt{\hat{\sigma}}\), given the transformation that simplifies the initialization of the weights, but notice that when taking squares is much closer to the true parameter \(\sigma^*=1.5\).
print("sigma = {}".format(model_MLE.W.numpy()[-1]**2))