
Machine Learning Notes/Pointers

Goodfellow, Bengio and Courville’s excellent, current book from MIT Press covering machine learning fundamentals. Free to read online.

This site links two libraries for deep learning computations: both are Python libraries, and both build on numpy.

Common features:

matrix operations are coded with Python operators (+, *, etc.), and the result is an object that the library can later be commanded to run. In TensorFlow, coding the object is called building a computational graph, and commanding it to be run is called running the computational graph.


basic use of Theano

calculations can be run on a GPU

graphical or network display of a neural network architecture.

deep learning demo codes.
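The shared build-then-run pattern described above can be sketched in a few lines of plain Python. This is a toy illustration of deferred execution, not the actual TensorFlow or Theano API:

```python
# Toy deferred-execution graph: using + and * on nodes records operations;
# nothing is computed until run() is called (the "build, then run" pattern).
class Node:
    def __init__(self, fn):
        self.fn = fn            # closure that computes this node's value

    def __add__(self, other):
        return Node(lambda: self.fn() + other.fn())

    def __mul__(self, other):
        return Node(lambda: self.fn() * other.fn())

    def run(self):
        return self.fn()        # evaluate the recorded graph

def constant(v):
    return Node(lambda: v)

# Build the graph (no arithmetic happens here)...
expr = constant(2) * constant(3) + constant(4)
# ...then run it.
print(expr.run())  # -> 10
```

The real libraries do the same thing at scale: the Python operators only record the computation, and a separate run step executes it, possibly on a GPU.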

TensorFlow (Python library) tutorial on deep convolutional neural networks. Provides a small implementation with a GPU version.

TensorFlow quick start including a complete gradient descent example.
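As a standalone illustration of what such a gradient descent example does, here is a minimal pure-Python version minimizing a 1-D quadratic (a toy sketch, not the tutorial’s code):

```python
# Minimal gradient descent: minimize f(x) = (x - 3)^2 by repeatedly
# stepping against the gradient f'(x) = 2 * (x - 3).
def gradient_descent(lr=0.1, steps=100, x=0.0):
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)   # derivative of (x - 3)^2
        x -= lr * grad           # step opposite the gradient
    return x

print(gradient_descent())  # converges toward the minimum at x = 3
```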

TensorFlow’s CIFAR10 tutorial (an image classification demo small enough to be easily experimented with)

CIFAR10, Alex Krizhevsky’s page

Krizhevsky’s tech report

On GitHub; actively maintained (March 2017), just released v1.0.


NVIDIA’s instructions:

Use pip, a Python installer alternative to easy_install. The Ubuntu python-pip package brought in:

python-all (2.7.11-1) …
Setting up python-all-dev (2.7.11-1) …
Setting up python-pip-whl (8.1.1-2ubuntu0.4) …
Setting up python-pip (8.1.1-2ubuntu0.4) …
Setting up python-setuptools (20.7.0-1) …
Setting up python-wheel

Theano tutorial page

On GitHub; actively maintained (March 2017).

Machine Learning Frameworks

  1. TensorFlow
  2. Theano
  3. Caffe
  4. Caffe2 (Caffe2 link; NVIDIA link with online labs)
  5. Microsoft CNTK (CNTK setup)
  6. NVIDIA says “define by run” rather than “define and run”


  1. NVIDIA’s page of frameworks supporting cuDNN (NVIDIA link); see the bottom for more links: “There are several other deep learning frameworks that leverage the Deep Learning SDK, including BidMach, Brainstorm, Kaldi, MatConvNet, MaxDNN, Deeplearning4j, Keras, Lasagne (Theano), Leaf, and more.”

Machine Learning toolkit(s) at a higher level than Theano and TensorFlow

Neural network API on top of Theano and TensorFlow

Wikipedia article Convolutional Neural Network

Early (2005) paper on benefits of GPUs for machine learning

Wikipedia on ReLU (rectified linear units), their cousins and motivations. f(x) = max(0, x) is found to be a helpful kind of activation function. Related to the softplus f(x) = ln(1 + e^x), whose derivative is the logistic function 1/(1 + e^-x).
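Here is a small pure-Python sketch of these activation functions, checking numerically that the logistic is the derivative of f(x) = ln(1 + e^x):

```python
import math

# ReLU, its smooth approximation softplus, and the logistic function
# (which is softplus's derivative).
def relu(x):
    return max(0.0, x)

def softplus(x):
    return math.log(1.0 + math.exp(x))

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Check the derivative relation with a central finite difference.
h = 1e-6
x = 0.7
numeric = (softplus(x + h) - softplus(x - h)) / (2 * h)
print(abs(numeric - logistic(x)) < 1e-6)  # -> True
```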

Wikipedia on Cross Entropy, H(p, q) = -\sum p(x) log q(x). Minimizing over q for fixed p is easily related to the Kullback-Leibler divergence, since H(p, q) = H(p) + D_KL(p‖q).
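The relation between cross entropy and the Kullback-Leibler divergence can be checked numerically with a toy pair of distributions:

```python
import math

# Cross entropy H(p, q) = -sum p(x) log q(x) decomposes as
# H(p, q) = H(p) + D_KL(p || q), so minimizing it over q for a
# fixed p is the same as minimizing the KL divergence.
def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12)  # -> True
```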

Deep Learning in Neural Networks: An Overview by Jürgen Schmidhuber

Terry Tao on Shannon Entropy and its analogies to set functions

Notes on reading MacKay’s book

Neural network models (neural networks) are inspired by, but are not faithful models of, brains. They are interesting because brains are interesting, neural networks can do learning and pattern recognition, and neural networks are complex and adaptive.

How biological memories differ from computer memories: the former are associative (content-addressable), error-tolerant, robust against local failure, and distributed.

Artificial neural networks: “parallel distributed computational systems consisting of many interacting simple elements.”

Terms: Architecture. Weights of connections. Neurons have activities. Activity rules, usually dependent on weights, govern short-term dynamics. The learning rule is for changing the weights; it can depend on activities and/or on target activity values supplied by a teacher, and usually operates over a longer time scale than the dynamics that change the activities. Activity and learning rules may be invented, or derived from objective functions. Supervised vs. unsupervised.
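As a toy illustration of these terms (my sketch, not MacKay’s notation): for a single artificial neuron, the weights are the connection strengths and the activity rule computes the neuron’s activity from its inputs.

```python
import math

# Activity rule for one neuron: a weighted sum of the inputs plus a
# bias, squashed by a logistic activation into (0, 1).
def neuron_activity(weights, inputs, bias=0.0):
    a = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum
    return 1.0 / (1.0 + math.exp(-a))                       # activation

print(neuron_activity([1.0, -2.0], [0.5, 0.25]))  # logistic(0.0) = 0.5
```

A learning rule would then adjust `weights` and `bias` over a longer time scale, as the next note describes.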

Chapter 39 covers in precise terms a range of deterministic and stochastic activation functions for the single-neuron classifier, training, back-propagation, gradient descent, batch vs. on-line learning, and regularization; I suggest studying this chapter. Going on: Ch. 40 is on single-neuron capacity. Ch. 41 is on learning as inference, which is in terms of the Bayesian view of probability. It may clear up mysteries about the learning of probability distributions and a probability distribution as a classifier or predictor. It explains why log probability is a good error function to minimize.
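A toy sketch (mine, not MacKay’s code) of training a single logistic neuron by on-line gradient descent on the log-probability (log loss) error; for this loss the gradient with respect to each weight takes the simple form (prediction - target) * input:

```python
import math

# Train one logistic neuron on a tiny linearly separable data set by
# stochastic (per-example) gradient descent on the negative log
# probability of the targets.
def train(data, lr=0.5, epochs=2000):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in data:
            y = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = y - t                  # gradient of log loss w.r.t. the activation
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

# Toy data: target is 1 exactly when x0 > x1.
data = [([0.0, 1.0], 0), ([1.0, 0.0], 1), ([0.2, 0.9], 0), ([0.9, 0.3], 1)]
w, b = train(data)
predict = lambda x: 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
print(predict([1.0, 0.0]) > 0.5, predict([0.0, 1.0]) < 0.5)
```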

Bishop’s book Pattern Recognition and Machine Learning begins, after an intro to least-squares linear regression, with how that regression can be seen as choosing the normal distribution that maximizes the likelihood that the data to be interpolated were drawn from it. This accounts for the minimization of the sum of squared errors and is explained further in later chapters.
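This equivalence can be checked numerically: for fixed sigma, the Gaussian log-likelihood of the residuals is a constant minus SSE/(2·sigma²), so the likelihood-maximizing fit is exactly the least-squares fit. A toy sketch with a constant model:

```python
import math

# Fit the data with a constant c. The Gaussian log-likelihood of the
# residuals (y - c) is a constant minus SSE/(2*sigma^2), so the c that
# maximizes likelihood is the c that minimizes squared error.
def sse(ys, c):
    return sum((y - c) ** 2 for y in ys)

def log_likelihood(ys, c, sigma=1.0):
    n = len(ys)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sse(ys, c) / (2 * sigma ** 2))

ys = [1.0, 2.0, 4.0]
candidates = [i / 100 for i in range(0, 501)]  # scan c over [0, 5]
best_ll = max(candidates, key=lambda c: log_likelihood(ys, c))
best_sse = min(candidates, key=lambda c: sse(ys, c))
print(best_ll == best_sse)  # -> True; both pick the c nearest the mean 7/3
```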

Deep Learning Codes

cuda-convnet2 Alexander Krizhevsky’s git repository

sdc’s fork with updates to build locally

Alex Krizhevsky’s Toronto web site

Deep Learning Links

GPU Enhancement


CUDA: NVIDIA’s platform for their GPUs; C/C++ API

cuDNN: NVIDIA’s API on top of CUDA for tensor data types and machine learning computations. NVIDIA’s larger sample code is for the test phase of LeNet-5, the handwritten digit classifier reported in Yann LeCun et al., Proc. IEEE, Nov. 1998, “Gradient-Based Learning Applied to Document Recognition.” The network used was implemented and trained using Caffe.

From sample code: “Training LeNet on MNIST with Caffe” tutorial, located

OpenCL: an open standard for general GPUs, and for many platforms of multithreaded CPUs, FPGAs, and DSPs

CUDA and OpenCL provide C-like languages for coding kernels (the code that is run in parallel).

Keras is a Python API built on TensorFlow or Theano

TensorFlow is a Python API that can use CUDA. When it uses CUDA, it does so through NVIDIA’s cuDNN.

Theano is a Python API that can use CUDA (through NVIDIA’s cuDNN, like TensorFlow does) and, with “minimal support”, OpenCL.


Bret Victor’s The Future of Programming

Brought to our attention by Guzdial’s wonderful Computing Education Blog. Victor has just published informative sources for his talk.

In 1972-1973, I tried to teach myself PL/I, programmed in IBM 1130 and 360 Fortran for money from professors, wrote in IBM 1130 assembly language an IBM 1620 cross-assembler, renovated an IBM 1620 by bypassing broken core memory addressing lines and then removing and faking memory parity checking (having studied its circuit diagrams from the individual transistor level up), took a grad. course in Mathematical Logic taught by Ian Filotti at the Courant Institute while auditing Jack Schwartz’s course on compilers, got lectured on Multics by Bernie Greenberg, graduated with a physics major, entered the maw of MIT, 0-ed the Putnam and aced GRE subject tests, etc.


“Don’t try to understand ’em
Just rope, throw, and brand ’em
Soon we’ll be living high and wide.”


Way Out Web Postings

Here I accumulate some perhaps shocking, unbelievable, etc. stuff that might have some truth to it:

  1. from

    “I occasionally need to read mathematics journal articles for programming work that I do, and I find there are often things I simply cannot understand without enough caffeine in my system. The level of abstraction required to understand a mathematical proof is sometimes just too high for me to cope with otherwise.

    Sidebar: I think everyone has a natural “resting” level of abstraction that they are comfortable with; a programmer’s resting abstraction level is on average above that of a non-programmer’s, but professional mathematicians are another level above that. (The worst part is that some of them are additionally normal, friendly people who can socialise and play sports… Not that I’m jealous, of course.)”

Cruising the Sea with NCL

My family and I just finished a 1-week cruise to Bermuda on the NCL Gem.