Assumptions Galore!
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
from PIL import Image as PILImage
from IPython.display import Image
import seaborn as sns
from scipy import stats
import pandas as pd
import numpy as np
IMG_WIDTH = 628
sns.set_style('white');
Is this a dome or a crater?
Image(filename="./barringer-crater-1.jpg", width=IMG_WIDTH)
What if we turn this anti-clockwise by $90^\circ$?
Image(filename="./barringer-crater-90.jpg", width=IMG_WIDTH/2)
Turn anti-clockwise again by $90^\circ$?
Image(filename="./barringer-crater-correct.jpg", width=IMG_WIDTH)
These images are of the Barringer meteor crater in Arizona [1]. When viewed in the right orientation (the last image above), it is indeed a crater! We see a dome in the first image because our brains assume that the sun usually shines from above, so hills would be light on top while concave areas would be light on the bottom [2].
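(The rotated views can be reproduced with Pillow, which is imported above as PILImage. The snippet below is only a sketch: it assumes the original photo is in the working directory, and the output filename is made up for illustration.)
# Rotate the crater photo anti-clockwise by 90 degrees.
# PIL rotates counter-clockwise for positive angles; expand=True enlarges
# the canvas so the rotated image is not cropped.
rotated = PILImage.open("./barringer-crater-1.jpg").rotate(90, expand=True)
rotated.save("./barringer-crater-rotated.jpg")  # hypothetical output name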
def square_root(number):
    return np.sqrt(number)

square_root(20)
Did you assume that the input will always be $\geq 0$? What happens with a bad input?
square_root(-20)  # quietly returns nan (the RuntimeWarning is suppressed by the filter above)
Use preconditions to test whether input conforms to assumptions
def square_root(number):
    # Precondition: the input must be non-negative.
    assert number >= 0
    return np.sqrt(number)

square_root(-20)  # now fails fast with an AssertionError
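A bare assert fails with no message and is skipped entirely when Python runs with -O. As a small variation (my sketch, not part of the original example), the precondition can raise an informative exception instead:
def square_root(number):
    # Precondition: the argument must be non-negative.
    if number < 0:
        raise ValueError("square_root expects number >= 0, got {}".format(number))
    return np.sqrt(number)

square_root(-20)  # raises ValueError instead of quietly returning nan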
What about real-world programs?
In our OOPSLA paper, Enforcing object protocols by combining static and runtime analysis, we show how to build tools that can automatically check assumptions about object interactions.
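The tools in the paper are static and runtime analyses for real programs; the toy class below (hypothetical names, just a sketch) only conveys the flavour of a runtime protocol check, where assertions verify that methods are called in the expected order:
class Connection:
    # Toy protocol: open() must be called before send(); close() ends the session.
    def __init__(self):
        self._opened = False

    def open(self):
        assert not self._opened, "protocol violation: open() called twice"
        self._opened = True

    def send(self, data):
        assert self._opened, "protocol violation: send() before open()"
        # ... transmit data ...

    def close(self):
        assert self._opened, "protocol violation: close() before open()"
        self._opened = False

conn = Connection()
conn.send("hello")  # AssertionError: protocol violation: send() before open()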
Statistics
Principle of Maximum Entropy
The probability distribution which best represents the current state of information, without any additional assumptions, is the one with the largest entropy.
— E. T. Jaynes
Suppose you know the mean $\mu$ and variance $\sigma^2$ of a collection of continuous values, say $\mu = 0$ and $\sigma^2 = 1$. Let's compare the entropies of several distributions with this mean and variance by varying the shape parameter $\beta$ of a generalized normal distribution. As we see below, when $\beta = 2$ the generalized normal distribution matches the standard normal distribution (red curve in the left-hand plot), and this has the highest entropy, $1.42$ (right-hand plot). Can we increase the entropy further? We can make the distribution flatter by moving probability mass from the center to the tails; however, if we have to respect the variance constraint, then the distribution with the highest entropy is the standard normal. The takeaway from this illustration is that if all we're willing to assume about a collection of observations is that they have a finite variance, then the Gaussian distribution is the most conservative probability distribution to assign to those measurements. With other assumptions, the principle of maximum entropy leads to other distributions [3].
from scipy.special import gamma  # Gamma function, needed for the scale parameter below

x = np.arange(-4, 4 + 0.05, 0.05)
betas = np.arange(1, 5, 0.1)
entropies = []
with plt.xkcd():
    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(12, 4))
    for beta in betas:
        # Choose the scale alpha so that the variance is 1 for every beta
        alpha = np.sqrt(gamma(1/beta) / gamma(3/beta))
        gdist = stats.gennorm(beta=beta, loc=0, scale=alpha)
        # Plot the density only for integer betas; highlight beta=2 (the standard normal)
        if np.round(beta, 1) in [1., 2., 3., 4.]:
            if int(beta) == 2:
                ax0.plot(x, gdist.pdf(x), ls='-', color='r', zorder=10,
                         label=r'$\beta={beta}$'.format(beta=int(beta)))
            else:
                ax0.plot(x, gdist.pdf(x), ls='-.', alpha=0.8,
                         label=r'$\beta={beta}$'.format(beta=int(beta)))
        e = gdist.entropy()
        entropies.append(e)
    ax0.set_xlabel('x')
    ax0.set_ylabel('density')
    ax1.plot(betas, entropies)
    ax1.vlines(x=2, ymin=np.min(entropies), ymax=1.42, ls='--')
    ax1.set_xlabel(r'$\beta$')
    ax1.set_ylabel('entropy')
    handles, labels = ax0.get_legend_handles_labels()
    fig.legend(handles, labels, loc='lower left', bbox_to_anchor=(0.0, 1.01), ncol=2,
               borderaxespad=0, frameon=False)
    plt.tight_layout()
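As a quick sanity check on the $1.42$ quoted above (a check I'm adding here, not part of the original plot): the differential entropy of a standard normal is $\frac{1}{2}\ln(2\pi e) \approx 1.419$, and SciPy agrees.
print(0.5 * np.log(2 * np.pi * np.e))          # 1.4189...
print(stats.norm(loc=0, scale=1).entropy())    # same value
print(stats.gennorm(beta=2, loc=0, scale=np.sqrt(2)).entropy())  # beta=2 with alpha=sqrt(2) is N(0, 1)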
Generalized normal distribution
There are three parameters: location $\mu$, scale $\alpha$, and shape $\beta$ [5].
\begin{align}
\text{Pr}(X=x \mid \mu,\alpha,\beta) &= \frac{\beta}{2\,\alpha\,\Gamma(1/\beta)} \exp\left\{ -\left( \frac{\vert x - \mu \vert}{\alpha} \right)^{\beta} \right\} \\
\text{Var}(X) &= \frac{\alpha^2\,\Gamma\left(3/\beta\right)}{\Gamma\left(1/\beta\right)}
\end{align}
By setting the variance $\sigma^2 = 1$, we can derive $\alpha = \sqrt{\frac{\Gamma\left(1/\beta\right)}{\Gamma\left(3/\beta\right)}}$, where $\Gamma$ is the gamma function. We can then use SciPy's stats.gennorm to compute the entropy for various values of $\beta$.
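As a minimal check of that derivation (the example values of $\beta$ below are chosen arbitrarily), plugging the derived $\alpha$ into stats.gennorm should give unit variance for every shape:
from scipy.special import gamma

for beta in [1, 2, 4]:
    alpha = np.sqrt(gamma(1/beta) / gamma(3/beta))
    print(beta, stats.gennorm(beta=beta, loc=0, scale=alpha).var())  # ~1.0 each time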