bandit#

class bandit.BernoulliBandit(p)#

A Bernoulli bandit with a single arm.

Attributes#

p (float): The probability of the arm returning a reward of 1.

Methods#

pull(): Simulate pulling the arm of the bandit.

Examples#

>>> import numpy as np
>>> np.random.seed(0)
>>> import pybandit as pb
>>> bandit = pb.bandit.BernoulliBandit(0.7)
>>> bandit.pull()
1
>>> [bandit.pull() for _ in range(5)]
[0, 1, 1, 1, 1]

pull()#

Pull the bandit’s arm.

Returns#

int: The reward from pulling the arm, either 0 or 1.
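To illustrate the behaviour described above, here is a minimal sketch of a single-armed Bernoulli bandit. It also shows that the average reward over many pulls approaches p. The class SketchBernoulliArm and its seed parameter are hypothetical stand-ins written with the standard library only; they are not pybandit's implementation, and pybandit may draw its random numbers differently.

```python
import random


class SketchBernoulliArm:
    """Hypothetical stand-in for a single-armed Bernoulli bandit (not pybandit's code)."""

    def __init__(self, p, seed=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must lie in [0, 1]")
        self.p = p
        # Private generator so that a fixed seed gives reproducible pulls.
        self._rng = random.Random(seed)

    def pull(self):
        # Reward is 1 with probability p, otherwise 0.
        return 1 if self._rng.random() < self.p else 0


arm = SketchBernoulliArm(0.7, seed=0)
rewards = [arm.pull() for _ in range(10_000)]

# The empirical mean of the rewards is an estimate of p.
estimate = sum(rewards) / len(rewards)
print(f"estimated p = {estimate:.3f}")
```

Averaging rewards this way is the usual starting point for estimating an arm's value before comparing several arms.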

class bandit.GaussianBandit(mean=0.0, std_dev=1.0)#

A simple Gaussian bandit with a single arm.

Attributes#

mean (float): The mean reward of the Gaussian distribution.
std_dev (float): The standard deviation of the Gaussian distribution.

Examples#

>>> import numpy as np
>>> np.random.seed(0)
>>> import pybandit as pb
>>> bandit = pb.bandit.GaussianBandit(mean=0, std_dev=1)
>>> bandit.pull()  # Sampled reward; reproducible here because the seed is fixed
1.764052345967664
>>> [bandit.pull() for _ in range(5)]  # Pulling the arm 5 times
[0.4001572083672233, 0.9787379841057392, 2.240893199201458, 1.8675579901499675, -0.977277879876411]

pull()#

Pull the arm of the bandit and get a reward.

Returns#

float: A reward sampled from the Gaussian distribution defined by the mean and std_dev.
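As a rough illustration of how mean and std_dev shape the rewards, the sketch below draws many samples from a single Gaussian arm and checks that the sample mean and sample standard deviation land near the configured values. SketchGaussianArm and its seed parameter are hypothetical stand-ins built on the standard library; they are not pybandit's implementation.

```python
import random
import statistics


class SketchGaussianArm:
    """Hypothetical stand-in for a single-armed Gaussian bandit (not pybandit's code)."""

    def __init__(self, mean=0.0, std_dev=1.0, seed=None):
        if std_dev < 0:
            raise ValueError("std_dev must be non-negative")
        self.mean = mean
        self.std_dev = std_dev
        # Private generator so that a fixed seed gives reproducible pulls.
        self._rng = random.Random(seed)

    def pull(self):
        # Reward is one draw from a normal distribution N(mean, std_dev**2).
        return self._rng.gauss(self.mean, self.std_dev)


arm = SketchGaussianArm(mean=2.0, std_dev=0.5, seed=0)
rewards = [arm.pull() for _ in range(20_000)]

sample_mean = statistics.fmean(rewards)  # should be close to 2.0
sample_std = statistics.stdev(rewards)   # should be close to 0.5
print(f"sample mean = {sample_mean:.3f}, sample std = {sample_std:.3f}")
```

Unlike the Bernoulli case, rewards here are unbounded floats, so a single pull can fall well away from the mean even when the long-run average is stable.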