Probability Distributions
Frequency Distribution: summarizes the variation in the observed data (the outcomes of an experiment).
Probability Distribution:
- A probability distribution specifies the relative likelihood of all possible outcomes.
- It describes the chance of a future event.
- Describes how outcomes are expected to vary.
- The probability of each outcome is between 0 and 1.
- The probabilities of all mutually exclusive possible outcomes sum to 1.
- Useful in making inferences and decisions under uncertainty.
Types of Probability Distributions:
- Discrete Probability Distributions
- Continuous Probability Distributions
Probability functions:
Each probability distribution is given by a probability function that assigns a probability to each possible outcome.
For a discrete distribution we can compute the probability of a particular outcome; for a continuous distribution we can compute the chance that the outcome lies within a particular range of values. The function is called a Probability Mass Function (PMF) for discrete distributions and a Probability Density Function (PDF) for continuous distributions. The PMF sums to one over all outcomes, and the PDF integrates to one over the entire domain.
The PMF (or PDF) describes the probability of individual outcomes, whereas the Cumulative Distribution Function (CDF) gives the probability of seeing an outcome less than or equal to a particular value of the random variable. CDFs are used to check how the probability has added up to a certain point. For example, if P(X = 5) is the probability of getting exactly 5 heads in a series of coin flips, then P(X <= 5) denotes the cumulative probability of obtaining 0 to 5 heads.
Cumulative distribution functions are also used to calculate p-values as a part of performing hypothesis testing.
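The coin-flip example above can be sketched in a few lines of pure Python. This is a minimal illustration, not library code; the function names and the choice of 10 flips of a fair coin are our own.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k): probability of exactly k heads in n coin flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k): PMF values accumulated from 0 up to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.5  # illustrative: 10 flips of a fair coin
print(binom_pmf(5, n, p))  # P(exactly 5 heads) ≈ 0.246
print(binom_cdf(5, n, p))  # P(5 heads or fewer) ≈ 0.623
```

A one-sided p-value for "at least k heads" would then be `1 - binom_cdf(k - 1, n, p)`, which is exactly how CDFs enter hypothesis testing.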
1. Discrete Probability Distributions:
- A discrete probability distribution describes a random variable whose outcomes are countable or finite.
- Discrete distributions contrast with continuous distributions, where outcomes can fall anywhere on a continuum.
- Common examples of discrete distributions include the binomial, Poisson, and Bernoulli distributions.
- These distributions often involve statistical analyses of “counts” or “how many times” an event occurs.
a. Bernoulli Distribution:
This distribution arises when we perform an experiment once and it has only two possible outcomes: success and failure. Trials of this type are called Bernoulli trials, and they form the basis for several of the distributions discussed below. Let p be the probability of success, so 1 - p is the probability of failure.
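A Bernoulli trial is easy to sketch in code. Here is a minimal simulation; the success probability p = 0.3 and the sample size are arbitrary illustrative values.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = 1) = p (success), P(X = 0) = 1 - p (failure)."""
    return p if x == 1 else 1 - p

def bernoulli_trial(p):
    """One Bernoulli trial: returns 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

random.seed(42)  # for reproducibility
p = 0.3  # illustrative success probability
samples = [bernoulli_trial(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # empirical success rate, close to p
```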
b. Binomial Distribution:
A binomial distribution gives the probability of a given number of SUCCESS or FAILURE outcomes in an experiment or survey that is repeated multiple times. The binomial is a type of distribution with two possible outcomes per trial (the prefix "bi" means two, or twice). For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes, pass or fail.
The binomial distribution is closely related to the Bernoulli distribution: the Bernoulli distribution is the binomial distribution with n = 1.
Many instances of binomial distributions can be found in real life. For example, if a new drug is introduced to cure a disease, it either cures the disease (it’s successful) or it doesn’t cure the disease (it’s a failure).
Mathematically, the Binomial Distribution can be written as follows:
P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), for k = 0, 1, ..., n,
where n is the number of trials, p is the probability of success in each trial, and C(n, k) = n! / (k! (n - k)!) counts the ways to choose which k of the n trials are successes.
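To connect the formula to the drug example, we can simulate n independent Bernoulli trials many times and compare the observed frequency of k successes with the formula. This is a self-contained sketch; the 70% cure rate and other numbers are made up for illustration.

```python
import random
from math import comb

def binom_pmf(k, n, p):
    """Binomial formula: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

random.seed(0)  # for reproducibility
n, p, trials = 20, 0.7, 50_000  # illustrative: a drug that cures 70% of patients

def count_cures():
    """Cured patients among n, each cured independently with probability p."""
    return sum(random.random() < p for _ in range(n))

counts = [count_cures() for _ in range(trials)]
empirical = counts.count(14) / trials
print(empirical, binom_pmf(14, n, p))  # simulation agrees with the formula
```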
c. Poisson Distribution:
Poisson Distribution is a probabilistic model used to determine the probability of an event occurring within a certain time or space. The event can be anything, such as the number of visits to a museum in a day, the number of car accidents in a month, or the number of spam messages in your inbox in a week.
Mathematically, the Poisson Distribution can be written as follows:
P(X = k) = (λ^k * e^(-λ)) / k!, for k = 0, 1, 2, ...,
where λ is the average number of events in the given interval of time or space.
Poisson Distribution is a useful tool for calculating the probability of an event occurring within a certain time or space. It can be applied to a wide range of real-world situations, such as predicting the number of visits to a museum or the number of car accidents in a given area.
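A minimal sketch of the Poisson PMF using the museum-visits example; the rate of λ = 4 visits per hour is an illustrative assumption, not a real figure.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * e**(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0  # illustrative: an average of 4 museum visits per hour
print(poisson_pmf(2, lam))  # P(exactly 2 visits) ≈ 0.147
print(sum(poisson_pmf(k, lam) for k in range(50)))  # total probability ≈ 1
```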
2. Continuous Probability Distributions:
A probability distribution in which the random variable X can take on any value in a continuum. Because there are infinitely many values X could assume, the probability of X taking on any one specific value is zero.
The normal distribution is one example of a continuous distribution. The probability that X falls between two values a and b equals the integral (area under the curve) of the PDF from a to b: P(a <= X <= b) = integral of f(x) dx from a to b.
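The "area under the curve" statement can be checked numerically. The sketch below integrates the normal density with the trapezoidal rule; the step count and the interval (-1.96, 1.96) are arbitrary illustrative choices.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of the normal distribution with mean mu and std sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def prob_between(a, b, mu=0.0, sigma=1.0, steps=100_000):
    """P(a <= X <= b): area under the density, via the trapezoidal rule."""
    h = (b - a) / steps
    ys = [normal_pdf(a + i * h, mu, sigma) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

print(prob_between(-1.96, 1.96))  # ≈ 0.95 for the standard normal
```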
a. Normal Distribution:
This is the most commonly discussed distribution and the one most often found in the real world. By the central limit theorem, sums and averages of many independent observations tend toward a normal distribution as the sample grows. It has two parameters: the mean and the standard deviation.
This distribution has many interesting properties. The density is highest at the mean, and values on either side of the mean are distributed symmetrically. The standard normal distribution is the special case where the mean is 0 and the standard deviation is 1.
It also follows the empirical rule: about 68% of the values lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
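The 68-95-99.7 rule can be verified from the closed-form normal CDF, which Python exposes through `math.erf`:

```python
from math import erf, sqrt

def within_k_sigma(k):
    """P(|X - mu| <= k * sigma) for any normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
# prints: 1 0.6827 / 2 0.9545 / 3 0.9973
```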