## Data Presentation

Sections:
• 0, Chapter 1, Introduction,

### Introduction

This unit continues the topics of data collection and data presentation and illustrates different ways in which you can present data.
You will become familiar with bar charts, pie charts, line graphs and histograms.

After completing this unit you should understand how to:

• construct and present data clearly in the form of pictograms and charts
• use pie charts to illustrate discrete data
• use line graphs to illustrate data
• use stem and leaf plots to illustrate numerical data
• use histograms with equal class intervals to illustrate continuous data.

There are five sections in this Unit, namely:

1. Pictograms and Bar Charts
2. Pie Charts
3. Line Graphs
4. Stem and Leaf Plots
5. Frequency Graphs: Histograms
• 0, Chapter 2, Data Presentation: Bar Charts,

### Data Presentation: Bar Charts

Here is some data for Manchester City, the winners of the Premier League in the 2018/19 season.

The table below gives their scoring record in their $$38$$ matches (20 teams in the Premier League;
each team plays every other team, at home and away).

What does this data show?

Overall Manchester City are good at scoring goals!

How can we illustrate this data to show its meaning?

We can make some progress by analysing the data using a tally table to show the frequency of each
number of goals scored in the 38 games played.

• 1, Chapter 2.1, Tally Table,

### Tally Table

In a tally table each data item is shown as a vertical line (I).

After four data items, the fifth is indicated by a line drawn through the previous $$4$$ and a space left before starting the next set of lines.

For example, the entry IIII III, where the first group of four is crossed through, shows a frequency of $$5 + 3 = 8$$.

• 1, Chapter 2.2, Bar Chart,

### Bar Chart

Here is a bar chart to illustrate the frequency of goals scored by Manchester City in the 2018/19 Premier League.

• 1, Chapter 2.3, Horizontal Bar Chart,

### Horizontal Bar Chart

Here is another bar chart, but this one has been turned on its side so that the bars are horizontal.

• 1, Chapter 2.4, Pictogram: Example,

### Pictogram: Example

There are many other ways to display data; the aim is for someone to look at the display and have a good idea of what it is showing.
Here is a different type of data. The table below gives the number and colour of cars sold by a garage over a month.

How can this data be illustrated?

A pictogram is one possible way of showing this data.

• 1, Chapter 2.5, Pictogram to Illustrate Data,

### Pictogram to Illustrate Data

Here one car picture represents exactly $$1$$ car.

If the numbers were much larger, one picture might represent more than one car.

You have to look at the ‘key’ to the pictogram to see how many cars each picture represents.

• 1, Chapter 2.5, Task 1, Exercise 1,

### Exercise

Exercise 1

The pictogram shows how many suitcases were sold by a store from 2013 to 2019, with one row missing.

1. How many cases were sold in 2014?
$$400$$

2. What is the smallest number of cases sold in a year?
$$250$$

3. What is the greatest number of cases sold in a year?
$$700$$

4. In 2018 a total of 550 cases were sold.
How many suitcase symbols should be there?
$$5\frac{1}{2}$$

Exercise 2

The bar chart below shows the shoe sizes of a group of 15-year-old boys.

1. How many boys are there in the group?
Answer: $$23$$
2. Comment on the shape of the bar chart, saying whether or not this is the shape you would expect.
Answer: We would expect there to be many more boys with shoe sizes around $$8$$ and $$9$$ than  $$5$$ or $$12$$, so the results are surprising.

• 0, Chapter 3, Data Presentation: Pie Charts,

### Data Presentation: Pie Charts

We will use our colour of cars data from the first section as an example.

To construct a pie chart, the angles around the circle must be in proportion to the number of cars of each colour.
It is the AREA of each sector that represents the size of the data.
There are 24 cars in total and 360° around the circle, so each car is represented by an angle of $$360^\circ \div 24 = 15^\circ$$.
There are 8 white cars, so the angle for white cars is $$8\times15^\circ = 120^\circ$$.

We calculate the angles for each colour in the same way as in the table.
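
The angle calculation above can be sketched in a few lines of Python. This is only an illustration: the count of 8 white cars and the total of 24 come from the text, while the remaining colours and counts are hypothetical placeholders chosen so that the total is 24. The function name `pie_angles` is also an assumption.

```python
def pie_angles(counts):
    """Return the sector angle in degrees for each category."""
    total = sum(counts.values())
    degrees_per_item = 360 / total  # for 24 cars: 360 / 24 = 15 degrees per car
    return {category: n * degrees_per_item for category, n in counts.items()}


# White = 8 and a total of 24 come from the text; the other colours and
# counts are hypothetical values used only to make the total 24.
car_counts = {"White": 8, "Red": 6, "Blue": 4, "Silver": 4, "Black": 2}

angles = pie_angles(car_counts)
for colour, angle in angles.items():
    print(f"{colour}: {angle:.0f} degrees")
```

Whatever the counts are, the angles should always add up to 360°, which is a useful check before drawing the chart.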

See the result in the next section.

• 1, Chapter 3.1, Pie Chart for Colour of Cars,

### Pie Chart for Colour of Cars

Using the angles calculated, here is the pie chart.

This is a straightforward way of illustrating qualitative data (that is, the colour of cars); we will next look at how to illustrate quantitative data.

• 1, Chapter 3.1, Task 1, Exercise 2,

### Exercise

Exercise

This pie chart, not drawn to scale, shows the Saturday morning activities of a group of 120 children.

1. The sector for soccer is represented by an angle of 150°.
How many children play soccer on Saturday mornings?
$$Number =\frac{150}{360}\times 120 = 50$$
2. Given that 46 children swim on Saturday mornings,
calculate the value of $$x$$.
Angle $$x=\frac{46}{120}\times 360^\circ=138^\circ$$

• 0, Chapter 4, Data Presentation: Line Graphs,

### Data Presentation: Line Graphs

Here is another way to illustrate the Manchester City goals-scored data.

We have joined up the data points but does this make sense?

Not really, as you cannot score $$2\frac{1}{2}$$ goals in a game; it does, though, show overall trends.

Line graphs are useful, though, for showing continuous data, for example temperature, length or weight.

• 1, Chapter 4.1, Illustrating Continuous Data,

### Illustrating Continuous Data

Here is the average monthly temperature at Chichester.

A line graph does make sense here as the data trend can be seen; that is, increasing from Feb to Aug and then decreasing until the end of the year.

• 1, Chapter 4.1, Task 1, Worked Example 1,

### Worked Examples

• #### Worked Example

Samuel recorded the temperature in his shed at 6 am each day for a week. His records are shown on this line graph.

• a)   What was the temperature on Wednesday?
• b)   What was the lowest temperature recorded?
• c)   What was the highest temperature recorded?
• 1, Chapter 4.1, Task 2, Worked Example 2,

### Worked Examples

• #### Worked Example

This line graph shows how the height of a sunflower plant changed in the weeks after it was planted.

1. What was the height of the plant when planted?
8 cm
2. How much did the plant grow in the first week?
22 cm
3. What is the greatest height reached so far?
84 cm
4. How long did it take to go from 54 to 78 cm?
3 weeks

• 0, Chapter 5, Stem and Leaf Plots,

### Stem and Leaf Plots

Here is another very simple way of displaying data, which quickly gives an overall view of its general characteristics.
This is called a stem and leaf plot and here is how it works.

• #### Worked Example

The marks gained out of $$50$$ by $$15$$ pupils in a Biology test are as given below: $$27, 36, 24, 17, 35, 18, 23, 25, 34, 25, 41, 18, 22, 24, 42$$
The stem and leaf plot uses  'tens' as the stem and the 'units' as the leaf.
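
As a rough sketch of how the plot for these Biology marks could be built, the Python below splits each mark into its tens digit (stem) and units digit (leaf). The function name `stem_and_leaf` is an assumption made for this illustration, and the approach only covers whole-number marks.

```python
from collections import defaultdict

marks = [27, 36, 24, 17, 35, 18, 23, 25, 34, 25, 41, 18, 22, 24, 42]


def stem_and_leaf(values):
    """Group whole-number values by tens digit (stem) with sorted units (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 27 -> stem 2, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(marks)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Sorting the marks first means the leaves on each row appear in increasing order, which makes it easier to read off the median and the range.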

• 1, Chapter 5.1, Times for One Mile,

### Times for One Mile

Use the slider to explore worked examples.

• #### Worked Example

Here are the times in minutes for a group of adults to walk or run one mile:

Illustrate the data in a stem and leaf plot; what can you say about the data?

• 1, Chapter 5.1, Task 1, Exercise 3,

### Exercise

Exercise

Blood samples were taken from $$40$$ blood donors and the lead concentration (mg per $$100\;ml$$) recorded.

1. Construct a stem and leaf plot to represent the data.
2. What are the values of the range and the median (middle) value?

a.

b.

$$Range = 65 - 17 = 48$$ and $$median = 30$$

• 0, Chapter 6, Frequency Graphs: Histograms,

### Frequency Graphs: Histograms

For continuous data, when any value over a range of values is possible, a frequency graph like the one below can be used to illustrate the data.

This is called a histogram, and is characterised by having a continuous scale along the horizontal axis.
Care must be taken about the end points.

For example, the first interval (in mins) would normally be $$30 \leq x \lt 35$$, so that a time of $$35\;minutes$$ would be in the second interval.

• 1, Chapter 6.1, Using the Histogram,

### Using the Histogram

• #### Worked Example

How many people completed the Fun Run in

• a)   between $$40$$ and $$45\;minutes$$
• b)   less than $$40\;minutes$$
• c)   less than one hour?
• 1, Chapter 6.1, Task 1, Worked Example 3,

### Worked Examples

• #### Worked Example

A group of students measured the reaction times of $$50$$ other students.
The times are given below, correct to the nearest hundredth of a second.

• 1, Chapter 6.1, Task 2, Exercise 4,

### Exercise

Exercise

This histogram shows the weights of learners in a school.

1. How many had a weight greater than $$70\;kg$$?
$$5$$

2. How many had a weight between $$50$$ and $$65\;kg$$?
$$55$$

3. How many had a weight less than $$50\;kg$$?
$$15$$

4. How many learners were there in the school?
$$81$$

• 0, Chapter 7, Summary,

### Summary

Interactive Exercises:
• Data Presentation Interactive Exercises, https://www.cimt.org.uk/sif/datascience/ds3/interactive.htm
• Pictograms and Bar Charts, https://www.cimt.org.uk/sif/datascience/ds3/interactive/s1.html
• Pie Charts, https://www.cimt.org.uk/sif/datascience/ds3/interactive/s2.html
• Line Graphs, https://www.cimt.org.uk/sif/datascience/ds3/interactive/s3.html
• Stem and Leaf Plots, https://www.cimt.org.uk/sif/datascience/ds3/interactive/s4.html
• Frequency Graphs: Histograms, https://www.cimt.org.uk/sif/datascience/ds3/interactive/s5.html

## Data Collection

Sections:
• 0, Chapter 1, Introduction,

### Introduction

Understanding and analysing data is critical in all sciences. In this unit we focus on how to collect data in ways that will aid your analysis.

After completing this unit you should be able to

• understand how to produce unbiased questions
• appreciate the need for random samples
• use and apply tables such as distance charts.

There are three sections in this Unit, namely:

1. Questionnaires and Surveys
2. Sampling
3. Charts and Tables
• 0, Chapter 2, Questionnaires and Surveys,

### Questionnaires and Surveys

Collecting data is often done by questionnaires and surveys. When designing a questionnaire to collect data there are several key aspects to consider:

• What is the purpose of the questionnaire?
• Questions must be unbiased
• Questions must be clear and unambiguous
• Questions can result in qualitative or quantitative data
• Always pilot any questionnaire
• 1, Chapter 2.1, Biased Questions,

### Biased Questions

When constructing questions it is important to select ones that are not biased.

For example, are the questions on this questionnaire biased?

• Are you concerned about the environment?
• Are you concerned about the level of pollution caused by vehicles?
• Do you think the health of young children is at risk due to exhaust fumes from vehicles?
• Is there too much traffic congestion in towns?
• Is public transport under-used?
• Do you think vehicles should be banned from main shopping streets?

Yes, they are biased as they lead you to answer ‘Yes’ to the last question!

• 1, Chapter 2.2, Poor Questionnaire?,

### Poor Questionnaire?

Why is this a poor questionnaire?

There are many ways it could be improved; for example:

• Major sports such as football are left off
• Participants might have two sports as equal favourites
• It might be more interesting to ask for their views of each sport using a scale of interest
• 1, Chapter 2.3, Simple Likert Scales for Responses,

### Simple Likert Scales for Responses

You can use what is termed a Likert scale to get responses to questions: see below.

You could have more or fewer options, but note that with 4 options there is no middle value.

You often see, for example at airports, coloured response buttons for rating the service, with the results automatically saved and analysed.

• 1, Chapter 2.3, Task 1, Exercise 1,

### Exercise

Louis thinks that learners who walk to school are more likely to be late than those who do not walk to school.
To test if his view is correct, he carries out a survey of 100 learners from Year 7 for 5 consecutive Tuesdays.
Here are his results.

Exercise 1

Do the results support his view?

Exercise 2

Suggest ways in which Louis could improve his survey.

• 0, Chapter 3, Sampling,

### Sampling

When conducting a survey it is often impossible to ask every individual who might be concerned or involved.
For example, for a political opinion poll it is only possible to ask a sample of the population how they would vote.

#### Remember

Population is used for any group for which information is required.

For example, the following could be populations:

• Plymouth Argyle Football Club supporters,
• Laptops produced in a factory,
• All students in your school or college

Conclusions are reached by looking at samples taken from a population.

There are many methods for selecting a sample from a population but we will look carefully at 3 of these here:

• Random
• Stratified
• Systematic
• 1, Chapter 3.1, Random Sample,

### Random Sample

• The sample is formed by selecting a number of members of the population at random.
• It is important to make sure that each member of the population is equally likely to be selected.
• Tables of random numbers can be used to help this process but more mundane methods, such as using numbers from a telephone directory
or even choosing a number from a hat of numbers, can be used.
• Scientific calculators also provide you with random numbers.

Let's look at a table of random numbers.

• 1, Chapter 3.2, Random Numbers,

### Random Numbers

Here is an example of a table of random numbers. In the next section, the worked example shows how such tables are used.

• 1, Chapter 3.2, Task 1, Worked Example 1,

### Worked Example

• #### Worked Example

Find a random sample of size $$5$$ for a group of $$60$$ learners.
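
The worked example uses a table of random numbers. As an alternative illustration, the sketch below swaps in Python's built-in pseudo-random number generator. It assumes the learners are numbered 1 to 60; the function name `random_sample` and the `seed` argument (used only to make the run repeatable) are hypothetical.

```python
import random


def random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct learner numbers from 1..population_size."""
    rng = random.Random(seed)
    # rng.sample never repeats a number, so no learner is chosen twice
    # and each learner is equally likely to be selected.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


sample = random_sample(60, 5, seed=1)
print(sample)
```

With a random number table you would discard any number above 60 and any repeat; `sample` handles both of those steps automatically.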

• 1, Chapter 3.3, Stratified Sample,

### Stratified Sample

This is an important method of sampling, particularly when the population under study is split into a number of distinct groups.
Random samples are taken from each group so that the ratio of the sizes of the samples is the same as the ratio of the number of members of the groups in the population.

For example, if a population contains $$1000$$ women and $$500$$ men, a stratified sample of total size $$75$$ would contain $$50$$ women and $$25$$ men.

• #### Worked Example

A population contains $$2000$$ women and $$1500$$ men.  We require a stratified sample of total size $$70$$. How many women and men would be needed?
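
The proportional split can be checked with a short calculation. The sketch below is one possible way to do it, using exact fractions; the function name `stratified_allocation` is an assumption for this illustration.

```python
from fractions import Fraction


def stratified_allocation(group_sizes, sample_size):
    """Share sample_size between groups in proportion to their sizes."""
    total = sum(group_sizes.values())
    return {
        group: Fraction(size, total) * sample_size
        for group, size in group_sizes.items()
    }


# Example from the text: 1000 women and 500 men, sample of 75.
example = stratified_allocation({"women": 1000, "men": 500}, 75)

# Worked example: 2000 women and 1500 men, sample of 70.
worked = stratified_allocation({"women": 2000, "men": 1500}, 70)

for group, share in worked.items():
    print(f"{group}: {share}")
```

In both cases here the shares come out as whole numbers. When they do not, the shares have to be rounded so that they still add up to the required sample size.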

• 1, Chapter 3.4, Systematic Sample,

### Systematic Sample

This type of sample is formed by taking members of the population at regular intervals.

For example, by selecting every $$5^{th}$$ or every $$10^{th}$$ or every $$20^{th}$$ member of the population, depending on the population number and sample size.

Use the slider to explore worked examples.

• #### Worked Example

There are $$12$$ teachers in this small primary school. A sample of size $$3$$ is to be selected using systematic sampling.
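
A minimal sketch of systematic sampling for this situation is shown below. The teacher labels are hypothetical placeholders, since the actual list is not reproduced here, and the function name `systematic_sample` is an assumption. With 12 teachers and a sample of size 3, the interval is 12 ÷ 3 = 4.

```python
def systematic_sample(population, sample_size, start=0):
    """Take every k-th member, where k = population size // sample size."""
    step = len(population) // sample_size
    return population[start::step][:sample_size]


# Hypothetical labels standing in for the 12 teachers.
teachers = [f"Teacher {n}" for n in range(1, 13)]

print(systematic_sample(teachers, 3, start=0))
print(systematic_sample(teachers, 3, start=3))
```

The starting position (0 to 3 here) is usually chosen at random; after that, the rest of the sample is fixed by the interval.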

• #### Worked Example

In reality with only $$12$$ teachers to sample, it would be easier to survey the complete population; we call this a CENSUS.
As an example though, we will use the same data to obtain a random sample of size 3.

Find a random sample of size 3 for this population.

• 1, Chapter 3.4, Task 1, Exercise 2,

### Exercise

Exercise 1

What is the difference between a census and a sample?

Exercise 2

What are the advantages of taking a sample?

Exercise 3

This table shows the number of learners in each school year.
How many learners should be selected from each year to create a stratified sample of size 80?

Exercise 4

For these teachers, use the following list of random numbers to find a random sample of size 4.

• 0, Chapter 4, Charts and Tables,

### Charts and Tables

Charts or tables are often used as an effective way to represent data. We demonstrate this idea with two examples. Below is the first example.

#### Example

The chart can be used to show the mileage between some Scottish towns and cities.

What is the distance in miles between:

1. Fort William and Perth?

2. Edinburgh and Stranraer?

3. Which places are furthest apart?

• 1, Chapter 4, Task 1, Example 1,

### Example

In a school 37 learners took exams in both Mathematics and Physics. Their results are given in this table:

1. How many got the same grade in both subjects?

2. How many obtained a higher grade in Physics than in Mathematics?

3. Which was the most common grade in Physics?
• 1, Chapter 4, Task 2, Exercise 3,

### Exercise

Exercise 1

The table gives the distances, in kilometres, between some English towns and cities.

Nicole travels from Birmingham to Leeds and then to Manchester.

1. How far does she travel?

2. By how many km would the distance she travels be reduced if she went from Birmingham to Manchester and then to Leeds?

Exercise 2

The table shows the favourite sports selected by a  sample of learners each year in a secondary school.
In each year, each learner chose just one sport.

1. How many chose basketball in Year 8?

2. How many more chose football in Year 7 than in Year 10?

3. In which year was 'track and field' the most popular sport?

• 0, Chapter 5, Summary,

### Summary

#### Population

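This is the whole group about which information is required (for example, all students in your school or college); you then investigate a variable (e.g. age, height, weight, etc.) for that group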

#### Sampling

A sample is a number of members taken from a population; you take a sample from the population to estimate a particular characteristic of the population

#### Random sample

Here every member of the population has an equal chance of being in the sample

#### Systematic sample

Here the population is numbered, e.g. 1 → 100, and you take, for example, every 10th number (e.g. 1, 11, . . . , 91) to give a sample of, in this
instance, size 10

#### Stratified sample

Here the population has distinct groups; you need to ensure that all groups are fairly represented in the sample

#### Distance chart

A matrix of values which shows distances between specified places

#### Questionnaire

Should be straightforward with no biased questions

Interactive Exercises:
• Data Collection Interactive Exercises, https://www.cimt.org.uk/sif/datascience/ds2/interactive.htm
• Questionnaires and Surveys, https://www.cimt.org.uk/sif/datascience/ds2/interactive/s1.html
• Sampling, https://www.cimt.org.uk/sif/datascience/ds2/interactive/s2.html
• Charts and Tables, https://www.cimt.org.uk/sif/datascience/ds2/interactive/s3.html

## Probability of One Event

Sections:
• 0, Chapter 1, Introduction,

### Introduction

This is a really important unit, the first of two that deal with probability. After completing this unit
you should

• understand how to classify probabilities using words such as CERTAIN, LIKELY,
UNLIKELY, IMPOSSIBLE and illustrate probabilities on a probability line
• be able to calculate simple probabilities
• be able to make estimates of probabilities using relative frequency
• be confident in determining the probability of an outcome when all possibilities are equally
likely to occur.

The unit is divided into the following four sections

1. Probabilities
2. Straightforward Probability
3. Finding Probabilities Using Relative Frequency
4. Determining Probabilities

• 0, Chapter 2, Probabilities,

### Probabilities

Example 1

Here are some examples of simple probabilities:

a. When you roll a fair die, which number are you most likely to get?

b. If you rolled a fair die $$600$$ times, how many sixes would you expect to get?

c. Would you expect to get the same number of ones?

Example 2

We talk about probabilities all the time in daily life. Here are some examples

Use one of the following to describe each of the statements a. to d.

• Certain
• Very likely
• Likely
• Unlikely
• Very Unlikely
• Impossible

a. It will snow in Barbados tomorrow.

b. It will rain in London on Saturday.

c. You win a car in a competition tomorrow.

d. You will be late for school tomorrow.

• 0, Chapter 3, Straightforward Probabilities,

### Straightforward Probabilities

#### Remember

Probabilities are given values between $$0$$ and $$1$$.
A probability of $$0$$ means that the event is impossible, while a probability of $$1$$ means that it is certain to happen.
The closer the probability is to $$1$$, the more likely the event is to happen whilst the closer the probability is to $$0$$, the less likely it is to happen.

• 1, Chapter 3, Task 1, Worked Examples 1,

### Worked Examples

Use the slider to explore worked examples.

• #### Worked Example

The probability that it will rain tomorrow is $$\frac{2}{3}$$.
What is the probability that it will not rain tomorrow?

• #### Worked Example

The probability that Newcastle United lose their next football match is $$\frac{7}{10}$$.
What is the probability that they win their next game?

• #### Worked Example

When you toss a fair coin, what is the probability that it lands heads up?

• 0, Chapter 4, Finding Probabilities Using Relative Frequency,

### Finding Probabilities Using Relative Frequency

Sometimes it is possible to calculate values for the probability of an event such as tossing a coin and getting a head, by symmetry arguments.

For other events probabilities can be estimated by using results of experiments or records of past events.

Use the slider to explore worked examples.

• #### Worked Example

In a town in Yorkshire there was rain on 18 days in September 2019. Use this information to estimate the probability that it rained there on a particular day in September 2019.

• #### Worked Example

Henry carries out an experiment with a piece of buttered toast. He drops it $$50$$ times and finds that $$35$$ times it lands butter side down.
Use these results to estimate the probability that a piece of toast lands butter side up when dropped.
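
Both estimates follow the same pattern: the number of times the event happened divided by the total number of trials. A minimal sketch using exact fractions is given below; the function name `relative_frequency` is an assumption for this illustration.

```python
from fractions import Fraction


def relative_frequency(event_count, total_count):
    """Estimate a probability as frequency of event / total frequency."""
    return Fraction(event_count, total_count)


# September has 30 days, and it rained on 18 of them.
p_rain = relative_frequency(18, 30)

# 35 of the 50 drops landed butter side down, so 50 - 35 = 15 landed butter side up.
p_butter_up = relative_frequency(50 - 35, 50)

print(p_rain, float(p_rain))
print(p_butter_up, float(p_butter_up))
```

Note that the toast question asks about butter side up, so the count of 35 has to be subtracted from 50 before dividing.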

• 0, Chapter 5, Determining Probabilities,

### Determining Probabilities

When the outcomes of an event are all equally likely, then probabilities can be found by considering all the possible outcomes.

For example, when you toss a fair coin there are two possible outcomes, either heads or tails.

so $$p\;(head)=\frac{1}{2}$$   and   $$p\;(tail)=\frac{1}{2}$$

#### Remember

The probability of an outcome is given by

$$\frac{number\;of\;ways\;of\;obtaining\;an\;outcome}{number\;of\;possible\;outcomes}$$

provided all outcomes are equally likely.

• 1, Chapter 5, Task 1, Worked Examples 2,

### Worked Examples

Use the slider to explore worked examples.

• #### Worked Example

A card is taken at random from a full pack of $$52$$ playing cards with no jokers. What is the probability that the card is
a)   an ace?
b)   a heart?
c)   black?
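
One way to check answers like these is to list every card and count the successful outcomes, as in the rule above. The sketch below assumes a standard 52-card pack; the function name `probability` and the card labels are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 13 x 4 = 52 equally likely outcomes


def probability(outcomes, is_success):
    """Number of successful outcomes divided by number of possible outcomes."""
    successes = sum(1 for outcome in outcomes if is_success(outcome))
    return Fraction(successes, len(outcomes))


p_ace = probability(deck, lambda card: card[0] == "A")
p_heart = probability(deck, lambda card: card[1] == "hearts")
p_black = probability(deck, lambda card: card[1] in ("clubs", "spades"))

print(p_ace, p_heart, p_black)
```

`Fraction` simplifies automatically, so 4 out of 52 is reported in its lowest terms.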

• #### Worked Example

In a class of $$30$$ students, $$16$$ are girls, $$4$$ wear glasses and $$3$$ are left-handed. A student is chosen at random from the class. What is the probability that this student:
a)   is a girl
b)   is right-handed
c)   wears glasses?

• 0, Chapter 6, Summary,

### Summary

#### Outcomes

events that can occur after an experiment (for example, obtaining a '$$6$$' when throwing a die)

#### Sample space

for an experiment is the set of all possible outcomes (for example, {$$1, 2, 3, 4, 5, 6$$} when throwing a die)

#### Event

subset of the sample space (for example, obtaining even numbers when throwing a die)

#### Relative frequency

the frequency of an event divided by the total frequency; this is used as an estimate for the probability of that event

#### Independent events

when the result of one event happening does not affect the probability of the other

#### Fair or unbiased

die (coin, spinner, etc.) - every face (side) has an equal chance

#### Biased

die (coin, spinner, etc.) - not all outcomes are equally likely to occur. For example, a weighted die.

(Note that the word 'dice' is the plural of 'die', but is often used as a singular.)

• The probability of any outcome, $$p$$, must satisfy $$0 \leq p \leq 1$$.
• The sum of the probabilities of all outcomes must equal $$1$$.
• Probabilities can be illustrated on a probability line:
• For finding probabilities by experiment:
$$probability\;of\;event=\frac{frequency\;of\;event}{total\;frequency}$$
• For equally likely outcomes:
$$probability\;of\;particular\;event=\frac{number\;of\;ways\;of\;obtaining\;event}{total\;no.\;of\;possible\;outcomes}$$
For example, when throwing a fair die,
$$p\left(6\right)=\frac{1}{6}, p\left(prime\;number\right)=p\left(2,3\;or\;5\right) = \frac{3}{6} =\frac{1}{2}$$
Interactive Exercises:
• Probability of One Event Interactive Exercises, https://www.cimt.org.uk/sif/datascience/ds6/interactive.htm
• Probabilities, https://www.cimt.org.uk/sif/datascience/ds6/interactive/s1.html
• Probability of an Event Estimating, https://www.cimt.org.uk/sif/datascience/ds6/interactive/s2.html
• Probabilities Using Relative Frequency, https://www.cimt.org.uk/sif/datascience/ds6/interactive/s3.html
• Probabilities Based on Equally Likely Events, https://www.cimt.org.uk/sif/datascience/ds6/interactive/s4.html

## Probability of Two or More Events

Sections:
• 0, Chapter 1, Introduction,

### Introduction

The work in this unit continues the development of probability. After completing this unit you should

• be able to identify possible outcomes of two (or more) events by listing the outcomes
• be confident in finding probabilities when all outcomes are equally likely
• be able to use tree diagrams to determine probabilities of particular events.

This unit on probability is divided into the following three sections:

1. Outcome of Two Events
2. Probability of Two Events
3. Use of Tree Diagrams
• 0, Chapter 2, Outcome of Two Events,

### Outcome of Two Events

When dealing with probabilities for two events, it is important to be able to identify all the possible outcomes.
Here are examples to show the methods that can be used.

Method A:  Systematic Listing

For a special meal, customers at a pizza parlour can choose a pizza with one of the following toppings

• Ham
• Mushroom
• Salami
• Pepperoni
• Tuna

and a drink from the following list.

• Cola
• Diet Cola
• Orange

How many possible combinations of toppings and drinks are there?

Solution

$$HC, HD, HO, MC, MD, MO, \ldots$$ giving $$5\times 3 = 15$$ combinations.
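
Systematic listing can also be carried out by a short program. The sketch below pairs every topping with every drink using `itertools.product`, which lists them in the same order as the solution (all drinks for Ham first, then Mushroom, and so on).

```python
from itertools import product

toppings = ["Ham", "Mushroom", "Salami", "Pepperoni", "Tuna"]
drinks = ["Cola", "Diet Cola", "Orange"]

# Every topping paired with every drink, in a systematic order.
combinations = list(product(toppings, drinks))

for topping, drink in combinations:
    print(topping, "+", drink)
print(len(combinations), "combinations")
```

The count matches the multiplication 5 × 3 because each of the 5 toppings can go with any of the 3 drinks.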

• 1, Chapter 2.1, Method B:  2-way Tables,

### Method B:  2-way Tables

A die and a coin are tossed.  List all the possible outcomes.

Solution

The coin can land heads (denoted by $$H$$) or tails ($$T$$), whilst the die can show $$1, 2, 3, 4, 5\;or\;6$$.
So for heads on the coin, the possible outcomes are

$$H1, H2, H3, H4, H5$$ and $$H6$$

whilst for tails, they are

$$T1, T2, T3, T4, T5$$ and $$T6$$.

The listing method used here can be displayed in a 2-way table.

• 1, Chapter 2.2, Method C:  Tree Diagrams,

### Method C:  Tree Diagrams

A coin is tossed twice.  List all the possible outcomes.

Solution

You can use a tree diagram to represent this solution.

Note that 'TH' is not the same as 'HT' because of the order of events.

• 0, Chapter 3, Probability of Two Events,

### Probability of Two Events

When two events take place, and every outcome is equally likely to happen,
the probability of a particular combined outcome can be readily found from the formula

#### Remember

$$probability=\frac{number\;of\;successful\;outcomes}{total\;number\;of\;outcomes}$$

• 1, Chapter 3, Task 1, Worked Example 1,

### Worked Example

• #### Worked Example

A spinner that forms part of a children's game can point to one of four regions, $$A, B, C$$ or $$D$$, when spun.
What is the probability that when two children spin the spinner, it points to the same letter?
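
A possible way to check this by listing outcomes is sketched below. It assumes, as this section does, that the four regions are equally likely on each spin, so that all 16 ordered pairs of letters are equally likely.

```python
from fractions import Fraction
from itertools import product

regions = ["A", "B", "C", "D"]

# All ordered pairs (first child's letter, second child's letter): 4 x 4 = 16.
outcomes = list(product(regions, repeat=2))
same_letter = [pair for pair in outcomes if pair[0] == pair[1]]

p_same = Fraction(len(same_letter), len(outcomes))
print(same_letter)
print(p_same)
```

The successful outcomes are AA, BB, CC and DD, so the calculation is 4 successful outcomes out of 16.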

• 0, Chapter 4, Use of Tree Diagrams,

### Use of Tree Diagrams

Tree diagrams can be used to find the probabilities for two events, when the outcomes are not necessarily equally likely.

• #### Worked Example

If the probability that it rains on any day is $$\frac{1}{5}$$ , draw a tree diagram and

find the probability

1. that it rains on two consecutive days,
2. that it rains on only one of two consecutive days.
• #### Worked Example

The probability that Jenny is late for school is $$0.3$$.  Find the probability that on two consecutive days she is:

1. never late
2. late only once
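
The tree diagram calculations for both worked examples can be sketched as follows. The code multiplies probabilities along each pair of branches and adds the branches that match the question. It assumes that the two days are independent (what happens on the first day does not change the probability on the second); the function name `two_day_tree` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def two_day_tree(p_event):
    """Return {(day1, day2): probability} for two independent days.

    True means the event happened on that day, False means it did not.
    """
    branch = {True: p_event, False: 1 - p_event}
    return {
        (first, second): branch[first] * branch[second]
        for first, second in product([True, False], repeat=2)
    }


# Rain with probability 1/5 on any day.
rain = two_day_tree(Fraction(1, 5))
p_rain_both = rain[(True, True)]
p_rain_once = rain[(True, False)] + rain[(False, True)]

# Jenny is late with probability 0.3 = 3/10.
late = two_day_tree(Fraction(3, 10))
p_never_late = late[(False, False)]
p_late_once = late[(True, False)] + late[(False, True)]

print(p_rain_both, p_rain_once)
print(p_never_late, p_late_once)
```

As a check, the four end-of-branch probabilities of any such tree should add up to 1.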
• 0, Chapter 5, Summary,

### Summary

• For finding probabilities by experiment:

$$probability\;of\;event=\frac{frequency\;of\;event}{total\;frequency}$$

• For equally likely outcomes:

$$probability\;of\;particular\;event=\frac{number\;of\;ways\;of\;obtaining\;event}{total\;no.\;of\;possible\;outcomes}$$

For example, when throwing a fair die,

$$p\left(6\right)=\frac{1}{6}, p\left(prime\;number\right)=p\left(2,3\;or\;5\right)=\frac{3}{6}=\frac{1}{2}$$

#### Probability

The likelihood of the occurrence of an event.

#### Tree diagram

A diagram which can be helpful in illustrating possible outcomes of an experiment.
Probabilities are assigned to the branches when one or more events are being considered.
The probability of any outcome is the product of all probabilities along the relevant branches.
For example, throwing a die and noting whether the score is 6 on each occasion, as illustrated below.

#### Replacement

Replacing an item so that the probabilities remain unchanged for each experiment.
For example, when two balls are taken from a bag in turn, with the first ball being put back
(replaced) in the bag before the second is taken out.

#### Non-replacement

In the example above, the first ball is NOT put back in the bag before the second is taken out.

Interactive Exercises:
• Probability of Two or More Events Interactive Exercises, https://www.cimt.org.uk/sif/datascience/ds7/interactive.htm
• Outcome of Two Events, https://www.cimt.org.uk/sif/datascience/ds7/interactive/s1.html
• Probability of Two Events, https://www.cimt.org.uk/sif/datascience/ds7/interactive/s2.html
• Use of Tree Diagrams, https://www.cimt.org.uk/sif/datascience/ds7/interactive/s3.html

## Measures of Central Tendency

Sections:
• 0, Chapter 1, Introduction,

### Introduction

After completing this unit you should be able to

• calculate the mean, mode and median for a set of discrete data
• use tally charts and tables to calculate the mean of a data set
• undertake calculations with the mean
• calculate or estimate mean, mode and median from a set of grouped data.

There are three sections in this unit, namely:

1. Mean, Median, Mode and Range
2. Finding the Mean from Tables and Tally Charts
3. Calculations with the Mean
• 0, Chapter 2, Mean, Median, Mode and Range,

### Mean, Median, Mode and Range

Mean, Median and Mode are all different ways of describing the  “average” for a set of data.

To find the Mean, add up all the numbers and divide by the number of numbers.
This gives what is often called the average.

To find the Median, place all the numbers in order and select the middle one.
This gives another measure for the average.

To find the Mode, find the value that occurs most often.
It is not quite a measure of central tendency but in some contexts it is an important measure.

We can summarise these measures of central tendency as below.

#### Remember

$$Mean=\frac{sum\;of\;the\;values}{number\;of\;values}$$

$$Median = middle\;value\left(when\;the\;data\;is\;arranged\;in\;order\right)$$

$$where\;there\;are\;two\;central\;values,\;the\;median\;is\;their\; mean$$

$$Mode = most\;common\;value$$

We define the range as the difference between the largest and smallest values; this is, in fact,
a simple measure of the variation in the data as is seen in the unit “Measures of Variation”.

• 1, Chapter 2.1, Mean,

### Mean

The marks below were obtained by students on a maths test that was marked out of $$20$$:
$$9, 12, 10, 15, 8, 14, 19, 12, 11, 7, 17, 12, 10, 13, 11$$.

To find the mean of this set of data we add up all the marks and divide by the total number of students.

There were 15 students, so this gives:

$$9 +12 +10 +15 + 8 + 14 + 19 + 12 + 11 + 7 + 17 + 12 + 10 + 13 + 11 = 180$$

$$180 \div 15 = 12$$

So the mean of this set of data is $$12$$.
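
The calculation above can be checked with a few lines of Python. This is only a sketch; the variable names are chosen for illustration and are not part of the unit.

```python
from statistics import mean

# Test marks from the text (each marked out of 20)
marks = [9, 12, 10, 15, 8, 14, 19, 12, 11, 7, 17, 12, 10, 13, 11]

total = sum(marks)           # add up all the marks
count = len(marks)           # number of students
mean_mark = total / count    # mean = sum of the values / number of values

print(total, count, mean_mark)   # 180 15 12.0

# The standard library function gives the same answer
assert mean(marks) == mean_mark
```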

• 1, Chapter 2.1, Task 1, Worked Example 1,

### Worked Example

• #### Worked Example

Find the median and mode of the set of marks

$$7, 8, 9, 10, 10, 11, 11, 12, 12, 12, 13, 14, 15, 17, 19$$

• #### Worked Example

In a singing contest, the scores awarded by the eight judges were:

$$5.9, 6.7, 6.8, 6.5, 6.7, 8.2, 6.1, 6.3$$

1. Determine the median value of the eight scores.
2. If the highest and lowest scores are omitted, does the median value change?
• #### Worked Example

Evaluate: $$\int_{0}^{6}{(x^{2} - 2x -8)dx}$$

• 0, Chapter 3, Data Frequency,

### Data Frequency

Data is often given in terms of frequency; for example, here is the data from pupils in a class
giving the number of people living in their home:

| Number of People Living in Home | Frequency |
|---|---|
| $$2$$ | $$3$$ |
| $$3$$ | $$9$$ |
| $$4$$ | $$10$$ |
| $$5$$ | $$2$$ |
| $$6$$ | $$3$$ |
| $$7$$ | $$1$$ |
| $$8$$ | $$1$$ |
| $$9$$ | $$0$$ |
| $$10$$ | $$1$$ |

The frequency can refer to the number of people in that category (as it does here) or the number of times an occurrence happens.
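
As a rough illustration of working with frequencies, the sketch below finds the mean, mode and median for the table above. Storing the table as a Python dictionary is an assumption made here for convenience.

```python
# Frequency table from the text: {number of people in home: frequency}
freq = {2: 3, 3: 9, 4: 10, 5: 2, 6: 3, 7: 1, 8: 1, 9: 0, 10: 1}

n = sum(freq.values())                         # total number of pupils
total = sum(x * f for x, f in freq.items())    # sum of (value x frequency)
mean = total / n

# Mode: the value with the highest frequency
mode = max(freq, key=freq.get)

# Median: expand the table into an ordered list and take the middle
values = [x for x, f in sorted(freq.items()) for _ in range(f)]
mid = n // 2
median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

print(n, total, mean, mode, median)   # 30 126 4.2 4 4.0
```

Multiplying each value by its frequency avoids writing out all 30 data items when finding the mean.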

• 1, Chapter 3, Task 1, Worked Example 2,

### Worked Example

• #### Worked Example

During one week of August, Jamal recorded the maximum daily temperatures in two cities.
Here are his results:

| | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|---|---|
| London | 22 | 25 | 26 | 24 | 24 | 29 | 28 |
| Edinburgh | 22 | 22 | 23 | 24 | 25 | 24 | 26 |

1. What was the range of the temperatures in
   a) London
   b) Edinburgh
2. What was the mean temperature in
   a) London
   b) Edinburgh
3. What are the main differences in the temperature between London and Edinburgh?
• #### Worked Example

We need to find the mean, mode and median for this data in the frequency table below.

• #### Worked Example

A manager keeps a record of the number of work calls she makes each day on her mobile phone.
Calculate the mean, median and mode for the number of calls per day.

• 1, Chapter 3, Task 2, Worked Example 3,

### Worked Example

Sometimes we are told the mean and asked to work out other information. Here is an example.

• #### Worked Example

Rohan's mean score in three cricket matches was $$55$$ runs.

a. How many runs did he score altogether?

After four matches his mean score was $$61$$ runs.

b. How many runs did he score in the fourth match?

• #### Worked Example

The mean height of a class of $$28$$ students is $$162\;cm$$.  A new student of height $$149\;cm$$ joins the class.

What is the mean height of the class now?
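
Both examples above rely on the fact that total = mean × number of values. A minimal sketch of the arithmetic, with variable names chosen for illustration:

```python
# Rohan: mean of 55 over three matches, then mean of 61 over four
total_after_3 = 3 * 55                      # total = mean x number of values
total_after_4 = 4 * 61
fourth_match = total_after_4 - total_after_3
print(total_after_3, fourth_match)          # 165 79

# Class heights: 28 students with mean 162 cm, then a student of 149 cm joins
old_total = 28 * 162
new_mean = (old_total + 149) / 29
print(round(new_mean, 2))                   # 161.55
```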

• 0, Chapter 4, Summary,

### Summary

#### Mean

The arithmetic mean, obtained by adding together all the numbers in a set of data
and dividing by how many numbers there are in the set.

For example, the mean of a set of numbers, $$x_{1},x_{2},...,x_{n}$$ is calculated from the formula
$$mean=\frac{x_{1}+x_{2}+...+x_{n}}{n}$$

So, for the data set {$$5, 7, 2, 5, 1, 2, 3, 5, 6$$},
the mean value $$=\frac{\left(5+7+2+5+1+2+3+5+6\right)}{9}=\frac{36}{9}=4$$

#### Median

The median is the middle value when the values are listed in numerical order.

For example, for the data set {$$5, 7, 2, 5, 1, 2, 3, 5, 6$$}, the numerical order is $$1, 2, 2, 3, 5, 5, 5, 6, 7$$ and so the $$median = 5$$.

Note that if there are an even number of values in the data set, the median is the mean of the middle two values.

For example, for the data set {$$5, 7, 2, 5, 1, 2, 3, 5$$}, the numerical order is $$1, 2, 2, 3, 5, 5, 5, 7$$

and so the $$median = \frac{\left(3+5\right)}{2}=4$$

#### Mode

The mode is the numerical value that has the highest frequency in a set of data, that is, the value that occurs most often.

For example, for the data set {$$5, 7, 2, 5, 1, 2, 3, 5, 6$$}, you can write the values as $$1, 2, 2, 3, 5, 5, 5, 6, 7$$

The mode is $$5$$, as it has the highest frequency $$\left(3\right)$$.

#### Range

The range is the difference between the highest and the lowest values in the data set.

For example, for the data set {$$5, 7, 2, 5, 1, 2, 3, 5, 6$$},
the range is $$7 - 1 = 6$$
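
The four summary measures for this data set can be checked with Python's standard `statistics` module. This is a sketch for checking answers rather than a replacement for working them out by hand.

```python
from statistics import mean, median, mode

data = [5, 7, 2, 5, 1, 2, 3, 5, 6]

print(sorted(data))             # [1, 2, 2, 3, 5, 5, 5, 6, 7]
print(mean(data))               # 4
print(median(data))             # 5
print(mode(data))               # 5
print(max(data) - min(data))    # 6  (the range)

# With an even number of values the median is the mean of the middle two
print(median([5, 7, 2, 5, 1, 2, 3, 5]))   # 4.0
```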

Interactive Exercises:
• Interactive Exercises - Measures of Central Tendency, https://www.cimt.org.uk/sif/datascience/ds4/interactive.htm
• Mean, Median, Mode and Range, https://www.cimt.org.uk/sif/datascience/ds4/interactive/s1.html
• Finding the Mean from Tables and Tally Charts, https://www.cimt.org.uk/sif/datascience/ds4/interactive/s2.html
• Calculations with the Mean, https://www.cimt.org.uk/sif/datascience/ds4/interactive/s3.html

## Extending Probability

Sections:
• 0, Chapter 1, Introduction,

### Introduction

This is an extension to the earlier units on probability. After completing this unit you should be able to

• find the probability of independent events through multiplication
• understand the concept of mutually exclusive events
• use tree diagrams to calculate conditional probabilities
• use Venn diagrams to find probabilities.

This unit of work on probability is divided into the following four sections:

1. Multiplication for Independent Events
2. Mutually Exclusive Events
3. Tree Diagrams and Conditional Probability
4. Using Venn Diagrams to Find Probabilities

• 0, Chapter 2, Multiplication for Independent Events,

### Multiplication for Independent Events

#### Remember

Two events are independent if one event happening does not affect the probability
of the other event.

In this case the probability of two events $$A$$ and $$B$$ occurring is given by

 $$p \left(A\;and\;B\right) = p \left(A\right) \times p \left(B\right)$$
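
The multiplication rule can be checked by listing equally likely outcomes. The sketch below uses an example that is not in the unit (tossing a coin and rolling a die), chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely outcomes of tossing a coin and rolling a fair die
outcomes = list(product("HT", range(1, 7)))

def prob(event):
    """Probability of an event, given as a test on (coin, die)."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_a = prob(lambda o: o[0] == "H")                      # A: coin shows heads
p_b = prob(lambda o: o[1] == 6)                        # B: die shows a six
p_a_and_b = prob(lambda o: o[0] == "H" and o[1] == 6)  # both happen

print(p_a, p_b, p_a_and_b)       # 1/2 1/6 1/12
assert p_a_and_b == p_a * p_b    # the multiplication rule holds
```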

• 1, Chapter 2, Task 1, Worked Example 1,

### Worked Example

• #### Worked Example

A fair die is rolled twice. Event $$A$$ is ‘the first roll shows a six’ and event $$B$$ is ‘the second roll shows a six’.
Are events $$A$$ and $$B$$ independent?

• #### Worked Example

A spinner in a game has $$3$$ sections of equal size that are coloured red ($$R$$), orange ($$O$$) and purple ($$P$$).

When the spinner has been spun, are these events independent?

• 0, Chapter 3, Mutually Exclusive Events,

### Mutually Exclusive Events

#### Remember

If two events cannot happen or take place at the same time, then they are called mutually exclusive events.

For example, when tossing a single coin the events 'heads' and 'tails' are mutually exclusive because they cannot both be obtained at the same time.

If $$A$$ and $$B$$ are mutually exclusive events then the probability of obtaining $$A$$ or $$B$$ is given by:

$$p \left(A\;or\;B\right) = p \left(A\right) + p \left(B\right)$$
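
As an illustration of the addition rule, the sketch below takes a single roll of a fair die with $$A$$ as 'a six' and $$B$$ as 'a number less than four'. Representing events as Python sets is a choice made here, not something the unit prescribes.

```python
from fractions import Fraction

faces = range(1, 7)                    # a fair six-sided die

a = {f for f in faces if f == 6}       # event A: a six
b = {f for f in faces if f < 4}        # event B: a number less than four

assert a.isdisjoint(b)                 # no outcome belongs to both events

p_a = Fraction(len(a), 6)
p_b = Fraction(len(b), 6)
p_a_or_b = Fraction(len(a | b), 6)     # count outcomes in A or B

print(p_a_or_b)                        # 2/3
assert p_a_or_b == p_a + p_b           # the addition rule holds
```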

• 1, Chapter 3, Task 1, Worked Example 2,

### Worked Example

• #### Worked Example

State whether or not the pairs of events described below are mutually exclusive.

a)   A: A die is rolled and shows a 6.
B: A die is rolled and shows an odd number.

b)   A: Selecting a child with blue eyes from a class.
B: Selecting a left-handed person from a class.

• #### Worked Example

When Andrew buys a can of drink the probabilities of selecting particular brands are given in the table opposite.

Find the probabilities that he selects:
a)   Cola or Lemonade (C or L)
b)   Fizzo or Lemonade (F or L)
c)   None of the drinks listed above.

• 0, Chapter 4, Tree Diagrams and Conditional Probability,

### Tree Diagrams and Conditional Probability

The probabilities of certain events may change as a result of earlier events.
For example, if it rains today, the probability that it rains tomorrow may be greater than if it was sunny today.

When using a tree diagram (see Probability of Two or more Events if you need a reminder on Tree Diagrams), check whether the probabilities change as a result of the first event.

When probabilities depend on previous events they are called conditional probabilities.
This concept will become clearer when you look at the worked examples that follow.

• 1, Chapter 4, Task 1, Worked Example 3,

### Worked Example

• #### Worked Example

A bag contains $$7$$ red balls (R) and $$3$$ green balls (G).  Two balls are taken from the bag.

Find the probability that they are:
a)   both red
b)   both green
c)   one red and one green

• #### Worked Example

Weather experts estimate the probability of rain on any day as $$0.6$$ if it rained the previous day and $$0.3$$ if it did not rain the previous day.

Find the probability that a dry day is followed by
a)   two more dry days,
b)   two wet days
c)   one wet day and one dry day in either order
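
For the bag of balls example above, the branches of a tree diagram can be sketched in code. This assumes the first ball is not put back before the second is taken, which the question does not state explicitly.

```python
from fractions import Fraction

red, green = 7, 3
total = red + green

# First-branch probabilities
p_r1 = Fraction(red, total)
p_g1 = Fraction(green, total)

# Second-branch (conditional) probabilities: one ball has already gone
p_r2_given_r1 = Fraction(red - 1, total - 1)
p_g2_given_r1 = Fraction(green, total - 1)
p_r2_given_g1 = Fraction(red, total - 1)
p_g2_given_g1 = Fraction(green - 1, total - 1)

# Multiply along the branches, add across branches
both_red = p_r1 * p_r2_given_r1
both_green = p_g1 * p_g2_given_g1
one_each = p_r1 * p_g2_given_r1 + p_g1 * p_r2_given_g1

print(both_red, both_green, one_each)    # 7/15 1/15 7/15
assert both_red + both_green + one_each == 1
```

The three results add up to 1 because they cover every possible pair of balls.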

• 0, Chapter 5, Using Venn Diagrams to Find Probabilities,

### Using Venn Diagrams to Find Probabilities

Venn diagrams can be used to help find probabilities when events are not mutually exclusive.

This Venn diagram illustrates the situation for a pack of playing cards where the set A is the aces and the set H is the hearts.
Note that there are $$3 + 1 = 4$$ Aces in total and $$1 + 12 = 13$$ Hearts in total.

Also note that the total of the entries is $$52$$, and the complete set is denoted by $$\varepsilon$$ and, for example:
$$p \left(Ace\;of\;Hearts\right) = \frac{1}{52}$$
$$p \left(Ace\;or\;Heart\right) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52}$$

Remember that there is one card which is both an ace and a heart, so we should not count it twice.
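
The same result can be confirmed by building a pack of cards and counting. The rank and suit labels below are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))                  # 52 cards

aces = {card for card in deck if card[0] == "A"}
hearts = {card for card in deck if card[1] == "hearts"}

p_ace_or_heart = Fraction(len(aces | hearts), len(deck))
print(len(aces), len(hearts), len(aces & hearts))   # 4 13 1
print(p_ace_or_heart)                               # 4/13

# Adding the two probabilities counts the ace of hearts twice, so subtract it
assert p_ace_or_heart == Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
```

Note that `Fraction` simplifies $$\frac{16}{52}$$ to $$\frac{4}{13}$$.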

• 1, Chapter 5, Task 1, Worked Example 4,

### Worked Example

• #### Worked Example

In a group of $$40$$ students, $$6$$ are left-handed, $$18$$ have size $$7$$ feet and $$2$$ are left-handed with size $$7$$ feet.

Draw a Venn diagram and find the probability that a student is:
a)   left-handed or has size $$7$$ feet,
b)   not left-handed and does not have size 7 feet,
c)   not left-handed and has size 7 feet.

• #### Worked Example

In a class of $$30$$ children there are $$20$$ who can swim and $$25$$ who can ride a bike. Every child can swim or ride a bike (or both).

Draw a Venn diagram and find the probabilities that a child selected at random
a)   can swim and ride a bike,
b)   cannot swim but can ride a bike
c)   can swim but cannot ride a bike.

• 0, Chapter 6, Summary,

### Summary

Two events are independent if one event happening does not affect the probability of the other
event.

For example, rolling the first dice to obtain a SIX and the second dice to obtain an EVEN number.

If two events cannot happen or take place at the same time, they are said to be mutually exclusive.

For example, rolling a dice when one event is 'obtaining a SIX' and the other event is 'obtaining a number less than FOUR'.

For two independent events, $$A$$ and $$B$$, then

$$p \left(A\;and\;B\right) = p \left(A\right) \times p \left(B\right)$$

If events $$A$$ and $$B$$ are mutually exclusive events, then

$$p \left(A\;or\;B\right) = p \left(A\right) + p\left(B\right)$$

Interactive Exercises:
• Interactive Exercises - Extending Probability, https://www.cimt.org.uk/sif/datascience/ds8/interactive.htm
• Multiplication for Independent Events, https://www.cimt.org.uk/sif/datascience/ds8/interactive/s1.html
• Mutually Exclusive Events, https://www.cimt.org.uk/sif/datascience/ds8/interactive/s2.html
• Tree Diagrams and Conditional Probability, https://www.cimt.org.uk/sif/datascience/ds8/interactive/s3.html
• Using Venn Diagrams to Find Probabilities, https://www.cimt.org.uk/sif/datascience/ds8/interactive/s4.html

## Measures of Variation

Sections:
• 0, Chapter 1, Introduction,

### Introduction

Analysing data is another key aspect of Data Science. After completing this unit you should be able to

• construct cumulative frequency tables and graphs (curves)
• understand how to use the cumulative frequency curve to estimate the median and upper and lower quartiles of a set of data
• construct and use box and whisker plots.

There are three sections in this unit, namely:

1. Cumulative Frequency
2. Box and Whisker Plots
3. Standard Deviation
• 0, Chapter 2, Cumulative Frequency,

### Cumulative Frequency

The cumulative frequency is the running total of the frequencies.

Cumulative frequencies are useful if more detailed information is required about a set of data.

In particular, they can be used to find the median and inter-quartile range.

The median is the middle value when the data values are put in numerical order.

The inter-quartile range contains the middle 50% of the sample and describes how spread out the data are.
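
A running total is straightforward to compute. In the sketch below the class boundaries and frequencies are hypothetical values made up for illustration, not data from this unit.

```python
from itertools import accumulate

# Hypothetical grouped test marks: upper class boundaries and frequencies
upper_bounds = [10, 20, 30, 40, 50]
frequencies = [4, 9, 15, 8, 4]

cumulative = list(accumulate(frequencies))   # running total of the frequencies
print(cumulative)                            # [4, 13, 28, 36, 40]

# Each cumulative frequency is plotted against the upper class boundary
for bound, cf in zip(upper_bounds, cumulative):
    print(f"marks up to {bound}: {cf}")
```

The final cumulative frequency always equals the total number of data items.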

• 1, Chapter 2, Task 1, Worked Example 1,

### Worked Example

• #### Worked Example

For the data below, draw up a cumulative frequency table and then draw a cumulative frequency graph.

• 1, Chapter 2, Task 2, Estimating the Median and Quartile,

### Estimating the Median and Quartile

The cumulative frequency graph in the last example gives the test results for $$120$$ students. We will find estimates for the median and the lower and upper quartiles.

For the median, move horizontally from $$60$$ on the vertical axis.

When the line reaches the curve, move vertically down to the horizontal axis. The line cuts this axis at the median.

Median estimate is $$53$$

• 1, Chapter 2, Task 3, Example of quartiles,

### Example of quartiles

This cumulative frequency graph gives the test results for $$120$$ students. We will use this to find estimates for the median and the lower ($$LQ$$) and upper ($$UQ$$) quartiles.

The lower quartile is at $$25\%$$ of $$120$$, that is $$30$$, and this has value $$LQ = 45$$

The upper quartile is at $$75\%$$ of $$120$$, that is $$90$$, and this has value $$UQ = 60$$

The inter-quartile range is given by: $$UQ - LQ = 60 - 45 = 15$$

• 1, Chapter 2, Task 4, Worked Example 2,

### Worked Example

• #### Worked Example

a)   Estimate the mark attained by the top $$10\%$$ of students
b)   Estimate the number of students who scored more than $$75$$ marks.

• 1, Chapter 2.1, Median and Quartiles,

### Median and Quartiles

A quartile is one of $$3$$ values (lower quartile, median and upper quartile) which divide data into $$4$$ equal groups.

A percentile is one of $$99$$ values which divide data into $$100$$ equal groups.

The lower quartile corresponds to the $$25^{th}$$ percentile.

The median corresponds to the $$50^{th}$$ percentile.

The upper quartile corresponds to the $$75^{th}$$ percentile.

We sometimes talk about a 5-number summary of data that enables location and variation to be judged and compared for data sets.

Here are the values of a 5-number summary

$$Q_{1}$$ = minimum value
$$Q_{2}$$ = lower quartile
$$Q_{3}$$ = median
$$Q_{4}$$ = upper quartile
$$Q_{5}$$ = maximum value

• 1, Chapter 2.1, Task 1, Calculating the Median and Quartiles,

### Calculating the Median and Quartiles

For a set of $$n$$ data points, the median is located at the middle value, i.e. the $$\frac{(n + 1)}{2}$$th data point.

The quartiles are at the $$\frac{(n + 1)}{4}$$th and $$\frac{3(n + 1)}{4}$$th data points.

• #### Worked Example

The number of goals scored by the $$13$$ members of a football team are:
$$9\;0\; 0\; 8\; 12\; 2 \;1\; 2\; 1\; 10 \;6 \;11\; 8$$

Determine the median and quartiles for this data.
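
One way to apply these position formulas to the goals data is sketched below. When a position falls between two data points (for example the 3.5th), this sketch takes the value halfway between them; that convention, and the helper name `value_at`, are assumptions made here.

```python
def value_at(sorted_data, position):
    """Value at a 1-based position; in-between positions are interpolated."""
    lower = int(position)              # whole part of the position
    fraction = position - lower        # how far towards the next data point
    value = sorted_data[lower - 1]
    if fraction:
        value += fraction * (sorted_data[lower] - sorted_data[lower - 1])
    return value

goals = sorted([9, 0, 0, 8, 12, 2, 1, 2, 1, 10, 6, 11, 8])
n = len(goals)                                       # 13 data points

lower_quartile = value_at(goals, (n + 1) / 4)        # 3.5th data point
median = value_at(goals, (n + 1) / 2)                # 7th data point
upper_quartile = value_at(goals, 3 * (n + 1) / 4)    # 10.5th data point

print(goals)
print(lower_quartile, median, upper_quartile)        # 1.0 6 9.5
```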

• 0, Chapter 3, Box and Whisker Plots,

### Box and Whisker Plots

The box and whisker plot is another way of showing variation: it lets you illustrate and compare measures of both location and variation through the median and quartiles. Consider the data set below.

We can find the lower quartile, median and upper quartile.

The box is formed by the two quartiles, with the median marked by a line, whilst the whiskers are fixed by the two extreme values, $$4$$ and $$15$$.

• 1, Chapter 3.1 , Comparing Data Sets,

### Comparing Data Sets

Here are two data sets for comparison

The box and whisker plots below clearly show the key differences.

• 1, Chapter 3.1, Task 1, Worked Example 3,

### Worked Example

• #### Worked Example

The ages, in years, of drivers involved in fatal road accidents in one week in England are shown below

 17 82 42 48 21 35 23 24 18 57 45 62 20 21 33 27 24 37 58 69 65 19 15 21 28 71 43 31 73 26 18 21 34 35 51 21 23 65 22 45 23 27 18 19 32 25 61 36

We can show these data with a box and whisker plot. First construct a stem and leaf diagram:

| Stem | Leaf |
|---|---|
| 1 | 5 7 8 8 8 9 9 |
| 2 | 0 1 1 1 1 1 2 3 3 3 4 4 5 6 7 7 8 |
| 3 | 1 2 3 4 5 5 6 7 |
| 4 | 2 3 5 5 8 |
| 5 | 1 7 8 |
| 6 | 1 2 5 5 9 |
| 7 | 1 3 |
| 8 | 2 |

• 1, Chapter 3.1, Task 2, Measures of Variation: Example,

### Measures of Variation: Example

The 5-number summary for the data on the previous slide is given by:

$$Q_{1}$$ = minimum value  $$15$$
$$Q_{2}$$ = lower quartile $$21$$
$$Q_{3}$$ = median $$29.5$$
$$Q_{4}$$ = upper quartile $$45$$
$$Q_{5}$$ = maximum value $$82$$

Here is the box and whisker plot, showing that the data is not symmetric, with $$50\%$$ of drivers under the age of $$30$$.

• 1, Chapter 3.2, Outliers,

### Outliers

Outliers are extreme values in data sets and are often ignored as they can distort the data analysis.

They could be errors or genuine extreme values.

One definition of outliers is in terms of the interquartile range:

$$IQR = Upper\;Quartile\;\left(UQ \right) - Lower\;Quartile\; \left(LQ\right)$$

An outlier is “any value that is more than $$1.5 \times IQR$$ above the $$UQ$$ or more than $$1.5 \times IQR$$ below the $$LQ$$”.

• 1, Chapter 3.2, task 1, Worked Example 4,

### Worked Example

• #### Worked Example

Are there any outliers in the data from the last slide where $$min = 15, LQ = 21, UQ = 45, max = 82$$ ?
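
One way to check the question above is to compute the two boundary values from the definition of an outlier. The names `lower_limit` and `upper_limit` are illustrative and do not appear in the unit.

```python
# Summary values given in the text
minimum, lq, uq, maximum = 15, 21, 45, 82

iqr = uq - lq                       # interquartile range
lower_limit = lq - 1.5 * iqr        # values below this are outliers
upper_limit = uq + 1.5 * iqr        # values above this are outliers

print(iqr, lower_limit, upper_limit)    # 24 -15.0 81.0
print(minimum < lower_limit)            # False
print(maximum > upper_limit)            # True
```

Only the two extreme values need testing here, since any other outlier would lie between them and the limits.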

• 0, Chapter 4, Standard Deviation,

### Standard Deviation (s.d.)

This is another measure of variation; it is based on adding the squares of the distances of each data point from the mean value.

$$s.d. \; = \sqrt{\frac{\sum\limits_{i=1}^{n}({x_{i} - \bar{x} })^{2} }{n}}$$
$$x_{i} \;$$ represents each datapoint $$(x_{1}, x_{2}, ... x_{n})$$
$$\bar{x} \;$$ is the mean,
$$n$$ is the number of values

Note that you square the differences, divide by the number of values and then take the square root. Clearly if all the data points are equal, then the s.d. is $$0$$.
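
The steps in the note above can be sketched directly in code. The data values here are illustrative and are not taken from the unit.

```python
from math import isclose, sqrt
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative values, not from the text

n = len(data)
mean = sum(data) / n                                # 5.0
squared_diffs = [(x - mean) ** 2 for x in data]     # square the differences
sd = sqrt(sum(squared_diffs) / n)                   # divide by n, then take the root

print(mean, sd)                     # 5.0 2.0

# pstdev divides by n, matching the formula used in this unit
assert isclose(sd, pstdev(data))
```

The standard library also has `stdev`, which divides by $$n - 1$$ instead, so it gives a slightly different answer from the formula used here.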

• 1, Chapter 4, Task 1, Worked Example 5,

### Worked Example

• #### Worked Example

Find the mean and standard deviation of each of these number sets:
a) $$10, 11, 12, 13, 14$$
b) $$5, 6, 12, 18, 19$$

• #### Worked Example

Two machines, $$A$$ and $$B$$, fill packets with soap powder.

A sample of packets was taken from each machine and the weight of powder (in kg) was recorded.

a)   Find the mean and standard deviation for each machine.

b)   Which machine is most consistent?

• 1, Chapter 4.1, Standard Deviation: Scientific Calculators,

### Standard Deviation: Scientific Calculators

Most scientific calculators now have a “statistical” mode and you can input data to determine the mean, standard deviation and other statistical measures.

Whilst the calculations on the previous slide do give an understanding of the concept of standard deviation, it is important to be able to use stats mode on your calculator.

If you have “lost” your instructions on how to use stats mode, just google:

“model number and how to use stats mode”

and you will get the instructions you need.

It is really important that you can use and understand the stats mode on your calculator!

• 0, Chapter 5, Summary,

### Summary

The cumulative frequency curve (graph) illustrates the cumulative frequency of the set of data.

The lower quartile $$\left(Q_{1}\right)$$ corresponds to $$25\%$$ on the $$y$$-axis, which means that $$25\%$$ of values are below this value.

The median $$\left(Q_{2}\right)$$ is the value corresponding to $$50\%$$ on the $$y$$-axis. This means it is the middle value of the data.

The upper quartile $$\left(Q_{3}\right)$$ corresponds to $$75\%$$ on the $$y$$-axis, which means that $$25\%$$ of values are above this value.

A box and whisker plot is a way of illustrating a set of data, showing its smallest and largest values, median and upper and lower quartiles (see next page).

Interactive Exercises:
• Interactive Exercises - Measures of Variation, https://www.cimt.org.uk/sif/datascience/ds5/interactive.htm
• Cumulative Frequency, https://www.cimt.org.uk/sif/datascience/ds5/interactive/s1.html
• Box and Whisker Plots, https://www.cimt.org.uk/sif/datascience/ds5/interactive/s2.html
• Standard Deviation, https://www.cimt.org.uk/sif/datascience/ds5/interactive/s3.html

## The Normal Distribution

Sections:
• 0, Chapter 1, Introduction,

### Introduction

Whilst most people have a certain understanding of what is meant by the normal distribution, here we consider both its mathematical background and how it can be applied in a range of contexts. After studying this unit, you should

• understand the derivation and meaning of the normal distribution and the standardised normal distribution (mean 0 and standard deviation 1)
• be able to transform any normal distribution into the standardised normal distribution
• use normal distribution tables to find probabilities and, inversely, given probabilities, to find z values
• be able to use and apply the normal distribution in a range of contexts.

There are four sections in this unit, namely:

1. Looking at Data
2. Standardised Normal Distribution
3. Transformation of Standardised Normal Distribution
4. Examples in Context

• 0, Chapter 2, Looking at Data,

### Looking at Data

The Normal Distribution is a very useful way of summarising and working with symmetric data sets.

Note that it is bell shaped and symmetric about the mean. Below is a graph of a typical normal distribution

One well known example is that of IQ (Intelligence Quotient) for adults; it has a mean value of 100 (μ) and standard deviation (σ) of 15, which actually means that:

• Approximately 68% of IQ values lie between 100 ±15  i.e. between 85 and 115.
• Approximately 95% of IQ values lie between 100 ±30  i.e. between 70 and 130.
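
These two percentages can be checked with `NormalDist` from Python's standard `statistics` module (available from Python 3.8). This is a sketch for checking the figures, using the mean and standard deviation quoted above.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Probability of lying within 1 and within 2 standard deviations of the mean
within_1_sd = iq.cdf(115) - iq.cdf(85)
within_2_sd = iq.cdf(130) - iq.cdf(70)

print(f"{within_1_sd:.1%}")    # 68.3%
print(f"{within_2_sd:.1%}")    # 95.4%
```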

• 0, Chapter 3, Normal Distribution Characteristics,

### Normal Distribution Characteristics

Here is a detailed Normal Distribution curve, with percentage areas for each section.

For example, 34.1% of the area (data) lies between μ (the mean) and μ+σ (the mean plus the standard deviation). The latter value is often referred to as 1 standard deviation from the mean.

Similarly, 47.7% lies between the mean and 2 standard deviations above it.

• 1, Chapter 3, Task 1, Worked Example 1,

### Worked Examples

• #### Worked Example

A survey showed that the average height of $$16-19$$ year olds was approximately $$169\;cm$$ with $$SD\;9\;cm$$. Assuming the data follows a normal distribution, find:

a) the percentage of sixth formers shorter than $$187\;cm$$;

b) the number taller than $$151\;cm$$ if there are $$300$$ students in total.

• 1, Chapter 3.1, Standardised Normal Distribution,

### Standardised Normal Distribution

This diagram shows the graph of the STANDARDISED normal distribution, which has mean 0 and standard deviation 1.

It is known as the probability density function (p.d.f.).

The area under the curve up to a value x on the horizontal axis represents the probability of obtaining a value less than x, and we write this as p = Φ(x).

We can look up values of this probability either in tables or on a calculator; here are some important values:

You can see all these and other values in the next section.

• 1, Chapter 3.2, Normal Distribution Table,

### Normal Distribution Table

This table gives the area under the graph for values of x from 0 to 3.

You can see that:

p(1.45) = 0.9265
p(1.96) = 0.9750
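
If no table is to hand, the same values can be obtained in Python. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available; with no arguments it is the standardised normal distribution.

```python
from statistics import NormalDist

standard = NormalDist()      # mean 0, standard deviation 1

# Area under the curve up to each value, to 4 decimal places as in the table
for x in (1.45, 1.96):
    print(f"p({x}) = {standard.cdf(x):.4f}")
```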

• 1, Chapter 3.2, Task 1, Worked Example 2,

### Worked Examples

• #### Worked Example

Use the tables or your calculator to find $$p \left(Z \gt 1.2 \right)$$.

NOTE: always draw a diagram to show the area needed.

• #### Worked Example

For the standardised normal distribution, use the tables or your calculator to find $$p \left( -2.0 \lt Z \lt 2.0 \right)$$

• #### Worked Example

For the standardised normal distribution, use the tables or your calculator to find $$p \left( -1.2 \lt Z \lt 1.0 \right)$$

• #### Worked Example

If $$Z \sim N\left(0, 1\right)$$, find $$a$$ and $$b$$ so that $$p \left(Z \lt a\right) = 0.90$$ and $$p \left(Z \gt b \right) = 0.25$$
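
The questions above can be checked in code once each is written in terms of areas to the left of a value. This is a sketch using `statistics.NormalDist` (Python 3.8+); tables or a calculator give the same values to the accuracy they show.

```python
from statistics import NormalDist

z = NormalDist()    # standardised normal distribution

# p(Z > 1.2): the whole area (1) minus the area up to 1.2
p_above = 1 - z.cdf(1.2)

# p(-2.0 < Z < 2.0) and p(-1.2 < Z < 1.0): difference of two areas
p_between_2 = z.cdf(2.0) - z.cdf(-2.0)
p_between = z.cdf(1.0) - z.cdf(-1.2)

# Working backwards from a probability to a z value
a = z.inv_cdf(0.90)         # p(Z < a) = 0.90
b = z.inv_cdf(1 - 0.25)     # p(Z > b) = 0.25 means p(Z < b) = 0.75

print(f"{p_above:.4f} {p_between_2:.4f} {p_between:.4f}")   # 0.1151 0.9545 0.7263
print(f"{a:.2f} {b:.2f}")                                   # 1.28 0.67
```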

• 0, Chapter 4, Transformation of Standardised Normal Distribution,

### Transformation of Standardised Normal Distribution

When a variable $$X$$ follows a normal distribution with mean $$\mu$$ and variance $$\sigma^{2}$$, this is denoted by:

$$X \sim N \left(\mu ,\sigma ^{2}\right)$$

We can transform this to the Standardised Normal Distribution using the simple formula:

$$Z = \frac{X - \mu }{\sigma }$$

This ensures that the new probability density function has mean 0 and standard deviation 1, and we will see how to use this in practical contexts. This technique is best understood by studying the Worked Examples that follow.

The tallest person in recent recorded history is an American, Robert Wadlow (1918 - 1940), who reached a height of 2.72 m (8 ft 11.1 in).
Using data on heights from the exercise in Section 1, where μ = 169 cm and σ = 9 cm, the z value for Robert Wadlow's height is:

$$Z = \frac{X - \mu }{\sigma }= \frac{272-169}{9} \approx 11.4$$

This is about 11 standard deviations above the mean. More accurate tables suggest that such a value is exceeded with a probability far smaller than $$10^{-10}$$, so it is extremely unlikely that a taller person will appear in our lifetime.

• 1, Chapter 4, Task 1, Worked Examples 3,

### Worked Examples

• #### Worked Example

If $$X \sim N(4, 9)$$, calculate a) $$p(X > 6)$$ and b) $$p(X > 1)$$

$$X\sim N(\mu ,\sigma ^{2} )$$

• #### Worked Example

Intelligence Quotient (IQ) scores are designed so that $$X \sim N(100, 225)$$.
To join MENSA an IQ of 138 is required. What percentage of the population are eligible to join?
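
The transformation $$Z = \frac{X - \mu}{\sigma}$$ can be sketched as a small helper for questions like those above. The function name `tail_above` is illustrative. Note that $$N(4, 9)$$ gives the variance, so the standard deviation passed in is $$3$$.

```python
from statistics import NormalDist

def tail_above(x, mu, sigma):
    """p(X > x) for X ~ N(mu, sigma^2), found via the standardised value z."""
    z = (x - mu) / sigma                 # transform to the standardised normal
    return 1 - NormalDist().cdf(z)       # area to the right of z

# X ~ N(4, 9): variance 9, so sigma = 3
p_a = tail_above(6, mu=4, sigma=3)
p_b = tail_above(1, mu=4, sigma=3)
print(round(p_a, 4), round(p_b, 4))

# IQ ~ N(100, 225): variance 225, so sigma = 15
p_mensa = tail_above(138, mu=100, sigma=15)
print(f"{p_mensa:.2%}")
```

A common slip is to divide by the variance instead of the standard deviation, so it is worth checking which of the two the notation gives.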

• 0, Chapter 5, Summary,

### Summary

Whilst the standardised Normal Distribution has a mathematical formula, derived by the German mathematician Carl Gauss in the early 19th century, in practical contexts this formula is not used as tables give sufficiently comprehensive values. Indeed these are now stored on some recently developed calculators (for example, the Casio fx-991EX).

The standardised normal distribution, denoted by $$N(0, 1)$$, has mean 0 and standard deviation 1 and is bell-shaped.

The area under this curve has value 1

Tables are provided to give the probability up to a certain value, z, as shown in the second diagram

A normal distribution with mean $$\mu$$ and standard deviation $$\sigma$$, i.e. $$N(\mu, \sigma^{2})$$, can be transformed to the standardised normal distribution using the transformation

$$z=\frac{x - \mu}{\sigma}$$

For a standardised normal distribution, about 68% of the area lies within ±1 standard deviation of the mean and about 95% within ±2 standard deviations.

Interactive Exercises:
• The Normal Distribution Interactive Exercises, https://www.cimt.org.uk/sif/datascience/ds10/interactive.htm
• Looking at your Data, https://www.cimt.org.uk/sif/datascience/ds10/interactive/s1.html
• The P.D.F. of the Normal, https://www.cimt.org.uk/sif/datascience/ds10/interactive/s2.html
• Transformation of Normal P.D.Fs , https://www.cimt.org.uk/sif/datascience/ds10/interactive/s3.html
• More Complicated Examples, https://www.cimt.org.uk/sif/datascience/ds10/interactive/s4.html

## Correlation and Regression

Sections:
• 0, Chapter 1, Introduction,

### Introduction

In this unit we consider correlation and regression between two variables. After studying this unit you should

• understand the concept and measurement of correlation
• be able to use Spearman's Rank Correlation Coefficient for data that is ranked
• be able to use and interpret the Product Moment Correlation Coefficient
• understand the concept of regression
• be able to use the line of regression, based on the method of least squares.

The unit is divided into the five related sections listed below.

The sections are based on whether data shows any correlation and developing a line of best fit when the data does show high correlation.

1. Correlation
2. Spearman’s Rank Correlation Coefficient
3. Product Moment Correlation Coefficient
4. Regression Line
5. Line of Regression Equation
• 0, Chapter 2, Correlation,

### Correlation

You may already have an understanding of the concept of correlation between two sets of data. We can illustrate this by using a scatter diagram. Here are some examples of variables that have:

1. positive correlation
2. no correlation
3. negative correlation

• 1, Chapter 2, Task 1, Worked Examples 1,

### Worked Examples

• #### Worked Example

The scatter diagram below shows the ages, in years, and the selling prices, in thousands of pounds, of second-hand cars of the same model.

a)    The price of one of these cars has been advertised wrongly. Give the age and price of the car that is incorrectly advertised.

b)    Another car is to be included in the advertisement next week. The car is four years old. Do you think the price will be more than $$£6500$$ or less?

c)    What sort of correlation is shown?

• #### Worked Example

An investigation was conducted by a company on the value of various assessment methods for recruiting employees. The data is opposite

This is based on 8 employees, giving their educational test scores, together with an assessment score by the Personnel Officer of their ability one year after joining the company.

Possible test scores in each case can range from a low of 1 to a high of 20

Rank each employee for both Educational Test score and Assessment score by the Personnel Officer, calculate Spearman’s rank correlation coefficient and interpret the answer.

• 1, Chapter 2, Task 2, Equal Ranks: Example,

### Equal Ranks: Example

Mathematics and Physics marks for 10 students are given in the table below

We first rank the two sets of scores, using 1 as the highest rank. For Maths, students I and J are both ranked 5.5, and in Physics D, E and I are all ranked 4 (to cover ranks 2, 3, 4), and so on.

Now we calculate $$d$$ and $$d^{2}$$; adding the $$d^{2}$$ column gives 59.5.

With $$n=10$$ , using the formula, we get $$r=0.64$$ to 2 decimal places, showing positive correlation between Mathematics and Physics marks.

• 0, Chapter 3, Spearman’s Rank Correlation Coefficient,

### Spearman’s Rank Correlation Coefficient

This is a method used to assign a numerical value to the correlation between pairs of data points. Any coefficient, say $$r$$, is designed so that $$-1 \leq r \leq 1$$, where $$r = -1$$ corresponds to perfect negative correlation, $$r = 0$$ to no correlation and $$r = 1$$ to perfect positive correlation (as illustrated below).

This coefficient is based on adding the squares of the differences between data points after ranking – that is, put in numerical order and then given the values $$1, 2, 3, ...,$$ etc. Here is the formula:

#### Coefficient formula

$$r = 1 - \frac{6\sum{d^{2}} }{n\left(n^{2} - 1\right) }$$

Here $$n$$ is the number of data points, $$d$$ is the difference between the two ranks for each data point and $$\sum d^{2}$$ means adding up all the $$d^{2}$$ values.
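
The formula is simple to evaluate once $$\sum d^{2}$$ is known. The sketch below uses the total quoted in the equal ranks example ($$\sum d^{2} = 59.5$$ with $$n = 10$$); the function name `spearman` is illustrative.

```python
def spearman(sum_d_squared, n):
    """Spearman's rank correlation coefficient from the total of d^2."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Values quoted in the equal ranks example: sum of d^2 = 59.5, n = 10
r = spearman(59.5, 10)
print(round(r, 2))          # 0.64

# Identical rankings give d = 0 for every data point, so r = 1
print(spearman(0, 10))      # 1.0
```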

• 1, Chapter 3, Task 1, Worked Examples 2,

### Worked Examples

• #### Worked Example

At the 'Best of British Pie' competition two judges award marks for $$9$$ different pies shown in the table below. Calculate the Spearman Rank Correlation Coefficient.

• 1, Chapter 3.1, Correlation Coefficients,

### Correlation Coefficients

There are many other correlation coefficients.

What is the advantage of Spearman’s Correlation Coefficient?

• Easy to calculate

What is one of the main disadvantages of Spearman’s Correlation Coefficient?

• Ranking the data loses accuracy

Another correlation coefficient that does not lose the accuracy is the Product Moment Correlation Coefficient (PMCC) but it is more complicated to calculate. Here is the formula for the PMCC:

#### Formula for the PMCC

$$r = \frac{\sum{xy} - n \overline{x} \, \overline{y}}{\sqrt{\left(\sum{x^{2}} - n\overline{x}^{2}\right)\left(\sum{y^{2}} - n\overline{y}^{2}\right)}}$$

Here $$\overline{x}$$ and $$\overline {y}$$ are the mean values of the $$x$$ and $$y$$ data points

• 0, Chapter 4, Product Moment Correlation Coefficient (PMCC),

### Product Moment Correlation Coefficient (PMCC)

You can calculate PMCC by completing the table opposite and finding the totals in each column.

You can find the mean values, $$\bar{x}$$ and $$\bar{y}$$, from the first two columns, giving:
$$\bar{x} = \frac{\sum{x}}{n} = \frac{170}{10} = 17$$
and similarly $$\bar{y}=30$$

Substituting the values in the formula:
$$r = \frac{\sum{xy} - n \bar{x} \bar{y}}{\sqrt{(\sum{x^{2}} - n \bar{x}^{2}) ( \sum{y^{2}} - n \bar{y}^{2} ) }}$$
$$= \frac{5313-10 \times 17 \times 30}{\sqrt{ (3250-10 \times 17 \times 17) (9250 - 10 \times 30 \times 30 ) }}$$
$$= \frac{213}{\sqrt{360 \times 250}} = \frac{213}{300} = 0.71$$ to 2 decimal places
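The arithmetic in this substitution can be checked with a few lines of Python. The variable names are chosen for this sketch; the totals are the ones quoted in the substitution above.

```python
from math import sqrt

# Totals quoted in the substitution above.
n = 10
x_bar, y_bar = 17, 30
sum_xy, sum_x2, sum_y2 = 5313, 3250, 9250

numerator = sum_xy - n * x_bar * y_bar  # 213
denominator = sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))  # 300.0
r = numerator / denominator
print(round(r, 2))  # 0.71
```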

Note that the value of the PMCC is $$0.71$$ whilst from earlier work we calculated Spearman’s Rank Correlation Coefficient as $$0.64$$.

So the more accurate value from the PMCC shows a higher level of positive correlation.

Important Note:
It is much more efficient to use a calculator to calculate the PMCC, but take care to input the data accurately.

• 0, Chapter 5, Regression Lines,

### Regression Lines

You will have met the idea of trend lines – these are essentially 'lines of best fit'. There is little value in attempting to draw lines of best fit unless there is either strong positive or strong negative correlation between the points plotted, as shown in the following diagrams.

Also note that a line of best fit should always pass through the point $$(\overline{x}, \overline{y})$$ representing the mean values of the data points.

• 1, Chapter 5, Task 1, Worked Examples 3,

### Worked Examples

• #### Worked Example

Mr Bean often travels by taxi and has to keep details of the journeys in order to complete his claim form at the end of the week. Details for journeys made during a week are:

(a)    Plot the data on graph paper.

(b)    Use the mean value of the $$x$$ and $$y$$ data points to draw a line of best fit.

(c)    Obtain the equation of the line of best fit in the form $$y=mx + c$$.

(d)    Give an interpretation for the value of $$c$$ in your calculation.

• 0, Chapter 6, Line of Regression Equation,

### Line of Regression Equation

In the previous section, we drew lines of regression by eye, as accurately as possible. Here we will extend this and show how to use a formula to find the line of regression exactly.

If you have a set of paired data points $$\left(x_{i}, y_{i}\right)$$ for which there is strong correlation (positive or negative), the problem is to find the line which best fits the data.

If the line is to be used to predict values of $$y$$ based on known values of $$x$$, it is called the '$$y$$ on $$x$$' line and its equation is determined by making $$d_{1}^{2} + d_{2}^{2} + d_{3}^{2} + \dots = \sum d^{2}$$ as small as possible, as illustrated in the picture below.

#### Regression line

The $$y$$ on $$x$$ regression line is given by

$$y - \overline{y}=\frac{s_{xy}}{s_{x}^{2}} \left(x - \overline{x} \right)$$

where $$s_{x} = \sqrt{\frac{1}{n}\sum{x^{2}} - \overline{x}^{2}}$$

and

$$s_{xy} = \frac{1}{n}\sum\limits_{i=1}^{n}{\left(x_{i} - \overline{x}\right)\left(y_{i} - \overline{y}\right)}$$

Fortunately you do not need to make these calculations by hand, as calculators with a STATS mode will do the work for you. So make sure you are familiar with how to use your calculator to find the equation of the regression line.

We will show the actual calculation for one example in the next section but suggest you use the STATS mode on your calculator to confirm the equation of the line of regression.
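If you have access to Python as well as a calculator, the regression formula can be sketched as a small function. The name `regression_line` and the data are invented for this illustration; the points were chosen to lie exactly on $$y = 2x + 1$$ so that the result is easy to check.

```python
def regression_line(xs, ys):
    """Return (m, c) for the y on x regression line y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    s_x2 = sum(x * x for x in xs) / n - x_bar ** 2
    m = s_xy / s_x2
    c = y_bar - m * x_bar  # the line passes through (x_bar, y_bar)
    return m, c


# Hypothetical data lying exactly on y = 2x + 1 (illustrative only).
m, c = regression_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)  # 2.0 1.0
```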

• 1, Chapter 6, Task 1, Line of Regression Equation: Example,

### Line of Regression Equation: Example

Data obtained from an experiment, in which weights were attached to an elastic band and the length of the band was measured, are shown below.

The data points have been graphed opposite and you can see that they are very close to a straight line, so it is appropriate to find the equation of the line of regression.

Here is the general formula:

$$y - \bar{y} = \frac{s_{xy}}{s_{x}^{2}}(x - \bar{x})$$ where $$s_{x} = \sqrt{\frac{1}{n}\sum{x^{2}} - \bar{x}^{2}}$$ and $$s_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y}) = \frac{1}{n}\sum_{i=1}^{n}x_{i}y_{i} - \bar{x}\,\bar{y}$$

Here is the calculation:
$$\bar{x} = \frac{1800}{8} = 225$$
$$\bar{y} = \frac{597}{8} = 74.625$$
$$s_{xy} = \frac{156150}{8} - 225 \times 74.625 = 2728.125$$
$$s_{x}^{2} = \frac{510000}{8} - 225^{2} = 13125$$
$$\Rightarrow y - 74.625 = \frac{2728.125}{13125} (x - 225)$$

This gives: $$y = 0.21x + 27.86$$ and this is shown on the graph.
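The calculation above can be confirmed in Python as well as on a calculator. The variable names are chosen for this sketch; the totals are those used in the calculation.

```python
# Totals used in the calculation above.
n = 8
sum_x, sum_y = 1800, 597
sum_xy, sum_x2 = 156150, 510000

x_bar = sum_x / n                  # 225.0
y_bar = sum_y / n                  # 74.625
s_xy = sum_xy / n - x_bar * y_bar  # 2728.125
s_x2 = sum_x2 / n - x_bar ** 2     # 13125.0
m = s_xy / s_x2                    # gradient of the regression line
c = y_bar - m * x_bar              # intercept on the y-axis
print(round(m, 2), round(c, 2))  # 0.21 27.86
```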

• 0, Chapter 7, Summary,

### Summary

Correlation is a measure of how strongly two sets of data are related, whereas regression finds the straight line that best fits the data. Regression should only be used when the data is strongly correlated (positively or negatively).

A correlation coefficient, $$r$$, is designed so that $$-1 \leq r \leq 1$$, with $$r = 1$$ for perfect positive correlation, $$r = 0$$ for no correlation and $$r = -1$$ for perfect negative correlation.

Spearman's Rank Correlation Coefficient is based on ranked data and is given by

$$r = 1 - \frac{6\sum{d^{2}} }{n\left(n^{2} - 1\right) }$$

Product Moment Correlation Coefficient (sometimes known as Pearson's PMCC) is based on actual data values and is given by

$$r = \frac{\frac{1}{n}\sum\limits_{i=1}^{n}{x_{i}y_{i}} - \overline{x}\,\overline{y}}{s_{x}s_{y}}$$

where

$$s_{x} = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}{x_{i}^{2}} - \overline{x}^{2}}$$

$$s_{y} = \sqrt{\frac{1}{n}\sum\limits_{i=1}^{n}{y_{i}^{2}} - \overline{y}^{2}}$$

Regression lines are essentially lines of best fit based on given data.

The equation of the line of regression is based on minimising the sum of the squares of the distances of the data points from the regression line,

i.e. minimising  $$\sum\limits_{i=1}^{n}{d_{i}^{2}}$$

The $$y$$ on $$x$$ line of regression is given by

$$y - \overline{y} = \frac{s_{xy}}{s_{x}^{2}}\left(x - \overline{x}\right)$$

where

$$s_{xy} = \frac{1}{n}\sum\limits_{i=1}^{n}{x_{i}y_{i}} - \overline{x}\,\overline{y}$$

and

$$s_{x}^{2} = \frac{1}{n}\sum\limits_{i=1}^{n}{x_{i}^{2}} - \overline{x}^{2}$$

Interactive Exercises:
• Interactive Exercises - Correlation and Regression, https://www.cimt.org.uk/sif/datascience/ds1/interactive.htm
• Correlation Product Moment Correlation Coefficient , https://www.cimt.org.uk/sif/datascience/ds1/interactive/s1.html
• Spearman's Rank Coefficient of Correlation, https://www.cimt.org.uk/sif/datascience/ds1/interactive/s2.html
• Product Moment Correlation Coefficient , https://www.cimt.org.uk/sif/datascience/ds1/interactive/s3.html
• Regression Lines , https://www.cimt.org.uk/sif/datascience/ds1/interactive/s4.html
• Line of Regression Equation, https://www.cimt.org.uk/sif/datascience/ds1/interactive/s5.html
• Regression Lines, https://www.bbc.co.uk/bitesize/guides/zmt9q6f/revision/3, More on lines of best fit

## Probability and Binomial Distributions

Sections:
• 0, Chapter 1, Introduction,

### Introduction

This is an introductory unit on discrete probability distributions and is an important stepping stone before moving on to the Poisson distribution. After completing this unit you should

• understand the concept of a random variable
• be able to determine the expectation and variance of a random variable for a discrete distribution
• understand how to find the expectation and variance of a discrete uniform distribution
• be able to recognise when to use the Binomial distribution
• understand how to find the mean and variance of the Binomial distribution
• be able to apply the Binomial distribution to a variety of problems.

There are six sections in this unit, namely:

1. Expectation
2. Variance
3. Probability Distributions
4. The Uniform Distribution
5. Developing the Binomial Distribution
6. The Mean and Variance of the Binomial Distribution
• 0, Chapter 2, Expectation,

### Expectation

You are probably familiar with experiments in probability, for example throwing a dice and observing the outcome. You may also be familiar with the concept of expectation. We will use a simple worked example in the next section to introduce the concept of expectation. We will also introduce the idea of a random variable. After we have looked at the example we will define these concepts more precisely.

• 1, Chapter 2, Task 1, Worked Example 1,

### Worked Example

• #### Worked Example

When throwing a normal dice, let $$X$$ be the random variable: $$X$$ is the “square of the score shown on the dice”. What is the expected value of $$X$$ ?

• 1, Chapter 2.1, Expectation 1,

### Expectation

We now need to be more precise:

Discrete Random Variable: this is a quantity that may take a given range of values that cannot be predicted exactly but can be described in terms of the probability of each of the possible values.

If the possible values are denoted by $$x_{i}$$ $$(i=1,2,3,...,n)$$, and the probability of each of these values is given by $$P(X=x_{i}) = p_{i}$$, then:
$$p_{1}+p_{2}+...+p_{n}=1$$

Expectation: for a discrete random variable $$X$$ that can take the values $$x_{i}$$ $$(i=1,2,3,...,n)$$, $$E(X)$$ denotes the expectation and:
$$E(X)=\sum\limits_{i=1}^{n}{x_{i} \times p_{i}} = x_{1} \times p_{1} + x_{2} \times p_{2} + ... + x_{n} \times p_{n}$$

where the “sigma” sign, $$\sum{}$$, means summing over all the values.
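As a small illustration of this definition, the sketch below stores a probability distribution as a Python dictionary and sums $$x_{i} \times p_{i}$$. The function name `expectation` and the choice of example (the score on a fair six-sided dice) are assumptions made for this sketch; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def expectation(distribution):
    """E(X): the sum of x_i * p_i over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())


# Score on a fair six-sided dice: each value has probability 1/6.
die = {score: Fraction(1, 6) for score in range(1, 7)}

total_probability = sum(die.values())  # the probabilities must add up to 1
print(total_probability)   # 1
print(expectation(die))    # 7/2
```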

• 1, Chapter 2.1, Task 1, Worked Example 2,

### Worked Example

• #### Worked Example

Let $$X$$ be the random variable:
“the number of HEADS obtained when tossing a fair coin 3 times”.
a) What are the possible values of $$X$$?
b) What are the associated probabilities?
c) Determine the expectation of $$X$$.

• 0, Chapter 3, Variance,

### Variance

This is a similar concept to the variance (the square of the standard deviation) of a data set, and you may be familiar with the formula:
$$\sigma^{2} = s^{2} = \frac{\sum{(x_{i}-\bar{x})^{2}}}{n} = \frac{\sum{x_{i}^{2}}}{n} - \bar{x}^{2}$$
We will define the variance of a random variable in a similar way.

#### Remember

Given that the expectation of a random variable, $$X$$, is equivalent to the mean of a data set, we define the variance of $$X$$ as:

$$V(X)=E(X^{2}) - [E(X)]^{2}$$
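The definition translates directly into code. In this sketch the function names `expectation` and `variance` are chosen for illustration, and the example distribution (the score on a fair six-sided dice) is an assumption, not one of the worked examples.

```python
from fractions import Fraction


def expectation(distribution):
    """E(X): the sum of x_i * p_i over a {value: probability} mapping."""
    return sum(x * p for x, p in distribution.items())


def variance(distribution):
    """V(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x ** 2 * p for x, p in distribution.items())
    return e_x2 - expectation(distribution) ** 2


# Score on a fair six-sided dice: each value has probability 1/6.
die = {score: Fraction(1, 6) for score in range(1, 7)}
print(variance(die))  # 35/12
```

The standard deviation is the square root of the variance, here $$\sqrt{\frac{35}{12}} \approx 1.71$$.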

• 1, Chapter 3, Task 1, Worked Example 3,

### Worked Example

• #### Worked Example

Let $$X$$ be the random variable “square of the score shown on the dice”.
We have already seen in Section 1 that the expectation is $$E(X) = \frac{91}{6}$$.
Find the variance and standard deviation of the random variable $$X$$.

• 0, Chapter 4, Probability Distributions,

### Probability Distributions

We have now introduced the concept of a discrete random variable and also defined what is meant by the expectation and variance of the random variable, namely:

$$E(X) = \sum{x_{i} \times p_{i}}$$

$$V(X)=E(X^{2}) - [E(X)]^{2} = \sum{x_{i}^{2} \times p_{i} - (\sum{x_{i} \times p_{i}})^{2}}$$

We will consider one more probability distribution before moving on to the Uniform distribution and the important Binomial distribution, which is fundamental to much of statistical analysis.

• 1, Chapter 4, Task 1, Worked Example 4,

### Worked Example

• #### Worked Example

For a fair 10-sided spinner, marked $$1,2,3,...,10$$ with $$S$$ the random variable “the score on the spinner”, find:
a) the probability distribution of $$S$$,
b) the expected value of $$S$$,
c) the standard deviation of $$S$$.

• 0, Chapter 5, The Uniform Distribution,

### The Uniform Distribution

One important distribution is the uniform distribution, in which all possible outcomes have equal probabilities.

The random variable $$X$$ is said to follow a uniform distribution when all its outcomes are equally likely.

A very simple example is given by the random variable $$H$$ : “the number of heads seen when a single coin is tossed”.
Here the probability distribution is given by:

| $$h$$ | $$0$$ | $$1$$ |
|---|---|---|
| $$p(h)$$ | $$\frac{1}{2}$$ | $$\frac{1}{2}$$ |

$$E(H) = 0 \times \frac{1}{2} + 1 \times \frac{1}{2} = \frac{1}{2}$$ (as expected)

$$V(H) = 0^{2} \times \frac{1}{2} + 1^{2} \times \frac{1}{2} - (\frac{1}{2})^{2} = \frac{1}{4}$$ and standard deviation $$\sqrt{\frac{1}{4}} = \frac{1}{2}$$

• 1, Chapter 5, Task 1, Worked Example 5,

### Worked Example

• #### Worked Example

A fair 8-sided spinner has numbers $$1,2,3,4,5,6,7$$ and $$8$$ on its sides. If $$X$$ is the random variable 'score on the spinner', determine both $$E(X)$$ and $$V(X)$$ from first principles.

• 0, Chapter 6, Binomial Distribution,

### Binomial Distribution

We will use a worked example to introduce this distribution.

• #### Worked Example

Ashoke, Theo and Sadie will each visit the local leisure centre to swim on one evening next week but have made no arrangement between themselves to meet or go on any particular day.

The random variable $$X$$ is: “the number of the three who go to the leisure centre on Wednesday”. Find the probability distribution for $$X$$, that is, the probabilities for each value of $$X$$.


• 1, Chapter 6.1, Notation,

### Notation

Before getting to the Binomial Distribution, we need one new piece of notation: $$^{n} C_{r}= \frac{n!}{(n-r)!r!}$$
where: $$n!=n \times (n-1) \times (n-2) \times ... \times 3 \times 2 \times 1$$

#### For Example

$$^{10} C_{3}= \frac{10!}{(10-3)!3!}$$ $$= \frac{10!}{7! \times 3!}$$   where

$$10! = 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 \ = 3628800$$

$$7! = 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 5040$$

$$3! = 3 \times 2 \times 1 = 6$$

giving  $$^{10} C_{3}= \frac{3628800}{5040 \times 6} = 120$$

Note: it is easier to calculate these expressions by noting that:
$$^{10} C_{3}$$

$$= \frac{10!}{7! \times 3!}$$

$$=\frac{10 \times 9 \times 8 \times (7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 ) }{(7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1) \times 3 \times 2 \times 1}$$
$$= \frac{10 \times 9 \times 8}{3 \times 2 \times 1}$$

$$= 120$$

and even easier just to use the $$^{n} C_{r}$$ button on your calculator!
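The same value can be confirmed in Python, where `math.comb` (available from Python 3.8) plays the role of the calculator button. The variable names are chosen for this sketch.

```python
from math import comb, factorial

# Direct use of the definition n! / ((n - r)! r!)
by_definition = factorial(10) // (factorial(7) * factorial(3))

# Shortcut after cancelling 7! from the top and bottom
by_cancelling = (10 * 9 * 8) // (3 * 2 * 1)

# Built-in function for the number of combinations
built_in = comb(10, 3)

print(by_definition, by_cancelling, built_in)  # 120 120 120
```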

• 1, Chapter 6.2, Binomial Distribution 1,

### Binomial Distribution

#### Remember

If the probability of success is $$p$$ and the “experiment” is repeated $$n$$ times independently, with the probability remaining constant, and $$X$$ is the random variable “number of successes”, then its probability distribution is given by this formula:

$$P(X=r) = ^{n} C_{r} p^{r} (1-p)^{n-r}$$ for $$r=0,1,2,...,n$$

We also use the notation:
$$X \sim B(n,p)$$
to mean that the random variable $$X$$ has a Binomial distribution with $$n$$ independent experiments and probability of success $$p$$.
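A minimal Python sketch of this formula is shown below. The function name `binomial_pmf` and the values $$n = 4$$, $$p = 0.5$$ are chosen purely for illustration and are not taken from the worked examples.

```python
from math import comb


def binomial_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


# Illustrative values: n = 4 trials, each with probability of success 0.5.
probabilities = [binomial_pmf(4, 0.5, r) for r in range(5)]
print(probabilities)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

The five probabilities add up to 1, as they must for any probability distribution.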

• 1, Chapter 6.2, Task 1, Worked Example 6,

### Worked Example

• #### Worked Example

Ashoke, Theo and Sadie will each visit the local leisure centre to swim on one evening next week but have made no arrangement between themselves to meet or go on any particular day.

The random variable $$X$$ is: “the number of the three who go to the leisure centre on Wednesday”. Find the probability distribution for $$X$$, that is, the probabilities for each value of $$X$$.

• 0, Chapter 7, Mean and Variance,

### Mean and Variance

If you play ten games of table tennis against an opponent who, from past experience, you know only has a $$\frac{1}{5}$$ chance of winning a game with you, how many games do you expect him to win?

Most people would reply 'two' and would argue that since the opponent wins on average $$\frac{1}{5}$$ of the games, he can expect to win $$\frac{1}{5} \times 10 = 2$$ games.

Another way of writing this would be to say, if $$X \sim B(10, \frac{1}{5})$$, what is the value of $$E(X)$$, where $$E(X)$$ means the expected value of $$X$$?

The answer then is $$E(X)=10 \times \frac{1}{5} = 2$$.

#### Remember

If $$X \sim B (n,p)$$, then:

$$E(X) = n \times p$$ and    $$V(X) = n \times p \times q$$, where $$q = 1 - p$$
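These results can be checked against the definitions of expectation and variance. The sketch below does this for the table tennis example, $$X \sim B(10, \frac{1}{5})$$, using exact fractions; the variable names are chosen for illustration.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 5)
q = 1 - p

# Probability distribution P(X = r) for r = 0, 1, ..., n
pmf = {r: comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)}

# Expectation and variance from first principles
mean = sum(r * pr for r, pr in pmf.items())
variance = sum(r ** 2 * pr for r, pr in pmf.items()) - mean ** 2

print(mean, variance)  # 2 8/5
```

Both values agree with the formulas: $$n \times p = 2$$ and $$n \times p \times (1 - p) = 10 \times \frac{1}{5} \times \frac{4}{5} = \frac{8}{5}$$.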

• 1, Chapter 7, Task 1, Worked Example 7,

### Worked Example

• #### Worked Example

A biased die is thrown thirty times and the number of sixes seen is eight. If the die is thrown a further twelve times find:
a) the probability that a six will occur exactly twice,
b) the expected number of sixes,
c) the variance of the number of sixes.

• 0, Chapter 8, Summary,

### Summary

Random variables can be discrete (taking values from a list of separate numerical values) or continuous (taking any value over a range).

Note that for any discrete random variable, $$X$$,
$$\sum\limits_{all \; x}{P(X=x)} =1$$

The expectation of a discrete random variable, $$X$$, is given by:
$$E(X) = \sum\limits_{all \; x}{x \times P(X=x)}$$

The variance of a discrete random variable, $$X$$, is given by:
$$V(X) = E(X^{2}) - [E(X)]^{2}$$

A uniform distribution is when all outcomes for the random variable $$X$$ are equally likely. If there are $$n$$ possible outcomes, then:
$$P(X=x) = \frac{1}{n} \; , \; x=1,2,...,n$$
and     $$E(X) = \frac{n+1}{2} \; , \; V(X) = \frac{n^{2}-1}{12}$$
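The two uniform distribution formulas can be compared with a first-principles calculation. In this sketch the function name `uniform_mean_variance` and the values of $$n$$ tried are arbitrary choices made for illustration.

```python
from fractions import Fraction


def uniform_mean_variance(n):
    """E(X) and V(X) from first principles for X uniform on 1, 2, ..., n."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    variance = sum(x ** 2 * p for x in range(1, n + 1)) - mean ** 2
    return mean, variance


# Compare with the formulas (n + 1)/2 and (n^2 - 1)/12 for a few values of n.
for n in (2, 6, 12):
    mean, variance = uniform_mean_variance(n)
    print(n, mean == Fraction(n + 1, 2), variance == Fraction(n ** 2 - 1, 12))
```

Each line of output ends with `True True`, showing that the formulas and the direct calculation agree for these values.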

The Binomial distribution can be applied to an experiment with a fixed number of independent trials, $$n$$, when the underlying probability of success, $$p$$, remains the same. We write:
$$X \sim Bin(n,p) \;$$ or $$\; X \sim B(n,p)$$

If $$X \sim Bin(n,p) \;$$ then:
$$P(X=r) = \binom{n}{r} p^{r} q^{n-r}$$
where $$q=1-p \;$$ and $$\; r=0,1,2,...,n$$

$$\binom{n}{r} = \frac{n!}{r!(n-r)!}$$

$$\binom{n}{0} = 1$$
$$\binom{n}{1} = n$$
$$\binom{n}{2} = \frac{n(n-1)}{2}$$
$$\binom{n}{n} = 1$$

Interactive Exercises:
• Probability and Binomial Distributions Interactive Exercises, https://www.cimt.org.uk/sif/datascience/ds9/interactive.htm
• Expectation, https://www.cimt.org.uk/sif/datascience/ds9/interactive/s1.html
• Variance, https://www.cimt.org.uk/sif/datascience/ds9/interactive/s2.html
• Probability Distributions, https://www.cimt.org.uk/sif/datascience/ds9/interactive/s3.html
• The Uniform Distribution, https://www.cimt.org.uk/sif/datascience/ds9/interactive/s4.html
• Finding the Distribution, https://www.cimt.org.uk/sif/datascience/ds9/interactive/s5.html
• The Mean and Variance of the Binomial Distribution, https://www.cimt.org.uk/sif/datascience/ds9/interactive/s6.html