The law in Alameda County, California states that a Jury Panel for a particular trial should be selected by chance (randomly) from the list of eligible residents. In this case, there are 1453 eligible residents from which a Jury Panel of 100 should be randomly selected. Then, the lawyers and judges follow a legal process to non-randomly select a jury of 12.
We are going to look at the distribution of the eligible residents and compare it to the distribution of the selected Jury Panel. We will then attempt to determine how likely it is that the Jury Panel was selected by random chance. You have already learned enough about writing Python code to make this determination.

**Please run all code cells in order, starting with the import cell below.**
    # Imports and plot settings
    from datascience import *
    import numpy as np
    import matplotlib.pyplot as plots
    plots.style.use('fivethirtyeight')
    %matplotlib inline
    np.set_printoptions(legacy='1.13')

    # Proportion of each ethnicity among eligible residents and on the Jury Panel
    jury = Table().with_columns(
        'Ethnicity', make_array('Asian', 'Black', 'Latino', 'White', 'Other'),
        'Eligible', make_array(0.15, 0.18, 0.12, 0.54, 0.01),
        'Panels', make_array(0.26, 0.08, 0.08, 0.54, 0.04)
    )
    jury

    # Compare the two distributions side by side
    jury.barh('Ethnicity')

    # Difference between the panel and eligible proportions for each ethnicity
    jury_with_diffs = jury.with_column(
        'Difference', jury.column('Panels') - jury.column('Eligible')
    )
    jury_with_diffs

    # Absolute value of each difference
    jury_with_diffs = jury_with_diffs.with_column(
        'Absolute Difference', np.abs(jury_with_diffs.column('Difference'))
    )
    jury_with_diffs

    # Sum of the absolute differences
    sum(jury_with_diffs.column('Absolute Difference'))

    # Half of that sum is the total variation distance
    sum(jury_with_diffs.column('Absolute Difference')) / 2

    # The same calculation wrapped in a reusable function
    def total_variation_distance(distribution_1, distribution_2):
        return sum(np.abs(distribution_1 - distribution_2)) / 2

    total_variation_distance(jury.column('Panels'), jury.column('Eligible'))

So far, we have a useful table and a way to calculate how far one distribution is from another. Let's use what we have done to explore whether the Jury Panel that was selected was likely to have been selected by random chance.
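To make the halving step in `total_variation_distance` concrete, the arithmetic can be traced by hand. The sketch below uses plain NumPy arrays rather than a `Table`; the proportions are copied from the `jury` table, and the variable names are only illustrative.

```python
import numpy as np

# Proportions copied from the jury table (Asian, Black, Latino, White, Other).
eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
panels = np.array([0.26, 0.08, 0.08, 0.54, 0.04])

# Positive and negative gaps cancel out, so the raw differences sum to 0.
differences = panels - eligible
absolute_differences = np.abs(differences)

# Every point of excess in one category is matched by a point of shortfall
# elsewhere, so the sum of absolute differences counts each gap twice.
# Dividing by 2 undoes that double counting.
tvd = absolute_differences.sum() / 2

print(np.round(differences, 2))
print(round(float(tvd), 2))
```

The absolute differences are 0.11, 0.10, 0.04, 0.00, and 0.03, which add up to 0.28, so the total variation distance is 0.14.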
    # Proportions of each ethnicity among the eligible residents
    eligible = jury.column('Eligible')

    # Add one random sample drawn from the eligible distribution
    panels_and_sample = jury.with_column('Random Sample', sample_proportions(1453, eligible))
    panels_and_sample

    panels_and_sample.barh('Ethnicity')

Qualitatively, we see a noticeable difference between what was actually selected for the Jury Panel and what we would expect if the Jury Panel were selected by chance. Let's put some numbers to this.
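As an aside on the sampling step above, it may help to see roughly what `sample_proportions` is doing. The sketch below assumes it behaves like a multinomial draw divided by the sample size; that is an assumption about the `datascience` library, not something shown in this notebook. The seed and variable names are arbitrary.

```python
import numpy as np

# Seed chosen arbitrarily so the sketch is reproducible.
rng = np.random.default_rng(seed=0)

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
sample_size = 1453  # same sample size as in the cell above

# Draw sample_size people at random, each falling into a category with the
# eligible probabilities, and count how many land in each category.
counts = rng.multinomial(sample_size, eligible)

# Convert counts to proportions so they can be compared with `eligible`.
random_sample = counts / sample_size

print(counts)
print(random_sample)
```

Because the result is a set of proportions that sums to 1, it can be passed straight to `total_variation_distance` alongside the eligible distribution.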
    # total_variation_distance, defined above, takes two distributions as input
    total_variation_distance(panels_and_sample.column('Random Sample'), eligible)

    total_variation_distance(jury.column('Panels'), eligible)

It looks like the random sample produces results close to the ethnic distribution of the eligible population, but the Jury Panel that was actually selected is not that close. Maybe we just had bad luck with the one random sample that we took. Let's take 10,000 random samples and see if any of them come close to the actual Jury Panel that was selected.
    # Simulate 10,000 random samples and record each one's distance from eligible
    tvds = make_array()
    repetitions = 10000
    for i in np.arange(repetitions):
        sample_distribution = sample_proportions(1453, eligible)
        new_tvd = total_variation_distance(sample_distribution, eligible)
        tvds = np.append(tvds, new_tvd)

    # Histogram of the simulated total variation distances
    Table().with_column('Total Variation Distance', tvds).hist(bins = np.arange(0, 0.2, 0.005), ec='w')

All of the random samples show a very small total_variation_distance from the eligible population. The Jury Panel that was actually selected showed a total_variation_distance of 0.14, which would be far to the right on the chart above.
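The loop above grows `tvds` one value at a time with `np.append`. As an alternative sketch, the same simulation can be done in a single vectorized step with plain NumPy. This assumes, as before, that a multinomial draw is a reasonable stand-in for `sample_proportions`; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
sample_size = 1453
repetitions = 10000

# One row per simulated sample, one column per ethnicity.
counts = rng.multinomial(sample_size, eligible, size=repetitions)
sample_distributions = counts / sample_size

# Total variation distance of every row from the eligible distribution.
tvds = np.abs(sample_distributions - eligible).sum(axis=1) / 2

print(tvds.shape)
print(tvds.mean(), tvds.max())
```

Comparing `tvds.max()` with the panel's distance of 0.14 gives a quick numerical check on what the histogram shows.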
    total_variation_distance(jury.column('Panels'), eligible)

Based on the charts and numbers you have seen, do you think our Jury Panel was selected by random chance, or was something else involved?
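One way to put a number on this question is to count what fraction of the simulated samples land at least as far from the eligible distribution as the actual panel did. The sketch below is self-contained and uses a hypothetical helper, `fraction_at_least`, together with a NumPy multinomial draw as an assumed stand-in for `sample_proportions`.

```python
import numpy as np

def fraction_at_least(simulated, observed):
    """Fraction of simulated statistics that are >= the observed statistic."""
    simulated = np.asarray(simulated)
    return np.count_nonzero(simulated >= observed) / len(simulated)

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

eligible = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
panels = np.array([0.26, 0.08, 0.08, 0.54, 0.04])

# Observed distance between the actual panel and the eligible population.
observed_tvd = np.abs(panels - eligible).sum() / 2

# Simulated distances under random selection (same sizes as the notebook).
counts = rng.multinomial(1453, eligible, size=10000)
simulated_tvds = np.abs(counts / 1453 - eligible).sum(axis=1) / 2

print(fraction_at_least(simulated_tvds, observed_tvd))
```

A fraction at or near zero would mean that random selection almost never produces a panel as unrepresentative as the one observed.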