Testimony of the New York Civil Liberties Union before the New York City Council Committee on Technology regarding Automated Processing of Data (Int. 1696-2017)
The New York Civil Liberties Union (“NYCLU”) respectfully submits the following testimony in support of Int. 1696, legislation relating to government use of computer algorithms. The NYCLU, the New York state affiliate of the American Civil Liberties Union, is a not-for-profit, non-partisan organization with eight offices across the state, and over 160,000 members and supporters statewide. The NYCLU’s mission is to defend and promote the fundamental principles, rights and constitutional values embodied in the Bill of Rights of the U.S. Constitution and the Constitution of the State of New York.
Algorithms are a series of steps or instructions designed to perform a specific task or solve a problem. They are widely used to make decisions that affect most aspects of our lives, including which school a child can attend, whether a person will be offered credit from a bank, what products are advertised to consumers, and whether someone will receive an interview for a job. Federal, state, and local governments are increasingly using algorithms to deliver government services. One of the promises of algorithms is that they can process, analyze, and manipulate large amounts of data to help optimize those services.
However, algorithms are fallible human creations that are vulnerable to many sources of bias and error. There should therefore be great concern when governments employ algorithms whose design and implementation are understood by neither the government agents using them nor the public. There is a strong public interest in ensuring that algorithms are designed and used in an equitable manner, especially when they affect decisions regarding the use of government force, the allocation of public resources, or the potential deprivation of civil liberties. To make this assessment possible, information about the design, use, and functions of algorithms must be transparent. Without algorithmic transparency, governments stand to lose democratic accountability, the efficacy and fairness of government processes, and control over sensitive public data.
Int. 1696 requires any government agency that uses algorithms for targeting government services, policing, or imposing penalties to publish the source code of the algorithm on the agency’s website. The legislation also requires these government agencies to allow the public to submit data to test the algorithm and to receive the results of those tests. This legislation is a necessary step in ensuring that government use of algorithms actually benefits New Yorkers. We urge the City Council to take action and pass Int. 1696 into law.
I. Government Use of Algorithms
As the power and responsibilities of government administrative agencies have grown, often without concomitant funding increases, administrators have increasingly employed algorithms and other automated systems to reduce backlogs, identify problems, and eliminate guesswork by government agents. Algorithms are used by the New York City Department of Education to evaluate teachers; by the New York City Fire Department to anticipate where fires may spark; and by the New York City Department of Health to identify serious pregnancy complications. These are just a few of the many functions throughout City government that algorithms serve.
But government use of algorithms creates significant threats to personal liberty. Despite the growing concerns regarding the fairness and efficacy of algorithms, as well as the due process problems they create, there seems to be no slowing in their adoption to conduct public affairs. This is in part because of the growing “smart cities” movement, which seeks to integrate data collection and technological solutions to address local government needs. Yet, as governments shift to more data-driven, algorithm-based decision-making, careful scrutiny of algorithms and public engagement to assess them become increasingly important.
II. Algorithms contain many sources of bias and error
Although algorithms may appear to be inherently neutral, each step in creating an algorithm requires the programmer to make decisions, some consequential and some trivial. As a result, algorithms are vulnerable to human bias, poor judgment, unavoidable trade-offs and careless or unforeseen errors at each stage of development and use. Moreover, the data on which algorithms are trained often reflects existing discrimination and disparities; as a consequence, algorithms will often themselves be biased unless developers take proactive de-biasing steps.
Design of the Algorithm
At the design stage, programmers must make a series of decisions about how the algorithm will function as well as its limitations. These decisions include the arrangement of user functions, the technical architecture of the system, data selection, and factor weighting. Though these decisions seem technical in nature, they can result in promoting certain values or advantaging certain groups of people or outcomes.
Some design decisions that tend to favor certain values, interests, groups, or outcomes are intentional. A positive example is a design decision that promotes consumer privacy, such as a system that does not store user data records or immediately deletes them. However, intentional decisions regarding algorithm design may be perverted by ulterior goals or motives. In Italy, a government programmer conspired with over 100 police officers and local government officials to rig red light cameras so that lights would turn from yellow to red more quickly, allowing more motorists to be caught.
Similarly, financial incentives may drive programmers to design an algorithm that will produce results favoring the customer’s preferred outcome, rather than accurate or fair outputs. This is particularly true for algorithm-based forensic tools that are sold exclusively to government agencies. In the past three years, public crime labs in Austin, Texas and Washington, D.C. have temporarily shut down DNA testing because of flawed algorithmic systems. More recently, ProPublica reporting revealed that thousands of criminal cases may be compromised by the New York City crime lab’s use of an algorithm that may have been intentionally skewed to create more matches.
Other design decisions are made for technical, efficiency, usability, functionality, business or practical reasons; but they often involve significant trade-offs. Programmers may make decisions that increase the utility or performance of algorithms, but conflict with societal notions of fairness. One example is incorporating parents’ mental health history as a factor in assessing child endangerment risks. The use of an algorithm that assigns significant weight to use of the mental health system can have the effect of penalizing individuals who seek mental health treatment, which raises fairness, welfare and legal concerns.
Programmers also make mistakes at the design stage. One study found that even highly experienced programmers failed to identify or correct technical mistakes when coding, which resulted “in almost 1% of all expressions contained in source code being wrong.” Mistakes are also more likely to occur when real-world policies written in human language are converted to computer languages. These mistakes can result from the programmer’s misinterpretation of the policy or from code failing to capture certain nuances in the original policy. When these mistakes go unnoticed in government algorithms, they can carry expensive or irreversible consequences. In Colorado, programmers encoded over 900 mistakes in an algorithm used to administer the state’s public benefit system; this resulted in cancer patients and pregnant women being falsely denied Medicaid benefits, and eligible food stamp recipients having their benefits discontinued. These mistakes affected hundreds of thousands of people, wasted several hundred million dollars, and resulted in litigation as well as a federal probe.
Training the Algorithm
Part of the development of a modern algorithm involves training it on a set of data. Programmers make numerous decisions regarding how an algorithm will be trained, including what data inputs are used and how much data the system has the capacity to process. Thus, the decisions regarding what data is used, and the quality of that data, can result in undesirable, misleading, or biased outputs.
A common programming error is the use of poorly selected data inputs. Problems include choosing a data set that is too small or too homogenous, or flaws in the technical rigor and comprehensiveness of data collection that result in incomplete or incorrect data. For example, several algorithm-based facial-recognition systems that were trained on photos of predominantly white people produced racist outputs, such as classifying images of black people as gorillas.
A more troubling error is the use of real-world data sets that reflect historical or societal discrimination. As a result of residential segregation, geographic designations may serve as proxies for race, leading to false data correlations that perpetuate bias. Notably, predictive policing systems have been criticized for overreliance on inherently biased historical police data. In fact, the Oakland Police Department decided against using a predictive policing algorithm after a study showed that the system would have disproportionately deployed police to lower-income, minority neighborhoods for drug crimes, even though public health data suggested drug crimes occurred in many other neighborhoods throughout Oakland.
Interpreting and Using the Algorithm’s Results
When end users are not properly trained on the purpose of an algorithm, or not informed about the underlying logic of its design, it can be very difficult for them to fully comprehend the results. This lack of understanding and training can lead to government agents misinterpreting or giving too much deference to algorithmic results. If the algorithm is inscrutable, government agents will either have to disregard it completely or blindly follow the result. This outcome conflicts with traditional notions of government accountability, particularly if the results influence decisions that affect civil liberties.
If government officials falsely believe that algorithmic results are inherently neutral or otherwise superior to human judgment, they may simply reify the algorithm’s choice. Too much deference can be extremely problematic, since algorithms, by nature, simplify or generalize data, making categorical judgments that treat people as members of groups rather than as individuals. In fact, research suggests that, over time, deference to algorithms may weaken the decision-making capacity of government officials, who may become incapable of responsibly deviating from algorithmic instructions.
III. Importance of Algorithmic Transparency
Most local governments lack the expertise and resources required to develop algorithmic systems for all agency functions. As a result, privately developed algorithms are shaping local government procedures and decisions; yet it is often the case that neither members of the public nor government agents know much about the design or implementation of algorithm-based systems. There has always been, and to some degree will always be, some risk of error or bias in government decision making; however, the opacity surrounding government algorithms serves to increase that risk.
Algorithmic systems function best when stakeholders have access to enough information to identify problems in the design of the algorithm and in its application. Therefore, greater transparency about the algorithms that government agencies use, and about how they are implemented, can help increase the accuracy, fairness, and overall utility of these tools. As algorithmic tools improve, they produce greater cost savings and help local governments become more sustainable. Algorithmic transparency can also increase public confidence in government practices and systems by making constituents feel that they are actively engaged with the government systems that affect their lives. Conversely, if the algorithm-based decisions of government remain opaque and invisible, New Yorkers will grow increasingly confused about the rationale for government policies, and increasingly skeptical about the fairness and accountability of government officials and the decisions they make.
Currently, federal and state open records laws are the primary vehicles for making government use of algorithms more transparent. These methods are imperfect because government responses to requests for source code and other relevant data are typically slow, incomplete, or nonresponsive. Therefore, we urge the City Council to pass Int. 1696 as soon as possible, because the civil liberties and civil rights of New Yorkers depend on it.
 Framework for Teaching Evaluation Instruments, http://usny.nysed.gov/rttt/teachers-leaders/practicerubrics/Docs/danielson-teacher-rubric.pdf.
 Brian Heaton, New York City Fights Fire with Data, Government Technology, May 15, 2015, http://www.govtech.com/public-safety/New-York-City-Fights-Fire-with-Data.html.
 New York City Department of Health and Mental Hygiene, Bureau of Maternal, Infant and Reproductive Health, Severe Maternal Morbidity in New York City, 2008-2012 (2016), https://www1.nyc.gov/assets/doh/downloads/pdf/data/maternal-morbidity-report-08-12.pdf.
 Harry Surden, Values Embedded in Legal Artificial Intelligence at 2 (2017), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2932333.
 Jacqui Cheng, Italian Red-Light Cameras Rigged with Shorter Yellow Lights, Ars Technica, Feb. 2, 2009, http://arstechnica.com/tech-policy/news/2009/02/italian-red-light-camerasrigged-with-shorter-yellow-lights.
 Rebecca Wexler, Convicted by Code, Slate, Oct. 6, 2015, http://www.slate.com/blogs/future_tense/2015/10/06/defendants_should_be_able_to_inspect_software_code_used_in_forensics.html; Anne Q. Hoy, Fingerprint Source Identity Lacks Scientific Basis for Legal Certainty, Am. Assoc. for the Advancement of Sci., Sept. 15, 2017, https://www.aaas.org/news/fingerprint-source-identity-lacks-scientific-basis-legal-certainty.
 Keith L. Alexander, National accreditation board suspends all DNA testing at D.C. crime lab, Washington Post, Apr. 27, 2015, https://www.washingtonpost.com/local/crime/national-accreditation-board-suspends-all-dna-testing-at-district-lab/2015/04/26/2da43d9a-ec24-11e4-a55f-38924fca94f9_story.html?utm_term=.24780d3105ea; Tony Plohetski, Austin police DNA lab closed amid forensics commission’s concerns, Austin American-Statesman, June 10, 2016, http://www.mystatesman.com/news/austin-police-dna-lab-closed-amid-forensics-commission-concerns/rjbYwEnkci0IVy7LAPXVnM/.
 Lauren Kirchner, Thousands of Criminal Cases in New York Relied on Disputed DNA Testing Techniques, ProPublica, Sept. 4, 2017, https://www.propublica.org/article/thousands-of-criminal-cases-in-new-york-relied-on-disputed-dna-testing-techniques.
 Harry Surden, Values Embedded in Legal Artificial Intelligence at 1 (2017), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2932333.
 Robert Brauneis and Ellen P. Goodman, Algorithmic Transparency for The Smart City at 16 (2017), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3012499.
 Christian Chessman, A “Source” of Error: Computer Code, Criminal Defendants, and the Constitution, 105 Cal. L. Rev. 179, 186 (2017), http://scholarship.law.berkeley.edu/cgi/viewcontent.cgi?article=4350&context=californialawreview, citing Derek M. Jones, Operand Names Influence Operator Precedence Decisions, 20 CVU 1, 2, 5 (2008).
 Danielle Keats Citron, Technological Due Process, 85 Wash. U. L. Rev. 1248, 1268-69 (2008).
 Id.; Editorial, Why is the CBMS still such a mess?, Denver Post, Feb. 17, 2011, http://www.denverpost.com/2011/02/17/editorial-why-is-cbms-still-such-a-mess/.
 Executive Office of the President, Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights at 7-8, May 4, 2016, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf.
 Kate Crawford, Artificial Intelligence’s White Guy Problem, N.Y. Times, Jun. 25, 2016, https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html; Alistair Barr, Google Mistakenly Tags Black People as ‘Gorillas,’ Showing Limits of Algorithms, Wall St. J., Jul. 1, 2015, https://blogs.wsj.com/digits/2015/07/01/google-mistakenly-tags-black-people-as-gorillas-showing-limits-of-algorithms/; Odelia Lee, Camera Misses the Mark on Racial Sensitivity, Gizmodo, May 15, 2009, https://gizmodo.com/5256650/camera-misses-the-mark-on-racial-sensitivity.
 Cathy O’Neil, Weapons of Math Destruction at xx (2016).
 Emily Thomas, Why Oakland Police Turned Down Predictive Policing, Motherboard, Dec. 28, 2016, https://motherboard.vice.com/en_us/article/ezp8zp/minority-retort-why-oakland-police-turned-down-predictive-policing; Kristian Lum, Predictive Policing Reinforces Police Bias, Human Rights Data Analysis Group, Oct. 10, 2016, https://hrdag.org/2016/10/10/predictive-policing-reinforces-police-bias/.
 Robert Brauneis and Ellen P. Goodman, Algorithmic Transparency for The Smart City at 15 (2017), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3012499.
 Robert Brauneis and Ellen P. Goodman, Algorithmic Transparency for The Smart City at 19 (2017), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3012499.