Help NYC TLC provide more public data - identify potential privacy risks in public taxi datasets!

The TLC has been a pioneer in sharing big data since 2010. Its more than 21,000 licensed vehicles are equipped to capture GPS-enabled trip records, and TLC's trip data are valuable to policy makers, scholars, businesses, and urban planners. Earlier data releases included anonymized vehicle and driver identifiers, but in 2014 these identifiers were de-anonymized and published. In response, we stopped releasing them altogether, which limits the data's usefulness. TLC would like your help to re-release this information in the most efficient manner possible.

 

Your Challenge

In order to protect both drivers' and passengers' privacy, this Hackathon has two tracks:

  • Track 1 - Decryption: TLC is using a new anonymization method created by the NYU Center for Urban Science and Progress and invites the cryptography community to test its strength. 
  • Track 2 - Passenger Privacy: We are calling on all civic hackers to help us identify risks to passenger privacy that result from releasing the precise GPS coordinates of every trip's origin and destination. We are especially interested in vulnerabilities that emerge when the trip data is combined with additional datasets.

Kickoff Event

Saturday, October 15, 2016, 10 AM
10 E 21st St, 2nd Floor, New York, NY 10010, USA

*Breakfast will be provided*

The Kickoff event will be hosted at General Assembly's Educational Facility. During this event, NYU CUSP will explain how to use their online data facility. Instructional guides will be emailed prior to the kickoff event. Attendance is not mandatory, but it will help prepare you to use the online data facility. Participants who cannot attend will be able to watch a recording of the kickoff event.

General Assembly is also providing work space for participating teams during the kickoff event and throughout the Hackathon.  Participants will be shown the facility and told how to enter at the Kickoff. 


Eligibility

The ideal participants are cryptography or privacy experts who have experience working with Big Data. However, other data scientists, academics, and computer science researchers are welcome to participate.

  • Individuals or teams of up to 4
  • Teams must be able to work in Python, R, SQL, or Hadoop
  • No team member can be a NYC TLC employee or licensee
  • No team member can be employed by an entity that is a TLC licensee/authorized vendor
  • No team member can be an employee of NYU’s CUSP
  • No team member may intend to use the confidential data for any purpose other than the goals and objectives of this Hackathon

Requirements

  • You will be working on NYU CUSP's Online Data Facility for the duration of the Hackathon.
  • All work will be submitted online at NYU CUSP's Data Facility.
  • Online usernames will be given to each member prior to the kickoff event.
  • We will explain how to use NYU CUSP's Online Data Facility at the kickoff event on October 15.

Track 1: De-anonymize Medallion and Hack License Data

  • In the online data facility you will be supplied all yellow taxi trip records from 2014, including anonymized medallion and driver license numbers.
  • Your goal is to decrypt 10% of the medallion and/or license numbers.
  • Submit your decrypted trip records via the data facility, along with a short narrative description of the method used. Please write for non-technical audiences.
  • If there are any external datasets that you will need to use to decrypt this data, please let us know in the registration form. NYU CUSP will need to know this information ahead of time, in order for it to be available in their online data facility.
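A natural first check for Track 1 is whether the pseudonyms are a deterministic, unsalted function of a small, enumerable ID space. That kind of weakness is widely reported to be behind the 2014 de-anonymization of the earlier releases. The minimal sketch below rests on two assumptions chosen purely for illustration. First, the IDs are unsalted MD5 hashes. Second, they follow a hypothetical four-character format (digit, letter, digit, digit). Neither assumption is claimed to describe the new NYU CUSP method, and real medallion and license formats may differ. The sketch builds a reverse lookup table over every candidate ID and reports what fraction of the distinct pseudonyms it recovers, which can be compared against the 10% target.

```python
import hashlib
import itertools
import string


def candidate_ids():
    """Enumerate every ID in a HYPOTHETICAL format: digit, letter, digit, digit.

    Real medallion/license formats may differ; swap in the actual patterns.
    """
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def md5_hex(plain_id):
    """ASSUMED anonymization for this sketch: unsalted MD5 of the ID string."""
    return hashlib.md5(plain_id.encode("ascii")).hexdigest()


def build_lookup(hash_fn):
    """Map the pseudonym of every candidate ID back to its plaintext ID."""
    return {hash_fn(c): c for c in candidate_ids()}


def recovery_rate(pseudonyms, lookup):
    """Return (fraction of distinct pseudonyms reversed, {pseudonym: plaintext})."""
    distinct = set(pseudonyms)
    recovered = {p: lookup[p] for p in distinct if p in lookup}
    return len(recovered) / len(distinct), recovered


lookup = build_lookup(md5_hex)  # 10 * 26 * 10 * 10 = 26,000 entries
# Toy "released" column: two IDs inside the assumed space (one repeated)
# and one outside it, so 2 of the 3 distinct pseudonyms are reversible.
released = [md5_hex("5A12"), md5_hex("7Z99"), md5_hex("5A12"), md5_hex("ABC-123")]
rate, recovered = recovery_rate(released, lookup)
print(f"recovered {rate:.0%} of distinct IDs:", sorted(recovered.values()))
```

Some schemes resist this kind of enumeration entirely. One example is a keyed hash such as HMAC with a key that is never published. Another is random per-ID tokens. Against either, the enumeration above recovers nothing, and that is the property Track 1 is testing.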

Track 2: Expose privacy vulnerabilities

  • In the online data facility you will be supplied the same yellow taxi trip records from 2014 that are already publicly available through NYC OpenData. 
  • Your goal is to identify risks to passenger privacy using publicly available yellow taxi trip records. 
  • Submit the trip records used, supporting documentation, and a short narrative description of the method used. Please write for non-technical audiences.
  • If there are any external tools or datasets that you will need to use, please let us know in the registration form. NYU CUSP will need to know this information ahead of time, in order for it to be available in their online data facility.
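A simple way to start measuring passenger-privacy risk is to ask how many trips are unique on their drop-off location and time. A trip that no other record shares could potentially be tied to one person by anyone who knows roughly where and when that person left a taxi. The minimal sketch below computes that unique fraction (a k-anonymity check with k = 1) at two precisions. It also illustrates one candidate fix: coarsening coordinates and timestamps. The field names, toy coordinates, and grid sizes are hypothetical and chosen only for illustration.

```python
from collections import Counter


def quasi_identifier(trip, decimals, bucket_minutes):
    """Key a trip by its rounded drop-off point and drop-off time bucket."""
    return (
        round(trip["dropoff_lat"], decimals),
        round(trip["dropoff_lon"], decimals),
        trip["dropoff_minute"] // bucket_minutes,  # minutes since midnight
    )


def unique_fraction(trips, decimals, bucket_minutes):
    """Share of trips whose key matches no other trip (k-anonymity with k = 1)."""
    keys = [quasi_identifier(t, decimals, bucket_minutes) for t in trips]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Toy records with HYPOTHETICAL field names; the real 2014 schema differs.
toy_trips = [
    {"dropoff_lat": 40.74213, "dropoff_lon": -73.98931, "dropoff_minute": 1325},
    {"dropoff_lat": 40.74281, "dropoff_lon": -73.98977, "dropoff_minute": 1340},
    {"dropoff_lat": 40.74252, "dropoff_lon": -73.98902, "dropoff_minute": 1338},
    {"dropoff_lat": 40.70001, "dropoff_lon": -74.01002, "dropoff_minute": 600},
]

# Near-raw precision (5 decimals, 1-minute buckets) versus a coarse grid
# (2 decimals, roughly 1 km of latitude, and 1-hour buckets).
print("raw:", unique_fraction(toy_trips, 5, 1))
print("coarse:", unique_fraction(toy_trips, 2, 60))
```

On these toy records, every trip is unique at near-raw precision. On the coarse grid, only the isolated fourth trip remains unique. Coarsening alone does not guarantee privacy, because repeated trips between the same coarse cells can still reveal a routine. Still, a drop in the unique fraction is one concrete way to argue that a proposed fix is effective.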

How to enter

  1. Review the sample datasets to see what kind of data you will be using.
  2. Determine which other datasets, if any, you would like to use.
  3. Register as an individual or team using our Registration Form.
  4. Each member of the team will be required to fill out a Hackathon Affidavit.
  5. Come to the Kickoff Event, where NYU staff will introduce you to the state-of-the-art online data facility that you will use during the Hackathon.
  6. NYU CUSP will provide you with login credentials.
  7. Hack!

Judges

Jeffrey Garber
Director of Technology and Innovation, TLC

Sonal Sahel
Assistant General Counsel, TLC

Ravi Shroff
Research Scientist, NYU CUSP

Sarah Kaufman
Assistant Director, NYU Rudin Center

Andrew Leszko
Data Analytics Manager, DOT

Judging Criteria

  • Track 1: De-anonymize Medallion and Hack License Data
    The winner of this track will be the first team to accurately decrypt 10% of the medallion and/or driver license numbers.
  • Track 2: Expose Privacy Vulnerabilities
    The top three teams will be chosen using the criteria below:
      1. Team used a method that exposes a passenger privacy concern extending beyond observing a single trip, i.e., one that does not rely on witnessing the trip in person or with cameras (5 points)
      2. Team exposed a passenger privacy concern using only the 2014 TLC Trip Data (10 points; minus 2 points for any additional datasets used)
      3. Team was able to associate multiple trips with personally identifiable information (10 points)
      4. Team exposed a passenger's routine or behavior (5 points)
      5. Team suggested a fix for the exposed concern, judged on feasibility and effectiveness. Points awarded: the sum of points awarded for criteria 2-4, multiplied by the quality of the fix (0 = none submitted, or completely ineffective or unfeasible; 1 = moderately effective and feasible; 2 = very effective and feasible)
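The Track 2 fix multiplier is easy to misread, so here is a minimal sketch of the arithmetic. It rests on two assumptions, neither of which is a stated rule. First, the 2-point deduction for criterion 2 is applied once per additional dataset and floored at zero. Second, a team's total is the points for criteria 1-4 plus the fix points. The function names are hypothetical.

```python
def criterion2_points(extra_datasets: int) -> int:
    """Criterion 2: 10 points, ASSUMED minus 2 per additional dataset, floored at 0."""
    return max(0, 10 - 2 * extra_datasets)


def track2_score(c1: int, c2: int, c3: int, c4: int, fix_quality: int) -> int:
    """Points for criteria 1-4 plus fix points ((c2 + c3 + c4) * fix quality).

    fix_quality: 0 = none/ineffective, 1 = moderately effective, 2 = very effective.
    ASSUMPTION: the total is the plain sum of criteria 1-4 and the fix points.
    """
    if fix_quality not in (0, 1, 2):
        raise ValueError("fix_quality must be 0, 1, or 2")
    fix_points = (c2 + c3 + c4) * fix_quality
    return c1 + c2 + c3 + c4 + fix_points


# A team meeting every criterion with one extra dataset and a very effective fix:
# criteria 1-4 give 5 + 8 + 10 + 5 = 28, fix points are (8 + 10 + 5) * 2 = 46.
print(track2_score(5, criterion2_points(1), 10, 5, fix_quality=2))  # 74
```

Under the summing assumption, a moderately effective fix doubles the points earned on criteria 2-4, and a very effective fix triples them. Criterion 1 is counted only once.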