NYC Driver Behavior Analysis Project Series (Part 4a: Decoding Executive Business Requests)

Jun 3

5 min read

Discover how Business Intelligence teams transform business requirements into impactful data products!

Recap

We have completed two crucial steps in our journey so far:


1. Understanding business requirements thoroughly

In Part 2, we met Tom Hank and Sarah Conner, leaders of TCIC’s Underwriting team. They posed a critical challenge:


Analyzing the impact of COVID-19 on NYC driver behavior.

As we tackled this request, we encountered a common issue in large corporations:


Senior leadership issuing data requests without a full understanding of the data’s intricacies.

This often leads to vague or overly simplistic requests, posing challenges for data analysts and engineers.


2. Developing a comprehensive strategy

We addressed this underlying issue in Part 3 by focusing on the importance of strategic planning. We identified potential data sources and crafted a cost-effective strategy.


Shifting Gears

In Part 4, we pivot our focus toward designing the pipeline. We will analyze data sources and proactively address any limitations they may present. This collaborative effort between data analysts and engineers is crucial for ensuring data integrity and reliability.


Scenario: The Data Team Assembles!

Nancy scheduled the Business Requirements Review with Jonathan for 9:00 am. She waited patiently by the main door in the lobby of TCIC’s corporate offices. She chuckled as she remembered her first months taking public transportation in NYC.


Jonathan arrived at 9:07 am, rushing through the main door. He apologized, but Nancy swiftly brushed it off and led the way to the elevators. Once settled in the Athens conference room, she switched her work laptop to tablet mode by detaching the keyboard. She took her stylus out of her backpack and wrote down the following:


Data Pipeline

  • Data Collection

  • Storage and Integration

  • Processing

  • Analysis and Final User Interface


Jonathan: “I’m going to need context.”


Nancy: “These layers represent important milestones in the lifecycle of any data engineering platform.”


Jonathan: “Okay, I’m listening.”


Nancy: “First, it’s crucial to gather our data sources. This includes things like spreadsheets, system reports, databases, and even social media. The more sources we collect from, the better we understand different parts of the business.”


Jonathan: “What happens during the Storage and Integration phase?”


Nancy: “In this phase we validate and transform the data. We want to merge our data and prepare it for processing. I will do some research to decide which tools work best for this project.”


Jonathan: “What about Analysis and Final User Interface?”


Nancy: “Once processed, we must deliver the transformed data to the users or other applications. Think of the data you see in databases or in data reporting solutions like MicroStrategy or SAP. The information in these systems is curated and the solution is optimized for the analysts to run queries.”


Jonathan: “So how do you define what a data pipeline is?”


Nancy: “That would be the set of processes for moving data from one system to another. You can say that these bullet points represent the different stages for creating a data pipeline.”
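
Nancy's four stages can be sketched as a chain of small functions, each handing its output to the next. This is only a minimal illustration, not the design the team settles on: the function names, the sample records, and the cleaning rules are all hypothetical.

```python
from typing import Callable

Record = dict
Stage = Callable[[list[Record]], list[Record]]

def collect() -> list[Record]:
    # Data Collection: hypothetical in-memory stand-in for spreadsheets,
    # system reports, databases, or API pulls.
    return [
        {"id": "2", "borough": "QUEENS"},
        {"id": "1", "borough": " brooklyn "},
        {"id": "2", "borough": "QUEENS"},  # duplicate delivered by a second source
    ]

def store_and_integrate(records: list[Record]) -> list[Record]:
    # Storage and Integration: merge sources, keeping one record per id.
    merged = {record["id"]: record for record in records}
    return list(merged.values())

def process(records: list[Record]) -> list[Record]:
    # Processing: standardize values so they can be compared.
    return [{**r, "borough": r["borough"].strip().title()} for r in records]

def serve(records: list[Record]) -> list[Record]:
    # Analysis and Final User Interface: deliver data in a predictable order.
    return sorted(records, key=lambda r: r["id"])

def run_pipeline(source: Callable[[], list[Record]], stages: list[Stage]) -> list[Record]:
    data = source()
    for stage in stages:
        data = stage(data)
    return data

result = run_pipeline(collect, [store_and_integrate, process, serve])
print(result)
```

Keeping each stage as its own function mirrors the whiteboard list: a stage can be swapped for a real tool later without touching the others.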


Jonathan: “So, how can I help you here?” — The excitement on his face was evident.


Nancy: “Well, as an analyst you are responsible for telling the data story. I need to know your requirements so I can start designing the solution.”


Jonathan: “Got it. I think I will need some time to gather my notes and put a plan together.”


Data Collection

They reconvened after lunch in the Athens room. Nancy promptly logged into her computer and opened the link Jonathan had sent.

Jonathan: “This is going to be our main source of data.”


Nancy: “Okay. Where do we begin?”


Jonathan: “Let’s start by understanding the business requirements listed by the Underwriting team.”


Nancy: “Well, they mentioned the possibility of revamping their pricing model for premiums.”


Jonathan: “Yes, but we must understand the reason behind this.”


Nancy: “Sarah mentioned that the analysis of their client data shows unexpected results. We don’t have access to their information.”


Jonathan: “Correct. The ask, then, is to research external data. They want to compare insights from this data with their own results.”


Nancy: “Can you fill me in here?” She clicked on the About page of the NYC Open Data website.


Jonathan: “The goal of the Open Data project is to act as a centralized data hub for public agencies across New York City. Their team collaborates with many agencies to identify and provide access to data, manage platform operations and enhancements.”


Nancy: “I see where you’re going with this. Do they have data on NYC drivers?”


Jonathan: “Yes! The New York City Police Department partnered with the Open Data team to make vehicle collision data available to the public!”


Nancy: “Good. We can access this public data under the Freedom of Information Law (FOIL).”


Jonathan: “Can you go to the main page and search for ‘collision’? You should see the information about the dataset in the results screen.”

Nancy: “It looks like there are three different links.”


Jonathan: “The data is split into three different tables. Each table has specific information about crashes, vehicles and persons respectively.”


Nancy: “This dataset is quite comprehensive.”


Jonathan: “Yes, we also need to link vehicle and person data to provide a holistic view of each incident.”
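
The linking Jonathan describes could look something like the sketch below. It assumes the three tables share a key, called `collision_id` here; the sample rows and the other field names are made up for illustration and should be checked against the dataset's metadata.

```python
from collections import defaultdict

def link_incidents(crashes, vehicles, persons, key="collision_id"):
    """Attach each crash's vehicles and persons, matched on a shared key.

    The key name is an assumption; confirm it in the data dictionary.
    """
    vehicles_by_id = defaultdict(list)
    for vehicle in vehicles:
        vehicles_by_id[vehicle[key]].append(vehicle)

    persons_by_id = defaultdict(list)
    for person in persons:
        persons_by_id[person[key]].append(person)

    # Crashes with no matching rows still appear, with empty lists.
    return [
        {
            **crash,
            "vehicles": vehicles_by_id.get(crash[key], []),
            "persons": persons_by_id.get(crash[key], []),
        }
        for crash in crashes
    ]

# Hypothetical sample rows.
crashes = [
    {"collision_id": "100", "borough": "BRONX"},
    {"collision_id": "101", "borough": "QUEENS"},
]
vehicles = [
    {"collision_id": "100", "vehicle_type": "Sedan"},
    {"collision_id": "100", "vehicle_type": "Taxi"},
]
persons = [{"collision_id": "100", "person_type": "Occupant"}]

linked = link_incidents(crashes, vehicles, persons)
print(linked[0])
```

Keeping crashes without vehicle or person rows, rather than dropping them, makes gaps in the source data visible during analysis.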


Nancy: “How do we ensure data accuracy?”


Jonathan: “Good point. The data is preliminary and often gets revised. We should design the pipeline to handle updates to these crash details smoothly. We can benefit from syncing regularly with the NYC Open Data API, powered by Socrata.”
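
One way to absorb revisions is to treat every sync as an upsert: new crashes are inserted and revised crashes overwrite the stored copy. The sketch below uses an in-memory SQLite table as a stand-in for the real store. The table layout, the column names, and the sample rows are assumptions, and the call that would fetch rows from the API is left out.

```python
import sqlite3

def upsert_crashes(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert new crashes and overwrite ones that were revised upstream."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crashes ("
        "collision_id TEXT PRIMARY KEY, "
        "crash_date TEXT, "
        "persons_injured INTEGER)"
    )
    # INSERT OR REPLACE swaps out any existing row with the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO crashes (collision_id, crash_date, persons_injured) "
        "VALUES (:collision_id, :crash_date, :persons_injured)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")

# First sync: two preliminary records (hypothetical values).
upsert_crashes(conn, [
    {"collision_id": "100", "crash_date": "2020-04-01", "persons_injured": 0},
    {"collision_id": "101", "crash_date": "2020-04-02", "persons_injured": 1},
])

# Later sync: record 100 was revised at the source.
upsert_crashes(conn, [
    {"collision_id": "100", "crash_date": "2020-04-01", "persons_injured": 2},
])

stored = conn.execute(
    "SELECT collision_id, persons_injured FROM crashes ORDER BY collision_id"
).fetchall()
print(stored)
```

Because the load is keyed on the collision identifier, rerunning the same sync does not create duplicates.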


Nancy: “Anything else?” — she was ready to jot down more requirements.


Jonathan: “In the description of the dataset it says that the manual TAMS system was changed to the electronic FORMS system in 2016. We should look out for these types of changes. This could affect the quality of the data collected.”
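
A simple way to keep this change visible is to tag each record with the reporting system that was likely in use when it was filed. The sketch below is deliberately coarse: the source only gives the year of the switch, so all of 2016 is labeled as a transition period. The function name and labels are hypothetical.

```python
from datetime import date

def reporting_system(crash_date: date) -> str:
    """Label a crash by the reporting system likely in use at the time.

    Only the year of the TAMS-to-FORMS switch (2016) is known here, so the
    whole year is flagged as a transition instead of guessing a cutover date.
    """
    if crash_date.year < 2016:
        return "TAMS"
    if crash_date.year == 2016:
        return "transition"
    return "FORMS"

print(reporting_system(date(2015, 7, 4)))
print(reporting_system(date(2016, 3, 1)))
print(reporting_system(date(2019, 11, 30)))
```

With this label stored as a column, analysts can check whether a shift in the numbers lines up with the system change before attributing it to driver behavior.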


Nancy: “Speaking of changes over time, wouldn’t it be beneficial to look at the data from a historical angle?”


Jonathan: “You got there before me, great! Yes, my idea is to compare vehicle collisions across three periods:


  1. Pre-COVID-19

  2. During the lockdown

  3. Post-lockdown

This should give us a comprehensive view of the effects of the pandemic on NYC drivers.”
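
Jonathan's three periods can be expressed as a small labeling function. The boundary dates below are assumptions the team would need to confirm with Underwriting: March 22, 2020 (the start of the statewide stay-at-home order) and June 8, 2020 (the start of New York City's phased reopening).

```python
from datetime import date

# Assumed boundaries; confirm with the business before using them.
LOCKDOWN_START = date(2020, 3, 22)
LOCKDOWN_END = date(2020, 6, 8)

def covid_period(crash_date: date) -> str:
    """Assign a crash date to one of the three analysis periods."""
    if crash_date < LOCKDOWN_START:
        return "pre_covid"
    if crash_date < LOCKDOWN_END:
        return "lockdown"
    return "post_lockdown"

for sample in (date(2019, 12, 31), date(2020, 4, 15), date(2021, 1, 10)):
    print(sample.isoformat(), covid_period(sample))
```

Keeping the boundaries as named constants makes it easy to rerun the comparison if the business defines the periods differently.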


Nancy: “Awesome. I see you prepared some resources. Do you want to review them now?”


Jonathan: “In a bit, I’m curious to see your assessment of the metadata first.”


Conclusion

As Part 4a draws to a close, we reflect on the dynamic collaboration between Nancy and Jonathan. Their partnership was characterized by distinct problem-solving approaches, each contributing unique perspectives to decipher the Underwriting team’s business requirements. Their cohesion and adaptability allowed them to seamlessly combine efforts. They demonstrated a shared commitment to delivering impactful insights through meticulous data collection.


Leveraging external data sources such as the NYC Open Data project, they expanded the scope of their analysis while ensuring its relevance and reliability. Their collaborative efforts highlight the synergy between data analysts and engineers in navigating complex projects.


Next Steps

We expand this scenario in Parts 4b and 4c. We will begin the data discovery phase with a detailed metadata analysis. Our objective remains clear: to develop a comprehensive design by the conclusion of this three-part series.

 

Thank you for spending a few minutes with me today — I’m grateful for your time and interest. For a deeper dive, visit my GitHub page to explore project resources.


Let’s connect!

You may find me on LinkedIn. You can also visit my personal site if you are interested in more content like this.


Please don’t forget to follow my Medium page to catch each update in the series.
