Saturday, October 22, 2016

Analyzing four years of Traffic Violations using R

Traffic Violations Exploratory Data Analysis by Nour Galaby

This is a summary of the full project.
Full project here: https://github.com/NourGalaby/Traffic-Violations-Exportory-Data-Analysis-in-R

Traffic Violations EDA (Exploratory Data Analysis)

This dataset contains information on all electronic traffic violations issued in Montgomery County.
It contains violations from 2012 to 2016, more than 800,000 entries.
In this project I will use R to perform an Exploratory Data Analysis (EDA) on this dataset.
Sit tight, let's get started.



This dataset contains 24 variables:
  • “Date.Of.Violation”: date when the violation occurred, e.g. 1/1/2012
  • “Time.Of.Violation”: time when the violation happened, e.g. 00:01:00
  • “Violation.Description”: description of the violation, in text
  • “Violation.Location”: the location name, in text
  • “Latitude”: latitude of the location, e.g. 77.04796
  • “Longitude”: longitude of the location, e.g. 39.05742
  • “Geolocation”: both latitude and longitude, e.g. 77.1273633333333, 39.0908983333333
  • “Belts.Flag”: whether the driver was wearing a seat belt at the time of the violation (Yes, No)
  • “Personal.Injury”: whether any personal injury occurred as a result of the violation (Yes, No)
  • “Property.Damage”: whether any property damage occurred as a result of the violation (Yes, No)
  • “Commercial.License”: whether the driver has a commercial license (Yes, No)
  • “Commercial.Vehicle”: whether the vehicle is a commercial vehicle (Yes, No)
  • “Alcohol”: whether the driver was driving under the influence (DUI) at the time of the violation (Yes, No)
  • “Work.Zone”: whether the violation happened in a work zone (Yes, No)
  • “Violation.State”: the state where the violation happened, e.g. MD
  • “Vehicle.Type”: e.g. Automobile, Truck, Motorbike
  • “Vehicle.Production.Year”: e.g. 1990
  • “Vehicle.Manfacturer”: e.g. Toyota
  • “Vehicle.Model”: e.g. CoROLLA
  • “Vehicle.Color”: e.g. Black, White
  • “Caused.an.Accident”: whether the violation caused an accident (Yes, No)
  • “Gender”: gender of the driver (M, F)
  • “Driver.City”: city of the driver, e.g. BETHESDA
  • “Driver.State”: state of the driver, e.g. MD






The main features of interest are the date and time of each violation and the damage caused. I would like to see how violations change from year to year, and whether there is a certain period when a lot of violations happen.



Let's start by plotting violations over time.
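The original project does this step in R. The Python sketch below only illustrates the idea and is not the author's code. Counting violations per day needs nothing more than the Date.Of.Violation strings. It assumes those strings use the month/day/year format shown in the variable list (e.g. 1/1/2012). The function name daily_counts and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import datetime, date


def daily_counts(date_strings):
    """Count violations per calendar day.

    Assumes dates look like "1/1/2012" (month/day/year), as in the
    Date.Of.Violation example; strptime accepts non-zero-padded fields.
    """
    counts = Counter(
        datetime.strptime(s, "%m/%d/%Y").date() for s in date_strings
    )
    # (day, count) pairs sorted by date, ready to plot as a time series.
    return sorted(counts.items())


# Hypothetical sample values, not taken from the real dataset.
sample = ["1/1/2012", "1/1/2012", "1/2/2012", "12/31/2016"]
for day, n in daily_counts(sample):
    print(day, n)
```

Plotting these pairs directly gives the raw, noisy daily series.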


This plot covers the years 2012 to 2016. There may be some patterns, but they are not clear, and the data is too noisy to draw any conclusions. Let's smooth it and try again.

Adding smoother

 


The default smoother doesn't help much, because there are too many data points. Let's group the data by week, take the average over each week, and see.
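Here is a Python sketch of the weekly grouping; the post's analysis itself was done in R. It takes (date, count) pairs and averages the daily counts within each ISO week. Keying on ISO weeks is an assumption: the post does not say how weeks were defined. The function name weekly_mean is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_mean(daily):
    """Average the daily violation counts within each ISO week.

    `daily` is a list of (date, count) pairs. Days with no violations are
    absent from the input, so they do not pull the weekly average down.
    """
    weeks = defaultdict(list)
    for day, n in daily:
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)].append(n)
    # One averaged value per week, in chronological order.
    return {week: mean(ns) for week, ns in sorted(weeks.items())}


# Hypothetical daily counts for illustration only.
daily = [(date(2012, 1, 2), 10), (date(2012, 1, 3), 20), (date(2012, 1, 9), 5)]
print(weekly_mean(daily))
```

Averaging per week trades day-level detail for a much smoother curve.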

Group by week


Much better. If you look closely, there may be a pattern here,
but we will look into that shortly. Let's try grouping by month too.

Grouping over each month


Now the pattern is clear. To make it even clearer, let's group by year and plot
the years over each other.
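Overlaying years needs one monthly series per year on a shared January to December axis. The Python sketch below shows one way to build that; the original does this in R. It takes one date per violation. The function name monthly_by_year and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_by_year(days):
    """Count violations per month, as one 12-element series per year.

    `days` holds one date per violation. Each returned series can be drawn
    as its own coloured line so that the years can be compared directly.
    """
    counts = Counter((d.year, d.month) for d in days)
    years = sorted({year for year, _ in counts})
    # Counter returns 0 for months with no violations.
    return {y: [counts[(y, m)] for m in range(1, 13)] for y in years}


# Hypothetical violation dates, not real data.
days = [date(2012, 5, 1), date(2012, 5, 2), date(2013, 10, 7)]
series = monthly_by_year(days)
print(series[2012][4])  # May 2012 -> 2
```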

Coloring Years


We can see that violations increase over the years, and there seems to be a certain time of year when violations peak.

Plotting years

 


Here we can see that May has the most violations of the year, followed by October. Could that be due to more people traveling
there in the summer? Or is it simply the start of summer, when people go out more? I wonder...
In 2015, however, something was different, and the peak was no longer in May.
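Rather than reading the peak month off the plot, you can compute it for each year. This Python sketch does so from 12 monthly counts per year. The function name and the made-up counts are hypothetical and do not reproduce the real figures.

```python
import calendar


def peak_month_per_year(series):
    """Return the name of the busiest month for each year.

    `series` maps a year (or any label) to 12 monthly counts, January first.
    Ties go to the earliest month, because max() keeps the first maximum.
    """
    peaks = {}
    for year, counts in series.items():
        best = max(range(12), key=lambda i: counts[i])
        peaks[year] = calendar.month_name[best + 1]
    return peaks


# Made-up counts for illustration only; they are not the real figures.
example = {
    "year_1": [5, 5, 6, 7, 9, 8, 7, 7, 6, 8, 5, 4],
    "year_2": [5, 5, 6, 7, 7, 8, 9, 7, 6, 8, 5, 4],
}
print(peak_month_per_year(example))  # {'year_1': 'May', 'year_2': 'July'}
```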


This plot shows clearly how much the weekly violation counts differ from one year to the next.


Plotting people who caused damage, by date and gender




I notice something here: most violations are by males, but on days when males commit few violations, females commit many, and vice versa. We can see this in the spikes: a positive spike for males is often paired with a negative spike for females. This issue deserves a closer look.
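One way to test this impression is to correlate the daily male and female counts. A clearly negative coefficient would support it; a positive one would not. The Python sketch below computes a plain Pearson correlation. The daily counts are hypothetical, and this check is a suggestion, not something the original analysis did. Overall daily volume tends to push both series up and down together, so correlating deviations from a smoothed trend may be a fairer test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical daily counts for the same five days (not real data).
male = [120, 150, 110, 160, 115]
female = [60, 45, 65, 40, 62]
print(round(pearson(male, female), 2))
```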

Alcohol-only Violations


It seems that most alcohol-related violations, for both men and women, happen between 2 PM and 5 PM.
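This Python sketch shows the filtering and grouping behind an hour-of-day breakdown like this; the post's version is in R. It keeps only rows where Alcohol is "Yes" and then counts by gender and hour. It assumes rows are dicts keyed by the variable names listed earlier, and that times use the HH:MM:SS format shown there. The sample rows are hypothetical.

```python
from collections import Counter


def alcohol_by_hour(rows):
    """Count alcohol-related violations per (gender, hour of day).

    Each row is a dict with "Alcohol", "Gender" and "Time.Of.Violation"
    fields; times look like "00:01:00" (HH:MM:SS).
    """
    counts = Counter()
    for row in rows:
        if row["Alcohol"] != "Yes":
            continue  # keep alcohol-only violations
        hour = int(row["Time.Of.Violation"].split(":")[0])
        counts[(row["Gender"], hour)] += 1
    return counts


rows = [  # hypothetical rows, not real data
    {"Alcohol": "Yes", "Gender": "M", "Time.Of.Violation": "14:30:00"},
    {"Alcohol": "Yes", "Gender": "F", "Time.Of.Violation": "15:05:00"},
    {"Alcohol": "No", "Gender": "M", "Time.Of.Violation": "14:45:00"},
]
print(alcohol_by_hour(rows))
```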

Summary


This graph shows the count of violations in each minute of the day, which reveals when violations generally happen during the day.
Here are some things to notice about this graph:
  • The line is a weighted mean calculated by passing a sliding window over the counts.
  • From 00:00 to 8:00, the variance in the number of violations is very low (all points are close together).
  • Violations peak twice a day: at 7:00 and at 11:00 PM.
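The two steps behind this graph can be sketched in Python: count violations per minute of the day, then smooth the counts with a sliding-window weighted mean. The post does not say which weights or window width were used. Triangular weights and a 15-minute half-width are assumptions. The window also wraps around midnight, because 23:59 sits next to 00:00. The function names are hypothetical.

```python
from collections import Counter

MINUTES_PER_DAY = 24 * 60


def minute_counts(times):
    """Count violations in each minute of the day (0 = 00:00, 1439 = 23:59)."""
    counts = Counter()
    for t in times:
        h, m, _ = (int(p) for p in t.split(":"))
        counts[h * 60 + m] += 1
    return [counts[i] for i in range(MINUTES_PER_DAY)]


def weighted_moving_average(values, half_width=15):
    """Sliding-window mean with triangular weights, wrapping around midnight.

    A neighbour's weight falls off linearly with its distance from the
    centre minute; weights are normalised so a constant input is unchanged.
    """
    n = len(values)
    offsets = range(-half_width, half_width + 1)
    weights = [half_width + 1 - abs(k) for k in offsets]
    total = sum(weights)
    return [
        sum(w * values[(i + k) % n] for w, k in zip(weights, offsets)) / total
        for i in range(n)
    ]


counts = minute_counts(["07:00:00", "07:00:30", "23:00:00"])
print(counts[7 * 60])  # -> 2
smooth = weighted_moving_average(counts)
```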


  • From this plot we can see that the number of violations increased over the years until it peaked in 2014, then started to come down in 2015.
  • May and October have the most violations in most years. I wonder why?


This plot shows the locations of violations in one particular area, zoomed in. I chose it because the violations seem to draw a map of the streets:
you can tell the major streets just by looking at the violations, and it looks oddly like blood vessels.
This plot may not convey a lot of information, but I think it is very interesting, and that's why I chose to put it in the summary.

Reflection


The traffic violations dataset contains information on more than 800,000 violations that occurred from 2012 to 2016. This analysis shows how violations increased through the years and when they occur most often; I learned that May and October see the most violations.
I also used this data to find the most popular car makes and models.
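Finding the most popular makes and models comes down to counting values. This Python sketch normalises case first, because values such as "CoROLLA" in the variable list suggest inconsistent capitalisation. It assumes rows are dicts keyed by the column names exactly as listed (including "Vehicle.Manfacturer"). The function name and sample rows are hypothetical.

```python
from collections import Counter


def top_vehicles(rows, n=3):
    """Return the n most common manufacturers and (manufacturer, model) pairs.

    Values are stripped and upper-cased so that "Toyota" and "TOYOTA"
    are counted together.
    """
    def norm(s):
        return s.strip().upper()

    makes = Counter(norm(r["Vehicle.Manfacturer"]) for r in rows)
    models = Counter(
        (norm(r["Vehicle.Manfacturer"]), norm(r["Vehicle.Model"])) for r in rows
    )
    return makes.most_common(n), models.most_common(n)


rows = [  # hypothetical rows, not real data
    {"Vehicle.Manfacturer": "Toyota", "Vehicle.Model": "CoROLLA"},
    {"Vehicle.Manfacturer": "TOYOTA", "Vehicle.Model": "Corolla"},
    {"Vehicle.Manfacturer": "Honda", "Vehicle.Model": "Civic"},
]
makes, models = top_vehicles(rows, n=2)
print(makes)   # [('TOYOTA', 2), ('HONDA', 1)]
print(models)  # [(('TOYOTA', 'COROLLA'), 2), (('HONDA', 'CIVIC'), 1)]
```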
This data could be used to help reduce violations and understand their causes, for example by identifying the locations where violations occur most often and studying why.
One struggle I had with this dataset is that most of its variables are categorical rather than continuous. This made it hard to derive insights and make comparisons, so I relied heavily on the count of violations as I grouped by each category. Even so, I found some very interesting insights (for example, in the date/time and location analyses).
One improvement for future work would be to combine this data with labeled map data, so we can see clearly where violations occur.
Also, the violation descriptions could be grouped into categories (e.g. speeding, running a red light, reckless driving) and studied further to help reduce violations and accidents, and make traffic better for everyone.
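A simple starting point for that future work is keyword matching on Violation.Description. In the Python sketch below, both the keyword rules and the sample descriptions are hypothetical. Real rules would need to be built by inspecting the actual descriptions, and a text classifier might eventually replace them.

```python
from collections import Counter

# Hypothetical keyword rules; checked in order, first match wins.
CATEGORY_KEYWORDS = {
    "speeding": ("SPEED",),
    "red light": ("RED SIGNAL", "RED LIGHT"),
    "reckless driving": ("RECKLESS",),
}


def categorize(description):
    """Map a free-text violation description to a coarse category."""
    text = description.upper()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return category
    return "other"


descriptions = [  # hypothetical descriptions, not real data
    "Exceeding the posted speed limit",
    "Failure to stop at red signal",
    "Driving vehicle in reckless manner",
]
print(Counter(categorize(d) for d in descriptions))
```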



Source Code and full project:  https://github.com/NourGalaby/Traffic-Violations-Exportory-Data-Analysis-in-R
