Real-time cab-matching services such as Lyft and Uber use stream processing technologies such as Kafka and Samza to process real-time GPS data. This project aimed to understand the backend implementation of taxi services that must process large volumes of data in real time.
As part of the coursework, I was provided with large datasets that had to be partitioned into topics using the Kafka broker. Once the streams were in place, I wrote a Samza program that consumed the data from Kafka. I then fed the data into a custom algorithm that matched drivers and customers based on distance, customers' gender preferences, and several other parameters. The final result was relayed to a JavaScript front end to enable a stream visualization similar to that of Uber/Lyft.
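
The matching step can be illustrated with a minimal sketch. It is not the project's actual algorithm: it assumes each driver and ride request carries a latitude/longitude pair, models only two of the parameters mentioned (distance and gender preference), and uses great-circle (haversine) distance as a stand-in for whatever distance measure was used. The names `Driver`, `RideRequest`, `haversine_km`, and `match_driver` are hypothetical, and the sketch is written in plain Python rather than against the Samza API.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Driver:
    # Hypothetical record shape; the real stream schema is not given in the text.
    driver_id: str
    lat: float
    lon: float
    gender: str


@dataclass
class RideRequest:
    rider_id: str
    lat: float
    lon: float
    gender_preference: Optional[str] = None  # None means "no preference"


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def match_driver(request: RideRequest, drivers: List[Driver]) -> Optional[Driver]:
    """Return the nearest driver that satisfies the rider's gender preference."""
    # Step 1: keep only drivers compatible with the stated preference.
    eligible = [
        d for d in drivers
        if request.gender_preference is None
        or d.gender == request.gender_preference
    ]
    if not eligible:
        return None
    # Step 2: among eligible drivers, pick the one closest to the rider.
    return min(
        eligible,
        key=lambda d: haversine_km(request.lat, request.lon, d.lat, d.lon),
    )
```

In a streaming setting, a function like `match_driver` would be called once per incoming ride request, with the driver list drawn from the most recent location updates held in the task's state.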