Twitter Web Service

TWITTER ANALYTICS WEB-SERVICE

As part of this coursework group project, we designed a high-performance web service (both front end and back end) that processed 1 TB of raw Twitter data to serve complex read and write requests while sustaining high throughput.

We designed the database schema the way most modern web applications design their APIs. Because the raw data was so large, we performed ETL (Extract, Transform, Load) on it in batch mode, using MapReduce on Hadoop. Once the data was cleaned and formatted, we stored it in MySQL and HBase with bulk-loading tools: MySQL's LOAD DATA INFILE and HBase's ImportTsv. We built the front-end web service on a web framework to serve reads, writes and mixed reads (across databases), and tuned it until it reached a throughput of 10K RPS within the stipulated time.
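To illustrate the Transform step, the sketch below turns one cleaned tweet into a tab-separated line of the kind that MySQL's `LOAD DATA INFILE` and HBase's `ImportTsv` ingest. The `Tweet` record, its fields and the escaping scheme are hypothetical, not the project's actual schema, and in the real pipeline this logic would sit inside a MapReduce mapper rather than a `main` method. Tabs, newlines, carriage returns and backslashes are replaced with backslash escapes, which `LOAD DATA INFILE` decodes under its default `ESCAPED BY '\\'` setting; `ImportTsv` only splits on the separator, so it would store those escapes literally.

```java
import java.util.List;

public class TweetToTsv {

    // Hypothetical cleaned record produced by the Extract/Transform step.
    record Tweet(long tweetId, long userId, String createdAt, String text) {}

    // Escape characters that would otherwise break a tab-separated line.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '\t' -> sb.append("\\t");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // One tweet becomes one TSV line (without the trailing newline).
    static String toTsvLine(Tweet t) {
        return String.join("\t",
                Long.toString(t.tweetId()),
                Long.toString(t.userId()),
                escape(t.createdAt()),
                escape(t.text()));
    }

    public static void main(String[] args) {
        List<Tweet> batch = List.of(
                new Tweet(1L, 42L, "2017-10-01 12:00:00", "hello\tworld\nbye"));
        for (Tweet t : batch) {
            System.out.println(toTsvLine(t));
        }
    }
}
```

Writing one self-contained line per record keeps the output compatible with line-oriented bulk loaders, so the same MapReduce output can feed both databases.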

To reach the target throughput, we tuned the front end in several ways, including testing different application architectures (front end and back end on the same web server versus on separate servers) and monitoring AWS CloudWatch metrics such as instance CPU utilization.

Back-end tuning for MySQL included a lean schema design, indexing, connection pooling and sharding, among other techniques. For HBase, tuning included region splitting, Bloom filters and cluster design optimization.
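As an example of the MySQL sharding idea, the sketch below routes each request to one of several shards by taking the user ID modulo the shard count. The `ShardRouter` class, the choice of user ID as the shard key and the JDBC URLs are all hypothetical illustrations, not the project's actual scheme.

```java
import java.util.List;

public class ShardRouter {

    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    // Map a user ID to a shard index; floorMod keeps negative IDs in range.
    int shardIndex(long userId) {
        return (int) Math.floorMod(userId, (long) shardUrls.size());
    }

    // Return the (hypothetical) JDBC URL of the shard that owns this user.
    String shardFor(long userId) {
        return shardUrls.get(shardIndex(userId));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of(
                "jdbc:mysql://shard0:3306/twitter",
                "jdbc:mysql://shard1:3306/twitter",
                "jdbc:mysql://shard2:3306/twitter"));
        // 42 mod 3 == 0, so this prints the shard0 URL.
        System.out.println(router.shardFor(42L));
    }
}
```

Simple modulo routing keeps every query for a given user on one server, but adding a shard remaps most keys; consistent hashing is the usual alternative when the shard count is expected to change.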

Technologies Used:

AWS EC2 personalized instances
MySQL Server
AWS EMR cluster
HBase
Java
Undertow
Hadoop
AWS CloudWatch
AWS Elastic Load Balancer

Role: Backend Developer
Event: Coursework
Location: Carnegie Mellon University, Pittsburgh
Year: Fall 2017

Subhadeep Bhattacharyya

SOFTWARE ENGINEER


Figure: A tested architecture for the web service