Description
PROFESSIONAL SUMMARY
* Over 3 years of IT experience as a developer, designer and quality tester, with cross-platform integration experience using the Hadoop ecosystem, Java and software functional testing.
* Hands-on experience installing, configuring and using Hadoop ecosystem components: HDFS, MapReduce, Pig, Hive, Oozie, Flume, HBase, Spark and Sqoop.
* Hands-on experience with the Cloudera and Hortonworks Hadoop distributions.
* Strong understanding of Hadoop services and the MapReduce and YARN architecture.
* Responsible for writing MapReduce programs.
* Experienced in importing data into and exporting data from HDFS using Sqoop.
* Experience loading data into Hive partitions and creating buckets in Hive.
* Developed MapReduce jobs to automate the transfer of data from HBase.
* Expertise in data analysis using Pig, Hive and MapReduce.
* Experienced in developing UDFs for Hive and Pig in Java.
* Strong understanding of NoSQL databases such as HBase, MongoDB and Cassandra.
* Scheduled Hadoop, Hive, Sqoop and HBase jobs using Oozie.
* Good understanding of Scrum methodologies, test-driven development and continuous integration.
* Major strengths: familiarity with multiple software systems; ability to learn new technologies quickly and adapt to new environments; self-motivated, focused team player with excellent interpersonal, technical and communication skills.
* Experience defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
* Experience gathering and defining functional and user interface requirements for software applications.
* Experience in real-time analytics with Apache Spark (RDD, DataFrames and Streaming API).
* Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
* Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
* Designed Hive queries and Pig scripts to perform data analysis, data transfer and table design.
* Excellent communication skills and strong problem-solving, analytical and time-management skills.
* Experience analyzing and resolving performance, scalability and reliability issues.
* Experience developing a data pipeline using Kafka to store data in HDFS.
* Good experience with the SDLC (Software Development Life Cycle).
* Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
* Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
* Responsible for the design and development of Spark SQL scripts based on functional specifications.
* Responsible for Spark Streaming configuration based on the type of input source.
Environment: Hadoop, HDFS, Pig, Sqoop, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala, Hortonworks, Apache YARN.

Deutsche Bank, New York, New York
Job Title: Hadoop Admin/Developer
April 2015 to Dec 2015
Responsibilities
* Launched Amazon EC2 cloud instances using Amazon Web Services (Linux/Ubuntu/RHEL) and configured the launched instances for specific applications.
* Installed applications on AWS EC2 instances and configured storage on S3 buckets.
* Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized the JSON templates.
* Implemented and maintained monitoring and alerting for production and corporate servers and storage using AWS CloudWatch.
* Managed servers on Amazon Web Services (AWS) instances using Puppet and Chef configuration management.
* Developed Pig scripts to transform raw data into meaningful data as specified by business users.
* Worked in an AWS environment on the development and deployment of custom Hadoop applications.
* Worked closely with data modellers to model new incoming data sets.
* Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark and shell scripts (for scheduling a few jobs).
* Expertise in designing and deploying Hadoop clusters and Big Data analytics tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra, with the Hortonworks distribution.
* Installed Hadoop, MapReduce and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
* Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
* Assisted in upgrading, configuring and maintaining Hadoop infrastructure components such as Pig, Hive and HBase.
* Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
* Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
* Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
* Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
* Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
* Experience using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
* Tuned Hive and Pig scripts to improve performance and resolve performance-related issues, with a good understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
* Imported data from sources such as HDFS and HBase into Spark RDDs.
* Developed a data pipeline using Kafka and Storm to store data in HDFS.
* Performed real-time analysis on the incoming data.
* Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
* Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.

Guardian Life Insurance, New York, New York
Job Title: Hadoop Developer (Intern)
September 2014 - Mar 2015
Responsibilities
* Responsible for setting up, implementing and administering Hadoop infrastructure on an ongoing basis.
* Performed cluster maintenance, including the addition and removal of nodes.
* Evaluated Hadoop infrastructure requirements and designed and deployed solutions (high availability, big data clusters).
* Monitored clusters and troubleshot Hadoop issues.
* Managed and reviewed Hadoop log files.
* Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
* Created NRF documents that explain the flow of the architecture and measure performance, security, memory usage and dependencies.
* Set up Linux users and Kerberos principals, and tested HDFS, Hive, Pig and MapReduce access for the new users.
* Helped maintain and troubleshoot UNIX and Linux environments.
* Experience analyzing and evaluating system security threats and safeguards.
* Experience importing and exporting data into HDFS and Hive using Sqoop.
* Developed a Pig program for loading and filtering streaming data brought into HDFS using Flume.
* Experienced in handling data from different data sets, joining them and preprocessing them using Pig join operations.
* Developed MapReduce programs to clean and aggregate data.
* Developed an HBase data model on top of HDFS data to perform real-time analytics using the Java API.
* Developed various custom filters and used pre-defined filters on HBase data using the API.
* Imported data from Teradata into HDFS and exported it back.
* Strong understanding of the Hadoop ecosystem, including HDFS, MapReduce, HBase, ZooKeeper, Pig, Hadoop Streaming, Sqoop, Oozie and Hive.
* Implemented counters on HBase data to count total records in different tables.
* Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
* Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
* Used Amazon Web Services to perform big data analytics.
* Implemented secondary sorting to sort reducer output globally in MapReduce.
* Implemented a data pipeline by chaining multiple mappers using ChainMapper.
* Created Hive dynamic partitions to load time series data.
* Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
* Created tables, partitions and buckets, and performed analytics using Hive ad-hoc queries.
* Experienced in importing and exporting data between HDFS/Hive and relational databases and Teradata using Sqoop.
* Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
* Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.