PROFESSIONAL SUMMARY

* 8+ years of experience in the development, deployment, and maintenance of web-based applications using Java and Big Data ecosystems in Windows and Linux environments.
* 5+ years of experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, Spark Streaming, Spark SQL, NiFi, Solr, Kafka, and Impala.
* Experience working with various Hadoop distributions, including Cloudera (CDH 4/CDH 5), MapR, and Hortonworks, and knowledge of the Amazon EMR Hadoop distribution.
* Extensive experience with real-time streaming applications and batch-style, large-scale distributed computing applications; worked on integrating Kafka with NiFi and Spark.
* Developed reusable, configurable components in Java, Scala, and Python as part of project requirements.
* Good knowledge of Scala's functional programming techniques, such as anonymous functions (closures), currying, higher-order functions, and pattern matching (see the Scala sketch after this summary).
* Strong knowledge of advanced Java 8 features such as lambda expressions and streams.
* Good experience working with cloud environments such as Amazon Web Services (AWS), including EC2 instances and S3, and configuring servers for Auto Scaling and Elastic Load Balancing.
* Strong knowledge of machine learning algorithms such as linear regression, logistic regression, decision trees, SVMs, and k-means.
* Experience applying current development approaches, including Spark applications written in Scala, to compare the performance of Spark with Hive and SQL/Oracle.
* Expertise in writing Spark RDD transformations and actions over input data, and Spark SQL queries and DataFrames to import data from sources, perform transformations and read/write operations with Spark Core, and save results to an output directory in HDFS (see the Spark sketch after this summary).
* Hands-on experience coding MapReduce/YARN programs in Java and Scala for analyzing big data.
* Good knowledge of the Spark framework for both batch and real-time data processing.
* Good knowledge of Spark MLlib for predictive intelligence and customer segmentation, and of maintaining Spark Streaming applications.
* Used pandas, NumPy, seaborn, matplotlib, scikit-learn, SciPy, and NLTK in Python to develop various machine learning algorithms.
* Built a machine learning algorithm to automatically extract and rank key phrases (concepts) from a document.
* Researched and experimented with state-of-the-art deep neural network architectures for image recognition and temporal action recognition in videos.
* Expertise in Storm for adding reliable real-time data processing capabilities to enterprise Hadoop.
* Hands-on experience in scripting for automation and monitoring using shell, Python, and Perl scripts.
* Thorough knowledge of ETL, data integration, and migration; extensively used ETL methodology to support data extraction, transformation, and loading using Informatica.
* Hands-on experience with data extraction, transformation, and loading in Hive, Pig, and HBase.
* Worked on importing data into HBase using Sqoop and the HBase client API.
* Experience developing data pipelines using Kafka, Spark, and Hive to ingest, transform, and analyze data.
* Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
* Experience designing and developing POCs in Scala, deploying them on YARN clusters, and comparing the performance of Spark with Hive and SQL/Teradata.
* Good understanding of MPP (massively parallel processing) databases such as HP Vertica and Impala.
* Extensive hands-on experience with ETL, Oracle PL/SQL, data warehousing, and star schemas.
* Involved in developing Impala scripts for extracting, transforming, and loading data into the data warehouse.
* Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs) (see the UDF sketch after this summary).
* Experienced in collecting metrics for Hadoop clusters using Ambari and Cloudera Manager.
* Experience in performance tuning and monitoring Hadoop clusters, gathering and analyzing metrics on the existing infrastructure using Cloudera Manager.
* Experience processing large volumes of data and executing processes in parallel using Talend.
* Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
* Good understanding of, and hands-on experience writing, applications on NoSQL databases such as HBase, MongoDB, and Cassandra.
* Good understanding of installing and maintaining Cassandra, configuring the cassandra.yaml file as required, and performing reads and writes via Java JDBC connectivity.
* Experience in extraction, transformation, and loading (ETL) of data in different file formats such as CSV, text, SequenceFile, Avro, Parquet, JSON, and ORC, using compression codecs such as gzip, LZ4, and Snappy.
* Experience with version control tools such as CVS, Git, and SVN, and build tools such as SBT, Ant, and Maven.
* Good knowledge of web/application servers such as Apache Tomcat, IBM WebSphere, and Oracle WebLogic.
* Design and programming experience developing internet applications using JSP, MVC, servlets, Struts, Hibernate, JDBC, JSF, EJB, AJAX, web-based development tools, and web services using XML, HTML, and SOAP.
* Worked with BI (Business Intelligence) teams to generate reports and design ETL workflows in Tableau; deployed data from various sources into HDFS and built reports using Tableau.
* Participated in the entire software development life cycle, including requirements analysis, design, development, testing, implementation, documentation, and support of software applications.
* Good experience working in Agile development environments, including the Scrum methodology.
* Strong analytical skills and the ability to understand existing business processes.
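As an illustration of the Scala functional techniques listed above, here is a minimal, self-contained sketch (all names are hypothetical examples, not project code) showing a closure, a curried function, a higher-order function, and pattern matching:

    object FunctionalStyleSketch {
      // Higher-order function: takes another function as an argument.
      def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

      // Curried function: arguments supplied in separate parameter lists.
      def add(a: Int)(b: Int): Int = a + b

      def main(args: Array[String]): Unit = {
        val base = 10
        // Anonymous function (closure): captures `base` from the enclosing scope.
        val addBase: Int => Int = x => x + base

        val addFive = add(5) _            // partial application of the curried add
        println(applyTwice(addFive, 1))   // prints 11
        println(addBase(3))               // prints 13

        // Pattern matching over a list's structure.
        def describe(xs: List[Int]): String = xs match {
          case Nil          => "empty"
          case head :: Nil  => s"single element: $head"
          case head :: tail => s"starts with $head, ${tail.length} more"
        }
        println(describe(List(1, 2, 3)))  // prints "starts with 1, 2 more"
      }
    }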
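A minimal sketch of the Spark RDD/DataFrame work described above, assuming Spark 2.x; the HDFS paths, the sales.csv layout, and the region/amount column names are hypothetical placeholders:

    import org.apache.spark.sql.SparkSession

    object SparkEtlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SparkEtlSketch").getOrCreate()

        // RDD transformations and an action over a hypothetical text file in HDFS.
        val counts = spark.sparkContext
          .textFile("hdfs:///data/input/events.txt")
          .flatMap(_.split("\\s+"))   // transformation
          .map(word => (word, 1))     // transformation
          .reduceByKey(_ + _)         // transformation
        println(counts.count())       // action: triggers the computation

        // DataFrame / Spark SQL over a hypothetical CSV source.
        val sales = spark.read.option("header", "true").csv("hdfs:///data/input/sales.csv")
        sales.createOrReplaceTempView("sales")
        val summary = spark.sql(
          "SELECT region, SUM(CAST(amount AS DOUBLE)) AS total FROM sales GROUP BY region")

        // Save the result back to HDFS as Parquet.
        summary.write.mode("overwrite").parquet("hdfs:///data/output/sales_summary")
        spark.stop()
      }
    }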
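A minimal Hive UDF sketch using the classic org.apache.hadoop.hive.ql.exec.UDF API; the NormalizeZip class and its behavior are hypothetical examples rather than a specific project deliverable:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: left-pads a US ZIP code to five digits; NULL stays NULL.
    class NormalizeZip extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val digits = input.toString.filter(_.isDigit).take(5)
        new Text(("0" * (5 - digits.length)) + digits)
      }
    }

Once packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip', then called from HiveQL like any built-in function.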