May 2016 | Page 3 of 4 | AcadGild Blog

20 May 2016

How Hadoop is Used in Organizations

“There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days.” Former Google CEO Eric Schmidt said this in 2010. To appreciate his statement, we must understand its background. In the past decades the...

18 May 2016

MapReduce Custom Partitioner

In this post, we will look at how a custom partitioner works in Hadoop MapReduce. The post will give you a good idea of how a user can split the reducer into multiple parts (sub-reducers) and store each group's results in its own sub-reducer via a custom partitioner. Before...

17 May 2016

Querying Hive Using Apache Drill

Apache Drill is an open-source software framework inspired by Google’s Dremel system, which is available as an infrastructure service called Google BigQuery. Drill’s specialty is its ability to scale to 10,000 servers or more and to process petabytes of data and trillions...

16 May 2016

Spark Use Case – Uber Data Analysis

In this post, we will analyze the Uber dataset in Apache Spark using Scala. The Uber dataset consists of 4 columns: dispatching_base_number, date, active_vehicles and trips. You can download the dataset from the link below: https://drive.google.com/open?id=0ByJLBTmJojjzS2c2UktqLW5uRG8 Problem Statement: Find the days on which each base...

14 May 2016

Why Learning MongoDB Will Boost Your Career

There is an increasing demand for NoSQL database experts, especially those highly trained in MongoDB. This is mainly because NoSQL databases are replacing traditional RDBMSs, as NoSQL databases like MongoDB offer the flexibility to organize data in a way that makes the data usable. This along...

13 May 2016

Job Responsibilities of Hadoop Professionals

It is the age of Big Data and Hadoop, and countless professionals have made it their dream career. After all, who doesn’t want to be a part of the most happening thing in the IT sector? We have been receiving so many queries regarding the job opportunities in Hadoop...

13 May 2016

Graphical Exploratory Data Analysis-II

In the previous blog, we discussed using histograms to check the central tendency measures of a continuous variable. We also plotted the frequencies of different ordinal variables to see the distribution of observations in each category. In this blog, we will discuss other types of data visualizations,...

12 May 2016

Integrating SparkSQL with MySQL

In this post, we will learn how to connect to a JDBC data source using SparkSQL DataFrames. In case you are not familiar with SparkSQL, you can refer to this post for a comprehensive Introduction to SparkSQL and the post on Analyzing Crime Data using SparkSQL. We know...

11 May 2016

Skewed Join in Pig

In our previous blogs, we discussed Replicated Join and Merge Join in Pig. In this post, we will continue the discussion by implementing skewed joins. A skewed join can be used when the user’s underlying data is sufficiently skewed and the user needs to be given control over...

08 May 2016

Spark Use Case – Travel Data Analysis

In this blog, we will analyze a travel dataset and gain insights from it using Apache Spark. The travel dataset is publicly available, and its contents are detailed under the heading ‘Travel Sector Dataset Description’. Based on the data, we will find the top 20...
