• Analytics Educator

Clash of the Titans: ANN vs XGB

A salesperson's major dilemma is guessing the price range a customer has in mind for a particular product. It is sometimes considered rude to ask a customer's budget directly. Hence, Analytics Educator is helping build a predictive model that estimates the total amount customers are willing to pay.

  • Analytics Educator

Curtailing the Losses of a Medical Clinic using Machine Learning

Dr. Joyita Sanyal, manager of the Calcutta Medical Clinic, is troubled by the clinic's losses. She was recently promoted, and she knows the clinic has the best staff and has been running very efficiently. To reassure herself about the financial side, she hired a third-party company to audit the finance department, but the auditors found no evidence explaining the problem. She needs information to explain this anomaly at the upcoming board meeting, so she has asked Analytics Educator to dig into the data and extract insights from it.

  • Analytics Educator

Identify bad car drivers using Machine Learning Algorithms

Data science has risen to prominence in the last decade due to its capabilities in predictive algorithms. While many business verticals value the benefits of predictive algorithms, insurance companies place particular importance on them, because data science and predictive algorithms help them keep premiums low. Data has always been at the core of what insurance companies do: analyzing claims, the kind of vehicle a person drives, and how many miles they drive per day, among other things.

  • Analytics Educator

Pandas Groupby function using Python. Pivot Table in Python.

This is one of the most useful analytics and preprocessing tools in Pandas. As the name suggests, groupby groups your data by something, normally by categorical attributes. If you are familiar with SQL, Pandas groupby is almost identical to SQL GROUP BY. In both SQL and Pandas, grouping the data by itself adds no value and produces no output unless it is accompanied by an aggregate function. Combined with aggregation, groupby replicates the Pivot Table function of MS Excel, so we can reproduce the functionality of an Excel pivot table in Python. Categorizing a dataset and applying a function to each group, whether an aggregation or a transformation, can be a critical component of a data analysis workflow. After loading, merging, and preparing a dataset, you may need to compute group statistics or pivot tables for reporting or visualization. pandas provides a versatile groupby interface that lets you slice, dice, and summarize datasets in a natural way.

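As a minimal sketch of the ideas above, the snippet below groups a small, made-up sales table by a categorical column, applies aggregate functions, and builds the equivalent of an Excel pivot table with `pivot_table`. The DataFrame and its column names (`region`, `product`, `amount`) are hypothetical illustrations, not data from the article.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["Pen", "Book", "Pen", "Pen", "Book"],
    "amount": [10, 25, 12, 8, 30],
})

# Group by a categorical column, then aggregate (like SQL: SUM(amount) ... GROUP BY region)
total_by_region = sales.groupby("region")["amount"].sum()

# Several aggregates at once using named aggregation
summary = sales.groupby("region").agg(
    total=("amount", "sum"),
    orders=("amount", "count"),
)

# Excel-style pivot table: regions as rows, products as columns, summed amounts as values
pivot = sales.pivot_table(index="region", columns="product",
                          values="amount", aggfunc="sum", fill_value=0)

# The same table built with groupby on two keys plus unstack
same_via_groupby = (sales.groupby(["region", "product"])["amount"]
                    .sum()
                    .unstack(fill_value=0))

print(total_by_region)
print(summary)
print(pivot)
```

Note that `sales.groupby("region")` on its own only returns a GroupBy object; output appears once `sum`, `agg` or another aggregate is applied, mirroring SQL's GROUP BY with an aggregate.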
  • Analytics Educator

How to get a job in Data Science? What do you need to learn?

More than 90% of the people who start learning Data Science have one common goal in mind: getting a job in this field. There is no magic formula for getting a job in data science, but following a few simple steps religiously enhances the chance of getting one significantly. In this article, we discuss some simple but really effective steps to achieve this goal.

  • Analytics Educator

How real estate companies identify good properties with ML

Housing.com is one of the leading online real estate brokers. People advertise properties on its portal by paying a fee, which is the company's major source of revenue. However, the company has lately been facing low sales volume. Many customers post advertisements for properties that have a poor chance of being sold; henceforth we will refer to these as junk properties. Because of these junk properties, the site is getting fewer visitors, and its Google ranking has also gone down. Another side effect of the lower traffic is fewer advertisements on the site. The company has hired a data scientist to distinguish junk properties from good ones, so that it can prioritize the good properties over the junk ones. This should attract more traffic and increase the company's revenue.

  • Analytics Educator

House Price Prediction using Machine Learning & complete EDA

Predicting numerical measures can be extremely valuable for companies that need to plan their strategies in terms of budgets and resources. In most industries, predicting numbers can bring huge business advantages over the competition and also enable new business scenarios. Born from the discipline of statistics, linear regression became one of the best-known machine learning techniques for this kind of task. In data science, linear regression models are used to find and quantify the relationships between a target variable and its predictor variables. This kind of model is useful in many business scenarios where predictions of numerical measures are needed. Imagine you are a business analyst working for a large real estate company in New York City. The company's goal is to create a fully digital application that helps customers find the reasonable price of any property, so they can judge whether a particular house is overpriced.

  • Analytics Educator

Price Prediction of Used Car with Machine Learning Algorithm

Different fields of science, economics, engineering, and marketing accumulate and store data, primarily in electronic databases, and appropriate, well-founded decisions should be made using the data collected. It is practically impossible to make sense of datasets containing more than a handful of data points without the help of computer programs. To be certain of the insights the collected data provides, and to make further decisions, we perform data mining, which involves distinct analysis processes. Exploratory data analysis (EDA) is key and is usually the first exercise in data mining. It allows us to visualize data to understand it and to create hypotheses for further analysis. Exploratory analysis centers on creating a synopsis of the data, or insights, for the next steps in a data mining project. EDA reveals the ground truth about the content without making underlying assumptions, which is why data scientists use it to understand what types of models and hypotheses can be built. Key components of EDA include summarizing data, statistical analysis, and visualization. Python provides expert tools for exploratory analysis: pandas for summarizing; scipy, along with others, for statistical analysis; and matplotlib and plotly for visualization. In this free case study, Analytics Educator will show you how to use a machine learning algorithm to predict the price of used (second-hand) cars accurately. We will emphasize
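The summarizing and statistical steps of EDA named above can be sketched with pandas. This is a minimal illustration, not the case study's code: the DataFrame, its column names and its values (price in arbitrary units) are hypothetical, and plotting with matplotlib or plotly is left out to keep it short.

```python
import pandas as pd

# Hypothetical used-car listings (illustrative values, not the case-study data)
cars = pd.DataFrame({
    "year": [2015, 2017, 2012, 2019, None],
    "km_driven": [60000, 35000, 110000, 15000, 80000],
    "fuel": ["Petrol", "Diesel", "Petrol", "Petrol", "Diesel"],
    "price": [4.5, 7.2, 2.1, 9.8, 3.9],
})

# 1. Summarize: count, mean, std and quartiles of the numeric columns
print(cars.describe())

# 2. Data quality: number of missing values per column
print(cars.isna().sum())

# 3. Frequency of each category in a categorical feature
print(cars["fuel"].value_counts())

# 4. Pairwise correlation between numeric features and the target (price)
print(cars[["year", "km_driven", "price"]].corr())
```

Each of these outputs feeds the modeling decisions: missing values must be handled, skewed distributions may need transforming, and strongly correlated features hint at which predictors matter.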

  • Analytics Educator

Identifying Customers Likely to Take a Loan

While reviewing its customer base, a bank found that its liability customers (depositors) had grown significantly relative to its borrowers (asset customers). Now it wants to aggressively increase its asset customers by offering loans against their credit cards. This will not only balance the categories of its customer base but also help it earn interest at a better margin. The bank had run a loan campaign before but was not satisfied, since it achieved only a single-digit success rate. This time it wants significantly better performance without increasing its campaign budget, so it has hired a data science company, Analytics Educator, to guide it toward these goals without increasing costs. Analytics Educator will use different machine learning algorithms to solve this problem. This is a very common problem for financial institutions, so we have taken up this case study to show our readers how a real-life project is done in the corporate world.

  • Analytics Educator

Predicting Insurance Premium using Machine Learning with Python

This is a free case study on data science, in which we show how to use different machine learning algorithms, such as Multiple Linear Regression, Support Vector Machine Regression, and Random Forest Regression, to predict insurance premiums. It is a complete step-by-step tutorial on using machine learning algorithms in Python to help a business make decisions. The purpose of this exercise is to examine different features and observe their relationships, and to fit a multiple linear regression on several features of individuals, such as age, body mass index (BMI), and gender, to predict future medical expenses. This helps a medical insurer decide what premium to charge.
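A bare-bones multiple linear regression can be fitted by ordinary least squares with NumPy. This is a sketch, not the case study's code (which also compares SVM and Random Forest regressors): the ages, BMIs and the formula generating the charges are invented so the fit can be checked, and a categorical feature such as gender would first need to be encoded as a 0/1 column.

```python
import numpy as np

# Hypothetical, noise-free toy data: charges = 1000 + 250*age + 300*bmi
# (coefficients invented purely so the recovered fit can be verified)
age = np.array([25, 32, 47, 51, 62, 38], dtype=float)
bmi = np.array([22.0, 27.5, 31.2, 24.8, 29.0, 35.1])
charges = 1000 + 250 * age + 300 * bmi

# Design matrix: a column of ones for the intercept, then one column per feature
X = np.column_stack([np.ones_like(age), age, bmi])

# Ordinary least squares: choose beta to minimise ||X @ beta - charges||^2
beta, *_ = np.linalg.lstsq(X, charges, rcond=None)
intercept, b_age, b_bmi = beta
print(f"intercept={intercept:.1f}, age={b_age:.1f}, bmi={b_bmi:.1f}")

# Predict expenses for a new, hypothetical applicant (age 40, BMI 28)
new_applicant = np.array([1.0, 40.0, 28.0])
print(new_applicant @ beta)
```

Each fitted coefficient reads as the expected change in charges for a one-unit change in that feature with the others held fixed; on real, noisy data the fit will not recover round numbers like these.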

  • Analytics Educator

Retention Study - Figuring Out Which Employees May Quit

Currently, companies are facing tremendous pressure to retain their employees. It is the employees who take a company to the top. Hence, retaining good employees, which proves beneficial in the long run, has become a top priority for companies. Many companies have now started using machine learning algorithms to predict which employees are likely to quit. If they can predict this before an employee actually resigns, they can take preventive measures to retain that employee. Today we use a case study to show a step-by-step approach to predicting which employees are likely to resign. We use Python and machine learning algorithms such as Logistic Regression and Artificial Neural Networks, the building block of what is popularly known as Deep Learning.

  • Analytics Educator

Who Do We Target for Donations?

We have a dataset of people we approached as donors for our election campaign, including their education, job, income, and ethnicity. We know that high-income earners are better prospects for political donations, so let's build a classifier that predicts income level from a person's attributes. We used Decision Tree, Random Forest, and XGBoost to build the models and compare the results.

  • Analytics Educator

Bank Customers Retirement Predictions Using Support Vector Machines

Suppose you work as a data scientist at a major bank in NYC and have been tasked with developing a model that predicts whether a customer is able to retire, based on his/her features: age and net savings (retirement savings in the U.S.). Here, Retire is the dependent variable, and Age and Savings are the independent variables. A machine learning algorithm such as Support Vector Machine (SVM) can be of great help in solving this problem. You may also apply other classification algorithms, measure their accuracy, and compare it with that of SVM.

  • Analytics Educator

Building a Logistic regression in Python, step by step, using Titanic Data

The famous Titanic dataset will be used with a machine learning algorithm, binary logistic regression, in Python to predict whether a passenger survived (usually denoted by 1 and 0). Logistic regression is a classification algorithm suited to a binary dependent variable; its output is the probability of the outcome being 1.
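To show how logistic regression turns features into a probability of 1, here is a minimal sketch of the prediction step only. The coefficients and the two features (`is_female`, `pclass`) are hypothetical values chosen for illustration, not coefficients fitted on the Titanic data; in practice they would come from training the model.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only (NOT fitted on Titanic data)
INTERCEPT = -1.0
COEF_IS_FEMALE = 2.5
COEF_PCLASS = -0.8  # passenger class: 1, 2 or 3

def survival_probability(is_female: int, pclass: int) -> float:
    # Linear predictor (the log-odds), squashed into a probability of survival (class 1)
    z = INTERCEPT + COEF_IS_FEMALE * is_female + COEF_PCLASS * pclass
    return sigmoid(z)

def predict(is_female: int, pclass: int, threshold: float = 0.5) -> int:
    # Convert the probability into a 1 (survived) / 0 (did not survive) label
    return int(survival_probability(is_female, pclass) >= threshold)

print(survival_probability(1, 1), predict(1, 1))
print(survival_probability(0, 3), predict(0, 3))
```

The 0.5 threshold is a common default; it can be moved when one type of misclassification is costlier than the other.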

  • Analytics Educator

Breast Cancer Classification Using Support Vector Machines

Here we have a dataset of patients describing the characteristics of cells suspected to be cancerous. After thorough diagnosis, each case was determined to be Malignant (cancerous) or Benign (non-cancerous). Now we will use a machine learning algorithm, Support Vector Machine, a classification technique, to classify the cells as Malignant or Benign using Python. Then we will compare our predictions with the original labels to check the model's accuracy.
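The final step above, matching predictions against the original labels, can be sketched in plain Python. The two label lists are hypothetical stand-ins for the true diagnoses and the SVM's predictions; the sketch computes accuracy and a simple confusion count.

```python
from collections import Counter

# Hypothetical labels for illustration: true diagnoses vs. model predictions
actual    = ["Malignant", "Benign", "Benign", "Malignant", "Benign", "Benign"]
predicted = ["Malignant", "Benign", "Malignant", "Malignant", "Benign", "Benign"]

# Accuracy = share of cases where the prediction matches the original label
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Confusion counts keyed by (actual, predicted) pairs
confusion = Counter(zip(actual, predicted))

print(f"accuracy = {accuracy:.2%}")
for (a, p), n in sorted(confusion.items()):
    print(f"actual={a:<9} predicted={p:<9} count={n}")
```

The confusion counts matter here because, in a cancer setting, a malignant case predicted as benign is usually costlier than the reverse, which a single accuracy figure hides.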

  • Analytics Educator

T-shirt size prediction using the machine learning algorithm K-Nearest Neighbors (KNN)

You own an online clothing business and would like to develop a new app (or in-store) feature in which customers enter their height and weight and the system predicts which T-shirt size they should wear. The features are height and weight, and the output is either L (Large) or S (Small). For example, with the in-store feature, a customer walks in and provides two features: weight in kilograms and height in centimeters. These are the inputs to the algorithm, which then predicts whether the customer should get size Small or size Large.
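A from-scratch sketch of KNN makes the idea concrete: find the k customers whose (weight, height) are closest to the new customer and let them vote on the size. The training examples below are invented for illustration, and `predict_size` is a hypothetical helper, not code from the case study.

```python
import math
from collections import Counter

# Hypothetical training examples: (weight_kg, height_cm, size), invented for illustration
TRAINING = [
    (55, 160, "S"), (58, 163, "S"), (60, 158, "S"), (62, 165, "S"),
    (75, 178, "L"), (80, 180, "L"), (85, 175, "L"), (78, 183, "L"),
]

def predict_size(weight_kg: float, height_cm: float, k: int = 3) -> str:
    """Predict 'S' or 'L' by majority vote among the k nearest known customers."""
    # Euclidean distance in (weight, height) space to every training example, nearest first
    distances = sorted(
        (math.dist((weight_kg, height_cm), (w, h)), size)
        for w, h, size in TRAINING
    )
    # Majority vote among the labels of the k closest examples
    votes = Counter(size for _, size in distances[:k])
    return votes.most_common(1)[0][0]

print(predict_size(57, 161))
print(predict_size(82, 179))
```

Because weight and height are on different scales, a real model would usually standardize both features before computing distances; an odd `k` avoids ties between the two sizes.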

  • Analytics Educator
  • 16th May, 2020

R Tutorial for beginners with FREE codes and assignments

R is one of the most powerful and popular programming languages among data scientists. It is free software and lets analysts perform complex analyses without getting into too many details. It also lets you automate much of the MIS reporting that is traditionally done in MS Excel. R has the highest growth rate among all data science software in India.

  • Samrat Chakraborty,
    Sr. Data Scientist,
    TCS Kolkata
  • 27th April, 2020

Customer Segmentation Using RFM with Python

In the retail and e-commerce sectors, supermarket chains, stores and many e-commerce channels generate large amounts of data every day. We need to analyze this wide range of customer transactions to build profitable strategies and make sound decisions.
Every customer has different needs, and as the number of transactions and the customer base grow, it becomes hard to understand each customer's requirements. Identifying potential customers can improve marketing campaigns, which ultimately increases sales and generates more cash for the business. We use customer segmentation to group these customers into various segments.
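The title refers to RFM, which describes each customer by Recency (days since the last purchase), Frequency (number of purchases) and Monetary value (total spend). The pandas sketch below computes these three measures from a hypothetical transaction table; the column names and the snapshot date (one day after the latest order) are assumptions, not taken from the article.

```python
import pandas as pd

# Hypothetical transactions, invented for illustration
tx = pd.DataFrame({
    "customer_id": ["C1", "C1", "C2", "C3", "C3", "C3"],
    "order_date": pd.to_datetime([
        "2020-03-01", "2020-04-20", "2020-01-15",
        "2020-04-01", "2020-04-10", "2020-04-25",
    ]),
    "amount": [120.0, 80.0, 300.0, 50.0, 40.0, 60.0],
})

# Reference date for recency: one day after the last transaction in the data
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

# One row per customer with the three RFM measures
rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    frequency=("order_date", "count"),                            # number of purchases
    monetary=("amount", "sum"),                                   # total spend
)
print(rfm)
```

A common next step is to bin each measure into scores, for example with `pd.qcut`, and combine them into segments; with only three toy customers that step is omitted here.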

  • Shubhadip Paul,
    Sr. Data Scientist,
    TCS Kolkata
  • 8th March, 2020

Network Analysis with R: The Interactive Way

Network analysis is a popular way to visualize the relationships between different nodes. For example, consider the average time required to travel from warehouse A to destinations X and Y. We can simply say it is 45 and 65 minutes respectively, and that is perfectly fine. Now imagine a scenario with 19 warehouses and 32 destinations. In that case, stating the travel times between warehouses and destinations this way, or putting them in a table, may not give the overall picture. This is where network analysis helps!
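The article itself uses R for interactive visualization; the sketch below, in Python with only the standard library, shows the underlying idea of storing warehouse-to-destination links as a graph that can be queried instead of read off a table. The A to X and A to Y times come from the text; warehouse B, destination Z and their times are hypothetical, and `nearest_warehouse` is an illustrative helper.

```python
from collections import defaultdict

# Edge list: (warehouse, destination, average travel time in minutes).
# A->X and A->Y use the figures from the text; B's edges are hypothetical.
edges = [
    ("A", "X", 45), ("A", "Y", 65),
    ("B", "X", 30), ("B", "Z", 50),
]

# Adjacency list: each node maps to its neighbours and the travel time to each
graph = defaultdict(dict)
for warehouse, destination, minutes in edges:
    graph[warehouse][destination] = minutes
    graph[destination][warehouse] = minutes  # treat every route as two-way

def nearest_warehouse(destination: str) -> tuple:
    """Return (warehouse, minutes) for the directly connected warehouse with the shortest time."""
    return min(graph[destination].items(), key=lambda item: item[1])

print(nearest_warehouse("X"))
print(nearest_warehouse("Y"))
```

With 19 warehouses and 32 destinations the same structure scales unchanged, and it is the input that graph libraries and network-visualization tools work from.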

  • Analytics Educator
  • 7th March, 2020

Automation using R

In today's scenario, one of the most sought-after buzzwords is "Automation". Different people are trying to automate their processes in different ways. It has often been observed that these automation processes are either complex to implement or expensive. However, at Analytics Educator, we have successfully automated some of our processes using an inexpensive method that is quite simple to implement.

  • Analytics Educator
  • 7th March, 2020

The usual mistakes made by Data Science aspirants

Data Scientist has been a buzzword in India, and many job seekers are being attracted to this field. However, it has been observed that most candidates make certain inadvertent mistakes and face many hurdles in getting a job in data science.

  • Sanjay Fuloria
    Professor,
    IBS Hyderabad,
    Operations Management
    and IT Area
  • 1st March, 2020

A brief overview of Analytics

Analytics is a buzzword that has gained prominence over the last two decades, since the beginning of the twenty-first century. It is not that analytics did not exist earlier. The Merriam-Webster dictionary defines "analytics" as "the method of logical analysis". Analytics has been around since the time of Frederick Winslow Taylor, who popularized time study exercises. The version of analytics that is more popular now is "Data Analytics".

  • Analytics Educator
  • 1st January, 2020

How to handle missing values using R

Most of the time, we face the problem of missing data in a dataset. We can't simply leave the missing values in the dataset, and there are different ways to handle them. First, let's take a sample dataset to illustrate.
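The article works in R; as an illustration in Python, the pandas sketch below shows three common options on a hypothetical sample dataset: counting missing values, dropping incomplete rows, and imputing them (median or mean for numeric columns, the most frequent value for a categorical one). It is not the article's own code, and which option suits a given dataset is a judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample dataset with gaps (NaN / None mark missing values)
df = pd.DataFrame({
    "age": [25, np.nan, 40, 35, np.nan],
    "income": [50000, 62000, np.nan, 58000, 61000],
    "city": ["Kolkata", None, "Delhi", "Kolkata", "Mumbai"],
})

# 1. How many values are missing in each column?
missing_per_column = df.isna().sum()

# 2. Option A: drop every row that has at least one missing value
complete_rows = df.dropna()

# 3. Option B: impute instead of dropping, column by column
imputed = df.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())         # numeric: median
imputed["income"] = imputed["income"].fillna(imputed["income"].mean())  # numeric: mean
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])     # categorical: most frequent

print(missing_per_column)
print(complete_rows)
print(imputed)
```

Dropping rows is simplest but here discards three of five records; imputation keeps every row at the cost of introducing estimated values.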
