Top 4 Ways for Feature Scaling in Machine Learning


Feature scaling is something that affects a Machine Learning model in many ways. There are plenty of situations where feature scaling is optional or not required, but there are also many Machine Learning algorithms where feature scaling is a must-have step. For instance, regression, K-Means clustering, and PCA are algorithms where feature scaling is a must-have technique. On the other side, tree-based algorithms like Decision Trees usually do not need feature scaling. Today in this tutorial we will explore the top 4 ways for feature scaling in Machine Learning.

Feature Scaling in Machine Learning –

There are many ways to scale a feature or column value. It is completely scenario oriented which scaler will perform better. Let's start exploring them one by one –

1. Standardization –

This is one of the most used types of scaler in data preprocessing. It is also known as the z-score. It redistributes the data in such a way that mean = 0 and standard deviation = 1. Here is the formula for the calculation –

z-score = [current_value – mean(feature)] / standard_deviation(feature)

For the implementation, you may use sklearn.preprocessing.StandardScaler.

Please refer here for the complete documentation on StandardScaler.

Another use case of standardization is identifying outliers in the data set. Once you transform your data set using the standard scaler, all the values that fall outside [-3, 3] can be considered outliers in the data set / feature.
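Here is a minimal sketch of StandardScaler in action, including the outlier check above; the data is made up for illustration –

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data for illustration: 100 points around 50, plus one injected outlier
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=5, size=(100, 1))
X[0] = 500.0  # extreme value

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # per column: mean = 0, standard deviation = 1

# After scaling, values outside [-3, 3] can be treated as outliers
outlier_mask = np.abs(X_scaled) > 3
print(X[outlier_mask])  # -> [500.]
```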

2. Mean Normalization –

Let's understand the formula first –

normalization-score = [current_value – mean(feature)] / [max(feature) – min(feature)]

The range of the normalized values is [-1, 1] with mean = 0. We need this feature scaling technique when we want zero-centric data.

If you are interested in reading more on this topic, especially the implementation, here is the scikit-learn documentation on normalization.
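scikit-learn does not ship a dedicated mean-normalization scaler, so here is a minimal NumPy sketch of the formula above; the feature values are made up for illustration –

```python
import numpy as np

# Hypothetical feature values for illustration
feature = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Mean normalization: (x - mean) / (max - min)
normalized = (feature - feature.mean()) / (feature.max() - feature.min())

print(normalized)         # -> [-0.5  -0.25  0.    0.25  0.5 ]
print(normalized.mean())  # -> 0.0 (zero-centric)
```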

3. Min-Max Scaler Technique –

When you need to transform the feature magnitude into the [0, 1] range, this Min-Max feature scaling technique is one of the best options. Here is the formula –

min-max-score = [current_value – min(feature)] / [max(feature) – min(feature)]

The official documentation of the Min-Max Scaler implementation in scikit-learn is here.
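A minimal sketch of MinMaxScaler usage, again with made-up values –

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix for illustration
X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())  # -> [0.   0.25 0.5  0.75 1.  ]
```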

4. Unit Vector –

This feature scaling technique is very useful when we need to transform the feature values into unit form, i.e., rescale them so the feature vector has unit norm.
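As a minimal sketch, sklearn.preprocessing.normalize can do this; note that axis=0 rescales each feature column to unit norm, while the default axis=1 (what the Normalizer class does) works per sample row –

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical feature matrix for illustration
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

# Scale each feature (column) to unit L2 norm
X_unit = normalize(X, norm="l2", axis=0)

print(X_unit)
print(np.linalg.norm(X_unit, axis=0))  # -> [1. 1.]
```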

For more information on feature scaling techniques, especially covering the implementation area, please have a look at the scikit-learn official documentation on preprocessing.

Conclusion –

Feature scaling and the facts around it usually create confusion for data scientists during model development. This article was an effort to resolve those issues. As I have already mentioned, feature scaling is completely use-case oriented. At the very beginning we explained where feature scaling is optional and where it is required, but we are planning to create a detailed article on exactly this point – when to apply feature scaling.

Anyway, how did you find this article – Top 4 Ways for Feature Scaling in Machine Learning? If you face any difficulty while understanding it, please let us know. If you think some information on feature scaling is missing here, you may contribute it in the form of a guest post.

Thanks 

Data Science Learner Team

Best Ways to Learn Probability for Data Science


I have talked with a hundred or more Data Science learners about this topic – the best ways to learn probability for data science. As you know, every individual is different, and hence everyone has their own strategy as well. Still, I found a lot of confusion around this topic, especially since probability is a big subject to cover. Hence I prepared a combined approach for the best ways to learn probability for data science. I am really excited to share it with you, and this article is completely about it. I will request you to stay with it till the end, as I have tried my best to keep it short and interesting for you –

Three Phases to Learn Probability for Data Science

There are actually three phases which you should follow to understand probability for data science.

  1. Fundamental Concept.
  2. Numerical Exercise /Case Studies.
  3. Programming Approach for Probability.

1. Fundamental Concept –

In this phase, we all need to revise our academic knowledge of probability. Since most of us come from a mathematics background, we must have read probability in school and college. Some of us still remember it, but some of us may have forgotten it. There are a few books which can help you cover all of the concepts in one place.

1. An Introduction to Probability Theory and Its Applications, Vol 1, 3rd Edition (WSE)

2. Probability: For the Enthusiastic Beginner, 1st Edition

2. Numerical Exercise /Case Studies –

There is no replacement for practicals in science and maths. In the same way, this section and the one that follows are completely based on practicals and practice. Here you may choose between two strategies for doing these exercises and questions.

2.1 – Solve the exercises from a dedicated, engineering-level book on probability. This will give you a consolidated view of all topics in one place.

2.2 – Alternatively, you can start with your school textbooks and solve only the probability and related chapters. Trust me, it only seems like a big task, but it really is not. When you first read them in childhood, the topics created some knowledge dots; when you revise them now, it will connect those dots and help you understand them in a data science context.

3. Programming Approach for Probability –

Python has very strong packages for maths and scientific analysis, like scipy and numpy. These will help you perform the underlying calculations for probability estimation. Apart from this, in order to visualize distribution functions, you may use matplotlib or seaborn.
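As a minimal sketch of this workflow, the example below uses scipy.stats to compute binomial probabilities and matplotlib to visualize the distribution; the coin-flip scenario is made up for illustration –

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical example: probability of k heads in 10 fair coin flips
n, p = 10, 0.5
k = np.arange(0, n + 1)
pmf = stats.binom.pmf(k, n, p)  # P(X = k) for each k

print(f"P(exactly 5 heads) = {stats.binom.pmf(5, n, p):.4f}")
print(f"P(at most 3 heads) = {stats.binom.cdf(3, n, p):.4f}")

# Visualize the probability mass function
plt.bar(k, pmf)
plt.xlabel("Number of heads")
plt.ylabel("Probability")
plt.title("Binomial(10, 0.5) PMF")
plt.show()
```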

Conclusion –

The intent behind creating this article – Best Ways to Learn Probability for Data Science – is to introduce you to all three phases along with their importance. The biggest mistake people commit is to ignore some of the phases. All are equally important. I have seen most data scientist aspirants start from the last (programming) phase. Please do not do that. Giving some attention to revising the basic concepts from books is really important. In school we learned how to solve a given probability problem; here we learn how to convert real-world problems into a probability formula or equation.

Thanks

Data Science Learner Team
