Data Preprocessing in Python: Importance

Data preprocessing is the first step in any machine learning or predictive analytics project. This article is written for Python developers and data science beginners and aspirants, and it explains why preprocessing matters before you move on to more advanced topics. Now let's start –

There are several reasons why we perform data preprocessing:

  1. Many machine learning algorithms train slowly when the features are not on a similar scale. Suppose you have two features, one ranging over 0-2 and the other over 0-1,000,000. If you fit a regression on them with an iterative optimizer, it needs many more rounds of adjusting the regression coefficients before the predictions become accurate, which increases training time. Scaling the features to a uniform range makes training faster.
  2. We should always handle unexpected values in the data set. For example, many random forest implementations do not accept null values, so replacing such values with a meaningful substitute is also part of data preprocessing.
  3. The data set should be in a form that lets us easily swap the underlying machine learning algorithm. Preprocessing converts the data into a compatible format.
  4. We have to convert categorical data into numeric data, because most machine learning algorithms work on numbers, not on text.
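
The points above can be sketched in plain Python. This is only a minimal illustration using the standard library; the function names (`min_max_scale`, `impute_mean`, `one_hot`) and the sample values are made up for this example. In real projects, libraries such as scikit-learn and pandas offer more robust tools for the same jobs.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale numbers to the range [0, 1] (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(labels):
    """Turn text categories into 0/1 columns, one column per distinct category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical sample data
small = [0.0, 1.0, 2.0]            # feature on a 0-2 scale
large = [0, 500_000, 1_000_000]    # feature on a 0-1,000,000 scale
ages = [20, None, 40]              # one missing value
colors = ["red", "blue", "red"]    # categorical text data

scaled_small = min_max_scale(small)   # both features now share the 0-1 range
scaled_large = min_max_scale(large)
filled_ages = impute_mean(ages)       # None replaced by the mean of 20 and 40
encoded_colors = one_hot(colors)      # columns in alphabetical order: blue, red
```

After scaling, both features lie between 0 and 1, so neither dominates the coefficient updates just because of its units.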

I think these are reasons enough to read about and get hands-on with data preprocessing in Python. There are several others, but these are the major ones.

Is domain data a bottleneck? –

Domain data can create problems in preprocessing. Do not blindly follow the usual, predefined preprocessing lifecycle with domain data; you have to understand which technique helps most in your domain. For example, a null value is usually either dropped or replaced, but in a domain application the fact that a value is missing may itself be useful information. Treat this as an awareness check for your data.
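
One possible way to keep the information carried by a missing value is to record it as a separate feature before filling the gap. The sketch below is illustrative only: the `add_missing_flag` helper, the `income` field, and the fill value are hypothetical and chosen just for this example.

```python
def add_missing_flag(rows, field, fill_value):
    """Return new rows with `<field>_missing` set to 1 where the value was None."""
    result = []
    for row in rows:
        was_missing = row[field] is None
        new_row = dict(row)  # copy, so the original rows stay untouched
        new_row[field] = fill_value if was_missing else row[field]
        new_row[f"{field}_missing"] = 1 if was_missing else 0
        result.append(new_row)
    return result


# Hypothetical loan applications: an undisclosed income may itself be a signal.
applications = [{"income": 52000}, {"income": None}]
flagged = add_missing_flag(applications, "income", fill_value=0)
```

Whether the flag is actually useful depends on the domain, which is exactly why the usual drop-or-replace habit should be questioned first.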

Conclusion –

The scope of this article was to introduce you to the importance of preprocessing. I have seen teams invest a lot of time in finding the best machine learning algorithm. They try various combinations of models and still never get good accuracy. Data science is more about the data and less about the algorithm, and we usually ignore this. If it were all about algorithms, we would not be called scientists; the scientist tag exists because we are there to identify patterns in data. We shape the data and make sure the algorithms receive proper data, and that is what preprocessing is all about. I always encourage spending at least 25% of your time on understanding, cleaning, and shaping data.

I hope this article motivates you to take preprocessing seriously. If you want to share your own preprocessing story, tell us how preprocessing changed your evaluation metrics. We love to hear back from our readers. In fact, we love to be the audience of our audience.

Thanks

Data Science Learner Team 

Levels of Measurements Basics for Every Data Scientist & Statistician

In the journey of learning statistics as a data scientist or statistician, it is essential to understand the concept of levels of measurement. All of statistics revolves around random variables and their distributions, but before studying the distribution of a random variable you need to know how it is measured. This article, "Levels of Measurements Basics for Every Data Scientist & Statistician", covers exactly that. So let's understand this part together.

Levels of Measurement –

There are four levels at which a variable can be measured, and each one has its own importance. Here are the 4 levels of measurement –

  1. Nominal Level
  2. Ordinal Level
  3. Interval Level
  4. Ratio Level

Nominal Level

The nominal level is the categorical level of measurement. Its values are not numerical and are generally used to classify a data variable into categories. These categories are mutually exclusive and cannot be meaningfully ordered. You can use them to label the values in a dataset.

For example, color is measured at the nominal level: red, blue, green, etc. are categories of color. Some other examples are:

Sex – Male or Female

Shoes – Casual, Running, Sports, etc.

Country – India, USA, UK
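
Since nominal values have no order, counting is the typical operation that makes sense on them. A small sketch with a made-up list of countries:

```python
from collections import Counter

# Made-up sample of a nominal variable
countries = ["India", "USA", "India", "UK", "India"]

# Frequencies and the most frequent category (the mode) are meaningful;
# sorting or averaging the categories is not.
counts = Counter(countries)
most_common_country, frequency = counts.most_common(1)[0]
```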

Ordinal Level

At the ordinal level of measurement, the values can be classified and ordered. The values are non-numerical, but there is a ranking relationship among them. However, although they can be ordered, they lack a scale: the distance between two consecutive values is not defined.

The ordinal level is used in many survey questions. For example, Unsatisfied, Neutral, and Satisfied are not numerical but can be ordered. In general, variables whose values can be ranked but have no equal, measurable distance between them are measured at the ordinal level.
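
Ordinal values can be sorted once you state their order explicitly. The sketch below assumes the order Unsatisfied < Neutral < Satisfied and uses made-up survey responses; the `ORDER` list and `rank` mapping are names chosen for this example.

```python
from statistics import median_low

# Assumed ordering, lowest to highest
ORDER = ["Unsatisfied", "Neutral", "Satisfied"]
rank = {label: position for position, label in enumerate(ORDER)}

# Made-up survey responses
responses = ["Satisfied", "Unsatisfied", "Neutral", "Satisfied", "Neutral"]

# Sorting and the median are meaningful because they only rely on order.
sorted_responses = sorted(responses, key=rank.get)
median_response = ORDER[median_low([rank[r] for r in responses])]
```

The ranks 0, 1, 2 are only positions. Averaging them would silently assume equal gaps between the answers, which the ordinal level does not guarantee.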

Interval Level

At this level of measurement, the values can be ordered, and the distance between two values is meaningful. However, there is no true zero point; the zero is arbitrary. This means you can add and subtract interval-level values, but multiplying or dividing them does not give meaningful results. Interval values can also be negative. The difference between two values is the distance between them.

A common example of the interval level is temperature measured in degrees Celsius. The difference between 40 and 30 degrees Celsius is the same as the difference between 70 and 60 degrees Celsius, that is, 10 degrees.
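
A short sketch of why differences work on an interval scale but ratios do not, using the standard Celsius-to-Fahrenheit conversion (the variable names are chosen for this example):

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


# Equal Celsius gaps remain equal to each other after changing the unit.
gap_low = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30)
gap_high = celsius_to_fahrenheit(70) - celsius_to_fahrenheit(60)

# The ratio depends on the unit, so "twice as hot" has no fixed meaning.
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
```

Here `gap_low` and `gap_high` are equal, while `ratio_celsius` is 2 and `ratio_fahrenheit` is roughly 1.53 for the same two temperatures.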

Ratio Level

You can consider this level the father of all the levels described above. It has all the characteristics of the nominal, ordinal, and interval levels of measurement. In addition, it has a true zero point: zero means a real absence of the quantity, not an arbitrary zero. As a result, you can add, subtract, multiply, and divide ratio-level values, and the results remain meaningful.

For example, if the price of a product is 0, the product has zero value, that is, it is free. Other good examples of the ratio level of measurement are height and weight: a weight of zero means there is no weight at all, so a person weighing 80 kg is twice as heavy as a person weighing 40 kg.
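
Because a ratio-level quantity has a real zero, ratios stay the same when the unit changes. A small sketch with made-up weights and an approximate kilogram-to-pound factor:

```python
import math

KG_TO_LB = 2.20462  # approximate conversion factor


def kg_to_lb(kg):
    """Convert kilograms to pounds; zero stays zero."""
    return kg * KG_TO_LB


# Made-up weights: the ratio is the same in either unit.
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)
same_ratio = math.isclose(ratio_kg, ratio_lb)

# Zero kilograms is also zero pounds: the zero is not arbitrary.
zero_lb = kg_to_lb(0)
```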

Conclusion –

Almost everybody reading this article probably already knows about levels of measurement. The idea here is to make you familiar with the terminology, because whether you read an article on Data Science Learner or anywhere else, you will come across these terms. They are standard practice and a basic building block.
