Removing the Outliers, and Performing Normalization


When analyzing data of a company’s sales, it is necessary to obtain an accurate calculation of the average sales for a given period of time. This can usually be done by asking the following question: “How many products did we sell on an average per day during this month or year?”

The answer to this question will provide key information that is critical to forming accurate company forecasts (via removing the outliers). This information is in the form of indicators of the effectiveness of individual sales departments, as well as of the performance of the company as a whole.

There is not much data needed for ordinary calculation of average monthly sales. All that is required is knowledge of the total number of products sold during that month. Divide this number by the number of days in the month, and you have your average.

While this is easy to do, it may not be the right way to perform proper analysis. What if, during the compilation of data on monthly sales, you notice that there are days when sales clearly distinguish themselves? Let’s take an instance when there are atypically large sales (or drawdowns), which are well outside normal turnover rates of a product. For the sake of demonstration, we will refer to the product as boxes of screws sold by a company that deals in metalware:

Deals and amount

The total sale of boxes of screws, for this period, is 141.

On average, you have historically sold four boxes of screws per day to various parties. As you can see, there are days in this time period when you have sold an atypically high number of the product (for example, 10 pieces on the 9th of the month, and 12 on the 14th ). The situation when the volume of sales in a given day is significantly higher than the established average daily sales is referred to as an “outlier.”

Why bother obtaining this information? Because outliers can greatly change the picture of the forecast. In the month after this (that we have used as an example), such large sales may or may not happen. Future forecasts must take this into account. Therefore, we need to determine a “common denominator” so that our predictions are accurate.

At this point, two pieces of information are needed. The first is the specific value we want to use in order to consider a value an outlier. The second is: exactly what we will do with those outliers going forward.

To answer the first question we need to determine a “channel,” or in other words, the value beyond which we will consider there has been an “outlier”. In the example given, four boxes of screws sold, on average, per day. We could use this value, but it would be better to do just a bit more analysis to come up with our outlier number. Rather than using the average, we will use the mean, which is equal to 4.7 pieces/day.

Now we can define the boundaries of the channel. The established practice is to define the boundaries of the channel as the percentage of deviation from the mean. If we set the limit of the channel at 80 percent, the lower limit of the channel (4.7 * 0.2) is 0.94 pieces, and the upper limit (4.7 * 1.8) is 8.46 pieces. This allows us to define the values (the ones outside the boundary of the channel) that will be considered to be outliers. Note that values both above and below the channel are considered to be outliers.

Our data will look something like this (Fig. 1):


Amount graph

After identifying the outliers, we can address the second problem – what to do with them. The answer is to introduce a series for analysis that is the closest to the actual history of our sales. We can then do one of two things. The first is as follows:

  1. Removing the outliers. When applying this method, we project that any atypical values identified are so uncharacteristic of our operations that they are unlikely to ever reoccur. For this reason, we ignore the data on outlier days. There are two likely scenarios that would cause us to ignore outlier data and remove it. One is when we had no remainders in stock and therefore could not sell the product. This resulted in the negative outliers. The other outlier situation occurs, for example, when an unexpected customer shows up and makes a surprisingly large purchase, but we do not expect that this will ever happen again.

To remove the outlier, we check through our series and consider every value that is outside our defined channel to be equal to zero. Our series now looks like this:

Removing the outliers

The sum of the new series is101 boxes of screws. The daily average is 3.37.

  1. Normalization of the series. When applying this method, we assume that any atypical values identified this month are not characteristic of our operations, but we do not have to ignore them, we just need to smooth their effects to the limits of our series, suggesting that similar situations can happen in future.

To remove the outlier, we check through our series and consider every value beyond our defined channel to be equal to the nearest boundary of the channel. Therefore, our series now looks like this:

Normalization of the series

The sum of the new series is 139.54 boxes of screws. The daily average is 4.65.

  1. Recovery the outliers to the normal range. The method is closest to providing the result of the series “expected” by us. It uses a combination of the two methods we have previously described.

First, we bring a series to the “zero-outliers” state by clearing it using method (1). Then, for the new cleared series we obtained, we calculate a new average value (3.37) and a new deviation channel for 80% (from 0.67 to 6.06). After that, we apply method (2) of normalization, but with the values of a new channel for the removed outliers. The series now looks like this:

Recovery the outliers to the normal range

The total for the new series is 128.6 boxes of screws. The average daily sale is 4.29.

Below you can see how the two cleaning methods compare against the initial series:

All methos graph

Mycroft Assistant is flexible. It caters to your needs. You can change specific settings quickly and easily, remove outliers, and produce normalized data. Now, predicting future product consumption accurately is just a click away.

Know more:

— Demand forecasting – information about it you can find HERE

— Sales analysis – information about it you can find HERE ;

— Planning – information about it you can find HERE and HERE ;

If you want to get started with the Mycroft Assistant – where we have implemented everything necessary for you to work effectively, simply sign up HERE and follow the provided instructions.