
Churn Data Analysis

Institute

33. Exploring whether there are missing values for any of the variables

There are no missing values for any of the variables. A formula that checks for blank cells in Excel returned 0, as seen in the figure below.

Figure 1: Assessing any missing values for the given variables
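The same blank-cell check can be reproduced outside Excel. The following is a minimal sketch in Python, assuming the data have been loaded as a list of row dictionaries; the function name `count_missing`, the column names, and the sample rows are illustrative assumptions and are not taken from the churn file.

```python
def count_missing(rows):
    """Count blank cells (None or empty/whitespace-only text) per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_blank = value is None or str(value).strip() == ""
            counts[column] = counts.get(column, 0) + (1 if is_blank else 0)
    return counts


# Made-up sample rows for illustration; the real file has many more records.
sample_rows = [
    {"State": "AA", "Area Code": 415, "Day Mins": 100.0},
    {"State": "BB", "Area Code": 408, "Day Mins": 150.5},
    {"State": "CC", "Area Code": 510, "Day Mins": 200.2},
]

missing = count_missing(sample_rows)
total_missing = sum(missing.values())
print(missing, total_missing)
```

A total of 0 corresponds to the result reported above; any column with a nonzero count would need attention before further analysis.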

34. Comparing the area code and state fields. Discussing any apparent abnormalities.

There is an abnormality between the area code and state fields: while the state field keeps changing from record to record, the area code field only ever takes a small, fixed set of values that recur throughout the column. The area code never departs from the values shown below:

408, 510, 415, 415, 408, 510, 510, 415, 415, 408, 415, 415, 510, 415
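One way to make this comparison concrete is to group the states by area code and see how many different states share each code. The sketch below is only an illustration: the function name `states_per_area_code` and the (state, area code) pairs are assumptions, not rows copied from the data set.

```python
from collections import defaultdict


def states_per_area_code(records):
    """Map each area code to the set of states it appears with."""
    mapping = defaultdict(set)
    for state, area_code in records:
        mapping[area_code].add(state)
    return dict(mapping)


# Made-up (state, area code) pairs for illustration only.
sample = [("AA", 415), ("BB", 415), ("CC", 408),
          ("DD", 510), ("EE", 408), ("FF", 510)]

by_code = states_per_area_code(sample)
for code in sorted(by_code):
    print(code, sorted(by_code[code]))
```

If only a handful of area codes appear, and each is paired with many different states, that is the abnormality described above.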

35. Using a graph to visually determine whether there are any outliers among the number of calls to customer service.

Outliers are points that lie far from the rest of the data. They may be due to variability in taking measurements or may indicate experimental error. Outliers can occur by chance in any kind of distribution, but they usually indicate an error or a population that is heavy-tailed. The calls to customer care are plotted in Figure 2.

Figure 2: States against number of calls to customer care

There are outliers in the number of calls. As can be seen, the number of calls made to customer care per state is concentrated between 0 and 4, while the outliers lie in the range of 8 to 10 calls.
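Where a charting tool is not at hand, a quick text bar chart of the call frequencies gives the same visual impression. This is a rough sketch; the function name `call_frequencies` and the list of call counts are illustrative assumptions rather than the actual churn records.

```python
from collections import Counter


def call_frequencies(calls):
    """Return how many customers made each number of service calls."""
    return dict(sorted(Counter(calls).items()))


# Made-up call counts for illustration only.
calls = [0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 1, 2, 0, 9, 1]

freq = call_frequencies(calls)
for n_calls, n_customers in freq.items():
    # One '#' per customer; isolated short bars far to the right hint at outliers.
    print(f"{n_calls:>2} | {'#' * n_customers}")
```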

36. Identify the range of customer service calls that should be considered outliers, using: The Z-score method, and

The Z-score, also known as the standard score, is a measure used in statistics that expresses how many standard deviations a value lies above or below the mean. The formula for the Z-score is

z = (x - μ) / σ

Where z is the Z-score

x is the value being standardized

μ is the mean of the data

σ is the standard deviation of the data

Below is a graph generated using the Z-score formula. The outliers have Z-scores ranging from 3 to 5, while the bulk of the data is located at Z-scores between -1.5 and 1.5.

Figure 3: Z-score for outlier calls to customer care
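The Z-score formula above can be applied directly to flag outlying call counts. The sketch below assumes the common convention that a value with |z| greater than 3 is an outlier; the function names, the threshold default, and the sample data are illustrative assumptions.

```python
import statistics


def z_scores(values):
    """Standardize values with z = (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mean) / sd for x in values]


def z_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Made-up call counts for illustration only.
calls = [0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 0, 1, 2, 1, 3, 1, 2, 9]

print(z_outliers(calls))
```

In this made-up sample only the value 9 is flagged, since it lies more than three standard deviations above the mean.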

The IQR method.

The IQR method can also be used to identify the range of customer service calls that should be considered outliers. To identify the outliers, it is important to find the statistical center of the given range of data by finding the first and the third quartiles. Quartiles are a statistical division of a data set into four equal groups, each making up 25% of the collection. By using the first and the third quartiles, it is possible to calculate the range covered by the middle 50% of the data, called the IQR (interquartile range). Figure 4 shows this below.

Figure 4: The interquartile range

To decide how far from the middle 50% a value can sit and still be considered reasonable, statisticians in general agree that IQR × 1.5 can be used to define a reasonable range:

The lower fence is obtained by 1^{st} Quartile - IQR × 1.5

The upper fence is obtained by 3^{rd} Quartile + IQR × 1.5

This gives a reasonable range, comparable to the Z-score range of -1.5 to 1.5; values falling outside this range can be considered outliers.
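The fence calculation can be sketched in a few lines of Python. The quartile convention is an assumption here: `statistics.quantiles` with `method="inclusive"` is used, and other tools may interpolate quartiles slightly differently. The function names and sample data are illustrative.

```python
import statistics


def iqr_fences(values):
    """Return (lower fence, upper fence) using Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values)
    return [x for x in values if x < lower or x > upper]


# Made-up call counts for illustration only.
calls = [0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 2, 0, 1, 2, 1, 3, 1, 2, 9]

print(iqr_fences(calls))
print(iqr_outliers(calls))
```

Note that the fences are expressed in the original units (number of calls), so they can be compared directly with the raw call counts.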

37. Transforming the day minutes attribute using Z-score standardization.

The transformation here is a standardization, in which each observation is expressed as a Z-score, that is, as a multiple of the standard deviation away from the mean, which makes observations comparable. The transformation requires the mean, the standard deviation, and the value to be operated on. The mean and the standard deviation are obtained from the day minutes values, and each value is then transformed into a Z-score, as seen in Figure 5.

Figure 5: Transform by Z-score

38. Working with skewness as follows. Calculating the skewness of day minutes. Then calculate the skewness of the Z-score standardized day minutes. Comment. Based on the skewness value, would you consider day minutes to be skewed or nearly perfectly symmetric?

Skewness ranges from negative to positive values and characterizes the degree of asymmetry of the data around its mean. A positive skewness indicates an asymmetric distribution with a tail extending towards positive values, while a negative skewness indicates an asymmetric tail that extends towards negative values.

The skewness of day minutes is -0.029077067, which indicates a very slight lean towards negative values. Because the value is negative but very close to zero, day minutes can be considered nearly perfectly symmetric.

The skewness of the Z-score standardized day minutes is -0.029077067, unchanged from the original day minutes. This is expected, because Z-score standardization only shifts and rescales the data and does not alter the shape of the distribution.
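The observation that standardization leaves skewness unchanged can be checked numerically. The sketch below uses the adjusted sample skewness formula n / ((n - 1)(n - 2)) × Σ((x - mean) / s)³; that this is the same formula the spreadsheet used for the figures above is an assumption. The function names and the day minutes values are illustrative, not taken from the data set.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


# Made-up day minutes for illustration only.
day_minutes = [120.5, 180.2, 175.0, 210.7, 95.3, 160.0, 199.9, 150.4]

raw_skew = sample_skewness(day_minutes)
z_skew = sample_skewness(standardize(day_minutes))
print(raw_skew, z_skew)  # the two values agree up to floating-point rounding
```

Since each standardized term (x - mean) / s is already what the skewness formula cubes, subtracting the mean and dividing by a positive constant cannot change the result.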

39. Constructing a normal probability plot of day minutes. Comment on the normality of the data.

40. Working with international minutes as follows. (a) Construct a normal probability plot of international minutes. (b) What is stopping this variable from being normally distributed? (c) Construct a flag variable to deal with the situation in (b). (d) Construct a normal probability plot of the derived variable nonzero international minutes. Comment on the normality.

41. Transform the night minutes attribute using Z-score standardization. Using a graph, describe the range of the standardized values.

The normal probability plot, also known as a normal test plot, can be charted using a tool like Excel. This graph is used to establish whether a data set is normally distributed: the data are plotted in such a way that the points form an approximately straight line when the data are normal. This is done by determining the sample size, sorting the data in increasing order, ranking the sorted list, and calculating the cumulative probability for each rank. The chart obtained from this can be used to establish whether a set of data follows a normal distribution. The idea is to plot the actual Z-score on the x-axis against the Z-score obtained from an equivalent normal distribution.
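The steps above (sort, rank, cumulative probability, expected Z-score) can be sketched as follows. The plotting position (i - 0.5) / n is one common convention and is an assumption here, as are the function names; the straightness measure is simply the Pearson correlation of the plotted points, offered as a rough numeric stand-in for judging the line by eye.

```python
from statistics import NormalDist, mean, stdev


def normal_plot_points(values):
    """Pair each sorted, standardized value with its expected normal Z-score."""
    n = len(values)
    m, s = mean(values), stdev(values)
    observed = [(x - m) / s for x in sorted(values)]
    # Cumulative probability for rank i, assumed plotting position (i - 0.5) / n.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(observed, expected))


def straightness(points):
    """Pearson correlation of the points; values near 1 suggest a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up day minutes for illustration only.
day_minutes = [120.5, 180.2, 175.0, 210.7, 95.3, 160.0, 199.9, 150.4]
points = normal_plot_points(day_minutes)
print(straightness(points))
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives the normal probability plot described above.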

One of the characteristics associated with the normal distribution is that the area under its curve between any two points is determined by their Z-scores. It is possible to obtain the amount of curve area between two sample points by using the cumulative distribution function, which provides the total area beneath the curve to the left of a given point.
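As a small worked example of using the cumulative distribution function this way, the area between two Z-scores is the difference of the two cumulative values. The function name `area_between` is an illustrative assumption; the standard normal distribution (mean 0, standard deviation 1) is assumed because the values are Z-scores.

```python
from statistics import NormalDist


def area_between(z_low, z_high):
    """Area under the standard normal curve between two Z-scores."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # cdf(z) is the total area to the left of z, so the difference is the area between.
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


print(round(area_between(-1.5, 1.5), 4))
```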

Flagging in Excel is used to mark cells whose values meet a certain criterion, for example values that match an item in a long list. Using the flagged values, it is possible to develop case scenarios. Where data must be manipulated to assess its impact, flagging becomes an important function for isolating a certain kind of data, so that its effect on the final output can be observed or conclusions about the data set can be drawn. In our case, the flag is used to examine the normal distribution and establish what may be causing the data not to be normally distributed.
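A flag variable of this kind can be sketched as below. The sketch assumes that the obstacle to normality is a group of zero values, as the name of the derived variable, nonzero international minutes, suggests; the function name `add_nonzero_flag` and the sample values are illustrative assumptions.

```python
def add_nonzero_flag(intl_minutes):
    """Return (flags, nonzero): flag is 1 where minutes > 0 and 0 otherwise."""
    flags = [1 if minutes > 0 else 0 for minutes in intl_minutes]
    # Derived variable: only the records whose flag is 1.
    nonzero = [minutes for minutes in intl_minutes if minutes > 0]
    return flags, nonzero


# Made-up international minutes for illustration only.
intl_minutes = [0, 10.2, 0.0, 8.7, 12.4]

flags, nonzero = add_nonzero_flag(intl_minutes)
print(flags)
print(nonzero)
```

The derived list can then be used for a second normal probability plot, while the flag column records which customers were excluded.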