Statistics

Distribution of Variables

In the context of ordinary least squares (OLS) regression analysis, the key variable is the regressand; in this case, ‘index’. For hypothesis testing to be valid, the error term in the model must be independently and identically normally distributed. With the regressors assumed to be fixed from sample to sample, ‘index’ should therefore also be normally distributed. Two points might be made at this stage. First, there is no assumption regarding the distribution of the independent variables in the analysis, as should be clear from the frequent use of dummy variables. Second, the normality assumption is required only for the purposes of hypothesis testing. The mathematical estimation of the OLS coefficients can be undertaken whether or not it holds.
Figures 1 and 2 present, respectively, the histogram and detrended Q-Q plot for the ‘index’ variable, with both suggesting quite strong deviations from normality. However, this visual impression does not find a good deal of support from the statistical measures that are typically used to evaluate departures from normality. As shown in Table 1, the median of ‘index’ is very close to its mean, as is the 5% trimmed mean (the average of the data when the highest and lowest 5% of data values are excluded). Both of these results are consistent with a normal distribution. Furthermore, while both the skewness and kurtosis statistics are negative (the former suggesting a longer tail at the lower end of the distribution, with data values bunched towards the upper end, and the latter a rather flat distribution), neither is large in relation to its standard error.
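The summary measures discussed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SPSS procedure used for Table 1: the function names are hypothetical, the trimmed mean simply drops whole observations from each tail (SPSS applies fractional weights at the boundary), and the skewness, kurtosis and standard error formulas are the usual bias-adjusted sample versions, which need more than three observations.

```python
import math
import statistics


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of values.

    Simplification: only whole observations are dropped from each tail.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations cut from each tail
    return statistics.fmean(ordered[k:len(ordered) - k])


def skew_kurtosis_with_se(values):
    """Bias-adjusted skewness and excess kurtosis with their standard errors."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with divisor n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5          # moment skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment excess kurtosis
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, se_skew, kurt, se_kurt
```

A common informal reading is to divide each statistic by its standard error and treat ratios within roughly ±2 as unremarkable, which is the sense in which neither statistic is "large" above.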
Somewhat more formally, Table 2 reports the Kolmogorov-Smirnov test, with the Lilliefors correction applied to allow for the mean and variance being estimated from the sample, which compares the cumulative distribution function of ‘index’ with that of a comparable normal variable. As this statistic is not significantly different from zero, the assumption of normality cannot be rejected. Likewise, the Shapiro-Wilk test, which is a correlation between the ‘index’ scores and their corresponding normal scores, does not depart from unity at any conventional significance level: a result that, once again, suggests a normal distribution. Nevertheless, it is hard to accept that a variable with a gap in its distribution and possessing multiple peaks is distributed normally, and it was therefore decided to investigate the distribution of its natural logarithm, as suggested in the project brief.
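To make the Kolmogorov-Smirnov comparison concrete, the sketch below computes the largest gap between the empirical distribution function and a normal distribution fitted with the sample mean and standard deviation. It is a hedged illustration rather than the SPSS routine: the function names are hypothetical, and the 5% critical value uses the large-sample Lilliefors approximation 0.886/√n, which is only intended for samples of more than about 30 observations.

```python
import math
from statistics import NormalDist, fmean, stdev


def lilliefors_statistic(values):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def approx_critical_value_5pct(n):
    """Large-sample approximation to the 5% Lilliefors critical value."""
    return 0.886 / math.sqrt(n)
```

Normality is rejected at the 5% level when the statistic exceeds the critical value; a statistic below it corresponds to the "cannot be rejected" outcome described above.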
As is to be expected with a monotonic transformation of the original data, the switch to the logarithm of ‘index’ produced no significant alterations to the conclusions reached above regarding the distribution of that variable. As such, the results of the SPSS exploration of the properties of ‘lnindex’ are reproduced as Appendix A without further comment.
As noted above, there is no requirement that the exogenous variables in an OLS regression conform to any particular distribution. Nevertheless, Appendix B presents data summaries and normality tests for all of the continuous variables contained in the current data set, including those discussed above, along with summaries of the qualitative measures contained within it. It might be noted that, as ‘profit margin’ and ‘ROCE’ contain negative values, they could not be logged without dropping the affected cases from the sample. Likewise, logs cannot be taken for the two qualitative variables that take zero values. Therefore, it could not, at this stage, be considered appropriate to undertake manipulations of the potential regressors in the model to be constructed below.

Relationships between variables

The scatter plots option in SPSS was used to explore the existence of any simple relationships between ‘index’ and the other continuous variables in the data set (excluding ‘actual’ and ‘max’) that might help to inform the choice of regression model specification. The results are reported in Appendix C. In general, this simple exercise failed to uncover any strong relationships, although it might be possible to argue that ‘age’ and ‘ROCE’ are weakly negatively related to ‘index’. Rather than presenting scatter plots for all remaining variables that are potential candidates for inclusion as regressors (a total of 21 additional diagrams), attention turns to the correlation matrix of all continuous variables. This is not only more compact but will also highlight any simple linear relationships between regressors that would cause OLS estimation to fail, to a greater or lesser extent, as a result of multicollinearity.
The correlation matrix provided as Table 3 confirms the existence of only very weak simple linear associations of ‘index’ with the other continuous variables in the data set, although it must be recalled that the model to be estimated below will be a multivariate construct and the matrix could be masking more complex relationships. However, it does provide a preliminary warning that OLS will almost certainly not be able to handle the joint inclusion of ‘sales’ and ‘assets’ in the model. Furthermore, it may also be sensitive to the joint inclusion of either of those variables alongside ‘capital’. This will be examined below.
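The screening role of the correlation matrix can be illustrated with a short sketch. The helper names, the dictionary-of-columns layout and the 0.8 cut-off are all assumptions made for illustration; the essay itself does not state a threshold.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def collinear_pairs(matrix, threshold=0.8):
    """Pairs of distinct variables whose absolute correlation exceeds threshold."""
    names = list(matrix)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) > threshold
    ]
```

Pairs flagged in this way are only a preliminary warning: pairwise correlations cannot detect collinearity that involves three or more regressors jointly.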

Regression models

Notwithstanding the earlier precautionary remarks about collinearity between potential regressors, the base model examined here is of the form
In this formulation, the variables are taken directly from the data file, with two exceptions. First, ‘Manuf’ is a dummy variable taking the value 1 if the firm is in the manufacturing sector and 0 otherwise. Second, ‘Other’ is another constructed dummy variable taking the value 1 if the firm is classified as in the ‘other’ sector and 0 otherwise. This leaves the conglomerate firms as the base reference group. It might also be noted that ε is the residual error term.
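The dummy coding described here can be sketched as follows. The sector labels ("manufacturing", "other", "conglomerate") and the function name are assumptions for illustration, since the data file's actual coding is not shown.

```python
def sector_dummies(sector):
    """Map a sector label to the two dummies, with conglomerates as the base.

    The label strings are hypothetical; the real data file may code sectors
    differently.
    """
    label = sector.strip().lower()
    return {
        "Manuf": int(label == "manufacturing"),
        "Other": int(label == "other"),
    }


# Conglomerates receive 0 on both dummies, so their level is absorbed by the
# constant and each dummy coefficient measures a shift relative to them.
example = [sector_dummies(s) for s in ("Manufacturing", "Other", "Conglomerate")]
```

Only two dummies are created for three sectors because a third dummy would sum with the other two to a column of ones, which is perfectly collinear with the constant.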
On the face of it, the catch-all model might appear to work reasonably well. As reported in Table 4, it has an adjusted R² of 0.382, which is quite reasonable for a cross-section regression. However, this is achieved with only two significant variables in the model. The first of these, ‘listing’, attracts the positive sign that one would expect, given the regulatory regimes that tend to prevail on the stock exchanges of developed countries. The second is ‘Other’, which attracts a negative coefficient. As this is a dummy shift term reflecting the disclosure practices of such firms relative to those of conglomerate companies, it again might have been anticipated. However, the findings should be treated with caution because, as predicted above, the model is plagued by multicollinearity. This is revealed by the size of the variance inflation factors (VIFs) for the ‘sales’ and ‘assets’ variables in the model, where the VIF is the reciprocal of (1 − R²) in an auxiliary regression of one regressor on all of the others. As these two variables are essentially measuring the same thing in statistical terms, one needs to be eliminated. As the VIF is highest for ‘sales’, the model was re-estimated with it excluded.
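The VIF definition given above, 1/(1 − R²) from an auxiliary regression of one regressor on all of the others, can be written out directly. The sketch below uses only the standard library and solves the normal equations by Gaussian elimination; the function names and the dictionary-of-columns layout are assumptions, and a statistical package would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, regressors):
    """R-squared from an OLS regression of y on a constant and the regressors."""
    rows = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF for each named regressor: 1 / (1 - R^2) of its auxiliary regression."""
    return {
        name: 1.0 / (1.0 - r_squared(
            columns[name], [columns[o] for o in columns if o != name]))
        for name in columns
    }
```

A VIF of 1 means a regressor is uncorrelated with the others. Rules of thumb for a worrying value vary (5 and 10 are both commonly quoted), and the essay does not say which cut-off it applied.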
As shown in Table 5, the explanatory power of the amended specification remains essentially unchanged when compared with the full model. Furthermore, there are no signs of gross instability in the qualitative impacts of the remaining included regressors: only ‘audit’ changed sign, but it was wholly insignificant in both formulations. However, it was still the case that only ‘listing’ and ‘Other’ achieved statistical significance. Recalling the high correlation coefficient between ‘assets’ and ‘capital’ – the two remaining regressors with the highest VIFs – it was decided to re-estimate the model again, excluding the latter of these variables.
The results of the re-specification are given in Table 6. While the explanatory power of the model fell, the deterioration was only marginal. Furthermore, ‘assets’ joined the list of significant variables, although its negative coefficient is contrary to what might intuitively be expected. Once again, there were no serious signs of instability in the estimates and all VIFs lay below two. However, cross-sectional estimations can suffer from heteroscedasticity; that is, from a non-constant variance of the disturbance term across observations. This possibility was checked by a cross-plot of the model’s standardised predicted values against its standardised residuals. The outcome, which is presented in Figure 3, exhibited a fairly random distribution and leads to the tentative conclusion that variance misspecification is not a serious concern. Rather than attempting to examine further possible variants of the model through a fishing exercise to check this conclusion more thoroughly, recourse was made to the variable selection routines available in the SPSS regression facility.
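The visual check described above can be complemented by a simple numerical summary. The sketch below is not the procedure used in the essay, which relied on the plot alone: it standardises a series using its sample standard deviation (a simplification of the SPSS definitions) and then compares the residual variance in the upper and lower halves of the predicted values, an informal split-sample check in the spirit of the Goldfeld-Quandt test. All names are hypothetical.

```python
from statistics import fmean, stdev, variance


def standardise(values):
    """Rescale a series to mean 0 and sample standard deviation 1."""
    mean, sd = fmean(values), stdev(values)
    return [(v - mean) / sd for v in values]


def variance_ratio(predicted, residuals):
    """Residual variance for high predicted values relative to low ones.

    Observations are ordered by predicted value and split into halves (the
    middle observation is ignored when the count is odd). A ratio far from 1
    hints at a non-constant error variance.
    """
    pairs = sorted(zip(predicted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return variance(high) / variance(low)
```

A ratio close to 1 is consistent with the "fairly random" scatter reported for Figure 3, although a formal test would be needed before drawing firm conclusions.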
Reassuringly, the stepwise, backward and forward options all generated the same result: a model incorporating ‘capital’, ‘listing’, ‘Other’ and a constant. As such, only the estimates from the stepwise exercise are reproduced as Table 7. It should be noted that the coefficient estimates for the retained variables are very similar to those reported above for the fuller model, which is a reassuring indication of stability. The major change to highlight is that ‘capital’ rather than ‘assets’ was finally retained. Taken at face value, the findings suggest that stock exchange listing increases disclosure, while market capitalisation tends to decrease it. In addition, firms that are neither manufacturers nor conglomerates tend to have inferior disclosure practices.

Potential problems and their resolution

There are a number of possible caveats to the foregoing exercise. The first is quite simply that it was undertaken without a justified theoretical underpinning, and it is therefore difficult to consider questions of misspecification in other than a purely statistical sense. Second, some of the regressors, in particular perhaps ‘sales’, ‘profit margin’, ‘ROCE’ and ‘current’, are potentially volatile over time and might better be replaced with their averages over some backward-looking horizon, say of five or ten years. Third, there is no real justification for assuming that a linear specification is the appropriate one to adopt. Fourth, there is an evident lack of variability in the dependent variable ‘index’, which has a very low coefficient of variation (standard deviation divided by the mean) of 2.5 per cent. One possible way to overcome this problem might be to increase the sample size, whether by adding further Greek companies or by adding international comparators. Finally, while the apparent errors are generally small, ‘index’ is not measured as simply ‘actual’/‘max’ and this may be a cause of at least some of the problems encountered.
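For reference, the coefficient of variation mentioned in the fourth caveat is simply the sample standard deviation divided by the mean. A minimal sketch, with a hypothetical function name:

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return stdev(values) / fmean(values)
```

Multiplying the result by 100 expresses it as a percentage, which is the form in which the 2.5 per cent figure is quoted above.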
