FAQ

Email your questions about StatsDirect to support@statsdirect.com; selected questions and answers will be posted here.

  • Citing
    • How do I cite StatsDirect software in papers?
      StatsDirect Ltd. StatsDirect statistical software. http://www.statsdirect.com. England: StatsDirect Ltd. 2008.

      The theoretical basis of the methods used in StatsDirect should be cited using the references listed in the StatsDirect help system.

      A doctoral thesis describing the scientific basis of StatsDirect is at http://www.statsdirect.com/thesis/md.pdf
       

  • Linux or Mac
    • Are there Linux or Mac versions of StatsDirect?

      StatsDirect is written only for Microsoft Windows at present.

      Several StatsDirect users run it on a Mac under SoftWindows, but we do not test this configuration for compatibility.

      We might develop a cross-platform version in the future.
       

      The reasons for this choice are:

      1. The vast majority of researchers have access to a computer running Microsoft Windows.

      2. The R&D focus of StatsDirect is on the evolution of statistical science and the needs of researchers, rather than on supporting many different platforms.

      3. Cross-platform software development is not yet a robust option for numerical software. For example, the same algorithm written as a Java applet could give different results on different computers using different Java virtual machines. It might be possible to integrity-check sample calculations against known floating-point problems in cross-platform work, but we could not reasonably guarantee to have tested all possibilities on all platforms. This position may change in the future (see for example http://math.nist.gov/javanumerics/), and StatsDirect will evolve in whichever way meets the statistical needs of the greatest number of researchers in the safest possible way.
       

  • Problem removing the Excel-StatsDirect link add-in
    • Download and run the sdxlremover.exe file if you need to remove the StatsDirect add-in for Excel.

  • Confidence intervals for proportions
    • Why does StatsDirect give slightly different confidence intervals for proportions from the ones I calculate by hand?

      Many textbooks contain poorly performing formulae for approximating confidence intervals for binomial proportions and the differences between them. StatsDirect improves upon these formulae, especially in the presence of small numbers, by using methods that have good coverage properties. For more detailed discussion of the reasons behind the choice of methods used in StatsDirect, please see the following series of excellent papers by Robert Newcombe:

      Newcombe R. Improved confidence intervals for the difference between binomial proportions based on paired data. Statistics in Medicine 1998;17:2635-2650.

      Newcombe R. Interval estimation for the difference between independent proportions. Statistics in Medicine 1998;17:873-890.

      Newcombe R. Two sided confidence intervals for the single proportion: a comparative evaluation of seven methods. Statistics in Medicine 1998;17:857-872.
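      To see why hand calculations can differ, the sketch below contrasts the common textbook (Wald) formula with the Wilson score interval, one of the methods evaluated by Newcombe. It is an illustrative Python sketch under stated assumptions (z = 1.959964 for 95% intervals, no continuity correction), not StatsDirect's own code; the function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile (assumed)

def wald_interval(r, n, z=Z95):
    """Textbook (Wald) interval p +/- z*sqrt(p(1-p)/n), clipped to [0, 1]."""
    p = r / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_interval(r, n, z=Z95):
    """Wilson score interval without continuity correction."""
    p = r / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

for r, n in [(0, 30), (2, 30), (15, 30)]:
    print(f"{r}/{n}: Wald {wald_interval(r, n)}, Wilson {wilson_interval(r, n)}")
```

      With 0/30 the Wald formula collapses to a zero-width interval (0 to 0), whereas the Wilson upper limit is z²/(n + z²), about 0.1135, in line with the Wilson figure in the zero-numerator example.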
       

    • What do I do if there is a proportion with zero numerator?

      Example:

      Single proportion

      Total = 30, response = 0, proportion = 0

      Exact (Clopper-Pearson) 95% confidence interval = 0 to 0.115703

      Approximate (Wilson) 95% mid P confidence interval = 0 to 0.113513

      Observation: "...you mention that the Exact (Clopper-Pearson) 95% confidence interval for the case where n=30 and r=0 is 0 to .115703. However, if the probability of getting a case in one trial is .115703, then the probability of getting no cases in 30 trials is (1-.115703)^30 = .025. This looks like the 97.5% confidence interval."...

      Reply from Dr Robert Newcombe (expert in this field): "This is a familiar situation. We think of the standard CI as a two-sided 95% CI or a z=(+/-)1.96 CI. It tries to get 2.5% non-coverage at each end. For a Gaussian variable, this is attained, of course. For the binary case, it isn't, and can't be, because of the discrete nature of the outcome space. The "exact" (Clopper-Pearson) continuity-corrected CI aims to make the minimum coverage 95%, and the maximum right and left non-coverage each 2.5%, and achieves this. In an extreme case, when all cases are +ve or all are -ve, an ambiguity arises: should we keep the overall coverage as 95%, or the one relevant non-coverage as 2.5%? It seems to make sense to go for the latter, in order to achieve continuity of interpretation as the number +ve tends to zero. But some would argue, why not get a shorter interval by using an upper limit of 1 - 0.05**(1/30) = 0.095034 instead of 1 - 0.025**(1/30) = 0.115703? I think this is what StatXact does. I would reply that in fact there is a much more comprehensive way to shorten the C-P interval while keeping the defining property of min CP = 0.95. This was developed by Blyth & Still. It isn't often used; I think this is because it's so untransparent how it works, and hence of little use in presenting results convincingly."
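      The arithmetic in this exchange can be checked directly. For r = 0 the Clopper-Pearson upper limit U solves (1 - U)^n = tail probability, so U = 1 - tail^(1/n). The short Python sketch below (the function name is hypothetical) reproduces both figures Dr Newcombe quotes: keeping 2.5% in the one relevant tail gives the Clopper-Pearson limit, and allowing 5% gives the shorter alternative.

```python
def upper_limit_zero_events(n, tail_prob):
    """Upper limit U for zero events in n trials, chosen so that the
    probability of observing zero events, (1 - U)**n, equals tail_prob."""
    return 1 - tail_prob ** (1 / n)

n = 30
clopper_pearson = upper_limit_zero_events(n, 0.025)  # 2.5% kept in the relevant tail
shorter = upper_limit_zero_events(n, 0.05)           # 5% in that tail, shorter interval
print(f"{clopper_pearson:.6f} {shorter:.6f}")
print((1 - clopper_pearson) ** n)  # recovers the 2.5% tail probability
```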
       

  • Multiple comparisons
    • Should I use multiple comparison tests? Is this a statistical fishing expedition?

      A number of people have asked questions about multiple comparisons. My favoured approach is to design experiments with clearly defined comparisons in a manner that avoids post-hoc 'dredging' and the need for multiple comparison methods. In the real world, however, I favour Tukey-Kramer as a general method, or Dunnett's method when multiple contrasts are being made against a control group. Advice from a statistician is important if you are in any doubt.

      In reply to a specific question about the Newman-Keuls test: this is one of the methods of multiple comparison that try to build in "conservativeness" in order to avoid the type I errors that can be associated with dredging your data for differences. It is a controversial area: Peter Armitage gives an excellent discussion of such methods and provides examples in:

      P. Armitage & G. Berry, Statistical Methods in Medical Research, Blackwell 1994.

      see also:

      Miller R. G. (jnr.), Simultaneous Statistical Inference, (2nd edition) Springer-Verlag 1981.

      Hsu J.C., Multiple Comparisons. Chapman and Hall, 1996
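      For readers who do use Tukey-Kramer, the test statistic itself is simple: for groups i and j, q = |mean_i - mean_j| / sqrt((MSE/2)(1/n_i + 1/n_j)), where MSE is the one-way ANOVA error mean square with N - k degrees of freedom. The Python sketch below computes q for every pair; it is an illustration, not StatsDirect's implementation, the function name and data are hypothetical, and each q still has to be compared with the studentized range critical value for k groups and N - k degrees of freedom (from tables, or for example scipy.stats.studentized_range in recent SciPy releases).

```python
import itertools
import math

def tukey_kramer_q(groups):
    """Tukey-Kramer q statistic for every pair of groups in a one-way layout.

    groups: dict mapping group name -> list of observations.
    Returns (error mean square, error degrees of freedom, {pair: q}).
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Pooled within-group (error) sum of squares from one-way ANOVA.
    sse = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_error = n_total - k
    mse = sse / df_error
    q = {}
    for a, b in itertools.combinations(groups, 2):
        # Tukey-Kramer standard error allows for unequal group sizes.
        se = math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return mse, df_error, q

# Hypothetical data for three groups of unequal size.
groups = {"control": [4.1, 3.9, 4.4, 4.0], "dose A": [4.8, 5.1, 4.6],
          "dose B": [5.9, 6.2, 5.7, 6.0, 6.1]}
mse, df, q = tukey_kramer_q(groups)
for pair, value in q.items():
    print(pair, round(value, 2))
```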

       

    • Is it possible to do a Dunnett's or Dunn's type multiple comparison versus control group procedure following a statistically significant Kruskal-Wallis ANOVA in StatsDirect?

      You can use the all possible contrasts method that is already included with the Kruskal-Wallis function in StatsDirect. This has less power to detect a difference between comparison groups and a control group than a hypothetical nonparametric analogue of Dunnett's method; nevertheless, any statistically significant difference detected should be investigated further. We will look into writing a nonparametric analogue to Dunnett's method [Chris Palmer, cp255@cam.ac.uk wants to be notified when this is done].
       
  • Graphics
    • How can I manipulate chart titles etc.?

      If you click on a graphic in a StatsDirect report, then right-click and select Copy from the popup menu, you will copy it as a Windows metafile. You can paste this into Microsoft Word, select the graphic in Word, right-click and choose 'Edit Picture' from the popup menu; you can then edit the graphic as a line drawing in Word. Use copy and paste rather than drag and drop, as drag and drop does not work with all versions of Word. Make sure you have installed the graphics converter options for the WMF format when you install Office.
       
  • Diagnostic tests
    • Is there a way of estimating the sample sizes required for a trial designed to compare tests (sensitivity, specificity, etc.) so as to give confidence intervals of a desired width on the estimates?

      Sensitivity and specificity are binomial proportions:

                      DISEASE
                      Present          Absent
       TEST   +       a (true +ve)     b (false +ve)
              -       c (false -ve)    d (true -ve)

       Sensitivity = a/(a+c)
       Specificity = d/(b+d)

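       So you can use the population survey sample size calculation for the target sensitivity or specificity (%) within a specified tolerance and probability of being wrong (i.e. of the estimate not falling within that tolerance). Note that the resulting sample size refers to subjects with the disease (a+c) for sensitivity, and to subjects without the disease (b+d) for specificity.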
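      A minimal sketch of that calculation, assuming the usual normal-approximation formula n = z²p(1 - p)/d² for estimating a proportion p to within ±d (StatsDirect's population survey function may also apply refinements, such as a finite population correction, which this ignores). Because sensitivity is estimated only from subjects who have the disease, the required number of diseased subjects is divided by an assumed prevalence to give the total to recruit. Function names and example figures are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile

def n_for_proportion(p, tolerance, z=Z95):
    """Subjects needed to estimate proportion p to within +/- tolerance
    (normal approximation, infinite population), rounded up."""
    return math.ceil(z * z * p * (1 - p) / tolerance ** 2)

def total_for_sensitivity(sensitivity, tolerance, prevalence, z=Z95):
    """Sensitivity uses only diseased subjects (a + c), so scale the
    required number of diseased subjects up by the expected prevalence."""
    n_diseased = n_for_proportion(sensitivity, tolerance, z)
    return n_diseased, math.ceil(n_diseased / prevalence)

# Expected sensitivity 90%, +/- 5% tolerance, assumed disease prevalence 25%.
print(total_for_sensitivity(0.90, 0.05, 0.25))
```

      For specificity, the same approach applies with the number of non-diseased subjects divided by (1 - prevalence).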

       

    • What is the importance of sub-populations in estimating sensitivity and specificity of a diagnostic test?

      From Sally Hollis: There is a brief discussion of this in the explanation and elaboration document for the STARD statement (see items 18 and 23). Web references: STARD initiative http://www.consort-statement.org/stardstatement.htm; explanatory document http://www.clinchem.org/cgi/content/full/49/1/7
       
  • Meta-analysis
    • How do I calculate power or sample size for meta-analyses?

      In order to calculate the statistical power of a meta-analysis you need a good estimate of the pooled variance of the effect size of interest, and reasonable assumptions about the effects of inter-study differences in exposure/conditions (heterogeneity) on both the variance and the effect estimate. All of this is non-trivial, and is best handled by a Statistician closely involved with the meta-analysis. It would be possible to add power results to all of the StatsDirect meta-analysis output, but the appropriateness of this needs further debate. Some would argue that under-powered studies should not have been performed, and will therefore question their quality for inclusion in the systematic review. If this approach is taken, then pooled power is almost irrelevant.
       
  • ROC curves
    • How do I construct and compare ROC curves?

      You can use the ROC function of the StatsDirect graphics menu to construct ROC curves and to calculate the area under them with a confidence interval.
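      As an illustration of what such an area and interval involve, the Python sketch below computes the nonparametric (Mann-Whitney) estimate of the area under the ROC curve and the Hanley & McNeil (1982) standard error, with a simple symmetric 95% interval. This is a sketch under stated assumptions, not necessarily the method StatsDirect's ROC function uses; positives holds test results for subjects with the condition and negatives for those without, with higher values assumed to indicate disease. Names and data are hypothetical.

```python
import math

def auc_with_ci(positives, negatives, z=1.959964):
    """Mann-Whitney AUC with the Hanley & McNeil (1982) standard error
    and a symmetric (Wald-type) confidence interval clipped to [0, 1]."""
    n1, n2 = len(positives), len(negatives)
    score = 0.0
    for x in positives:
        for y in negatives:
            # A pair counts 1 if correctly ordered, 1/2 if tied, 0 otherwise.
            score += 1.0 if x > y else 0.5 if x == y else 0.0
    auc = score / (n1 * n2)
    q1 = auc / (2 - auc)             # two positives both outrank one negative
    q2 = 2 * auc * auc / (1 + auc)   # one positive outranks two negatives
    var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
           + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2)
    se = math.sqrt(max(var, 0.0))
    return auc, se, (max(0.0, auc - z * se), min(1.0, auc + z * se))

diseased = [3.1, 4.2, 5.0, 2.8, 4.7]        # hypothetical test results
healthy = [1.9, 2.5, 3.1, 2.2, 1.4, 2.9]
print(auc_with_ci(diseased, healthy))
```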

      If you wish to compare the area under two or more ROC curves it is best to consult with a statistician. Different methods may be better for different situations (depending on the measurement scale of the outcome).

      Statistical thinking on the use of ROC curves is evolving; here is a key reference:

      Zou KH, Hall WJ, Shapiro D. Smooth non-parametric ROC curves for continuous diagnostic tests. Statistics in Medicine 1997;16:2143-56.

      Here is a reference list for ROC curve comparison:

      Metz's program CLABROC, available as part of ROCFIT via anonymous ftp from random.bsd.uchicago.edu in /roc/ibmpc

      Altham, P.M.E. (1973) A non-parametric measure of signal discriminability. Brit. J. Math. Statist. Psychol. 26, 1-12.

      Altman & Bland, BMJ vol 309, 16 July 1994, p 188.

      Bamber, D. (1975) The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psychol. 12, 387-415.

      Begg, C.B. (1991) Advances in statistical methodology for diagnostic medicine in the 1980s. Statistics in Medicine 10, 1887-1895.

      Bingham, N.H., Goldie, C. and Teugels, J.L. (1987) Regular Variation. Cambridge University Press.

      Campbell, M. and Machin, D. Medical Statistics: A Commonsense Approach, Section 3.4 (pp. 40-42). Wiley.

      Campbell, G. and Ratnaparkhi, M.V. (1993) An application of Lomax distributions in receiver operating characteristic (ROC) curve analysis. Comm. Statist. 22, 1681-1697.

      DeLong et al. (1988) Biometrics 44, 837-845.

      Dorfman, D.D. and Alf, E. Jr. (1968) Maximum likelihood estimation of parameters of signal detection theory -- A direct solution. Psychometrika 33, 117-124.

      Dorfman, D.D. and Alf, E. Jr. (1969) Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating method data. J. Math. Psychol. 6, 487-496.

      England, W. L. (1988) An exponential model used for optimal threshold selection in ROC curves. Med. Dec. Making 8, 120-131.

      Feller, W. (1971) An Introduction to Probability Theory and its Applications, Wiley.

      Goddard, M.J. and Hinberg, I. (1990) Receiver operating characteristic (ROC) curves and non-normal data: an empirical study. Stat. Med. 9, 325-337.

      Green, D.M. and Swets, J.A. (1966) Signal Detection Theory and Psychophysics, Wiley.

      Hanley, J.A. and McNeil, B.J. (1982) The meaning and use of the area under the receiver operating characteristic (ROC) curve. Radiology 143, 29-36.

      Hanley, J.A. and McNeil, B.J. (1982) Maximum attainable discrimination and the utilization of radiologic examinations. Journal of Chronic Diseases 35, 601-611.

      Hanley, J.A. and McNeil, B.J. (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148, 839-843.

      Hanley, J.A.: Receiver operating characteristic methodology: the state of the art. CRC Critical Reviews in Diagnostic Imaging 29, 307-335 (1989).

      Karamata, J. (1930) Sur un mode de croissance régulière des fonctions. Mathematica (Cluj) 4, 38-53.

      Karamata, J. (1933) Sur un mode de croissance régulière. Théorèmes fondamentaux. Bull. Soc. Math. France 61, 55-62.

      Kraemer, H.C. (1992) Evaluating Medical Tests: Objective and Quantitative Guidelines. Sage Publications, Beverly Hills.

      Luce, R.D. (1959) Individual Choice Behaviour Wiley: New York.

      McCullagh, P. (1980) Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. B 42, 109-142.

      Metz, C.E. and Kronman, H.B. (1980) Statistical significance tests for binormal ROC curves. J. Math. Psychol. 22, 218-243.

      Metz, C.E. (1978) Basic principles of ROC analysis. Seminars in Nuclear Medicine 8(4), 283-298.

      Metz CE. ROC Methodology in Radiologic Imaging. Invest. Radiol. 1986;21:720-723.

      Moise, A., Clement, B., Ducimetiere, P. and Bourassa, M.G. (1985) Comparison of receiver operating curves derived from the same population: a bootstrapping approach. Comp. Biom. Res. 18, 218-243.

      Moise, A., Clement, B. and Raissas, M. (1988) A test for crossing receiver operating characteristic (ROC) curves. Comm. Statist. 17, 1985-2003.

      Moses, L.E., Shapiro, D. and Littenberg, B. (1993) Combining independent studies of diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat. Med. 12, 1293-1316.

      Mossman, D. (1995) Resampling techniques in the analysis of non-normal ROC data. Medical Decision Making 15, 358-366.

      Rice, S.O. (1944) Mathematical analysis of random noise. Bell Syst. Tech. J. 23, 282-332.

      Sackett, D., Haynes, R., Guyatt, G. and Tugwell, P. (1991) Clinical Epidemiology. Little, Brown & Company, pp. 113-119.

      Shimizu, R. (1962) Characterization of the normal distribution, II. Ann. Inst. Statis. Math. Tokyo 14, 173-178.

      Simpson, A.J. and Fitter, M.J. (1973) What is the best index of detectability? Psychol. Bull. 80, 481-488.

      Strike, P. (1996) Measurement in Laboratory Medicine. Butterworth-Heinemann, Oxford (this includes a PC disk containing simple-to-use software).

      Svensson and Holm (1994) Separation of systematic and random differences in ordinal rating scales. Statistics in Medicine 13, 2437-2453.

      Swets, J.A. (1979) ROC analysis applied to the evaluation of medical imaging techniques. Invest. Radiol. 14, 109-121.

      Swets, J.A. and Pickett, R.M. (1982) Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press.

      Taylor, I., Mullee, M.A. and Campbell, M.J. (1990) Prognostic index for the development of liver metastases in patients with colorectal cancer. Br. J. Surg. 77, 499-501.

      Thomas, E.C. and Myers, J.L. (1972) Implications of latency data for threshold and non-threshold models of signal detection. J. Math. Psychol. 9, 253-285.

      Thompson, M.L. and Zucchini, W. (1989) On the statistical analysis of ROC curves. Stat. Med. 8, 1277-1290.

      Tosteson, A.N. and Begg, C.B. (1988) A general regression methodology for ROC curve estimation. Med. Dec. Making 8, 205-215.

      Vardi, Y. (1982) Non-parametric estimation in the presence of length bias. Ann. Statist. 10, 616-620.

      Wieand, S., Gail, M.H., James, B.R. and James, K.L. (1989) A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika 76, 585-592.

      Zweig and Campbell (1993) ROC plots: a fundamental evaluation tool in clinical medicine. Clin. Chem. 39(4), 561-577.
       

  • Silent installation
    • Use the command setup.exe /s /v/qn to perform a silent (unattended) installation. Be careful to leave a space between the /s and /v but no space between the /v and /qn.
  • Network installations
    • If I have a single user licence, may I install StatsDirect on different computers?

      Yes - the licence is per user and not per computer.

      You can install on a second computer via http://www.statsdirect.com/update.htm and by using the username/email and licence key information that you were originally sent.
       

    • Is there any network support?

      Here are some notes for Network Managers:

      Ideally, StatsDirect should be deployed so that it installs on the client's local disk drive. Systems such as SMS make this easier to do centrally. A script is supplied at http://www.statsdirect.com/download/setupstatsdirect.sms

      Please email support@statsdirect.com for information on setting up batch files to run StatsDirect installations for clients on your network.

      If StatsDirect is installed on one network drive then the administrator should:

      1. Run the StatsDirect application after installation and enter the licence key.

      2. Edit the StatsDirect.ini file in the application directory so that it contains:

      [Paths]
      User=d:\user

      [Excel]
      Linked=0

      Here d:\user is an example of a path for which the end user has file read/write permissions and permission to create subdirectories. If the specified user path does not exist, StatsDirect will attempt to create it.

      For example, say that StatsDirect has been installed in q:\program files\statsdirect\, the user's file store is at d:\user, and the network administrator has put an entry user=d:\user into the [Paths] section of the statsdirect.ini file. When the user first runs StatsDirect, the following files are copied from the network drive to the user's file store:

      statsdirect.sdw
      statsdirect.ini
      \data\test.sdw

      All of these files need read and write permissions, so if the q: drive were read-only for the user, StatsDirect would not work properly when run from q: without the user= entry set in the ini file.
       

  • Repeated measures at different times
    • Q: Please can you advise me on the best way to analyse repeated measurements performed on different patients, which differ only by time? I understand that measurements that differ by time will be serially correlated, so a 2-way ANOVA is inappropriate. What is the best way to analyse such data and can it be done in StatsDirect?

      A: This is a difficult area for which you should seek expert statistical guidance. You may need specialist software for modelling, for example http://tigger.uic.edu/~hedeker/mix.html (best driven by the Statistician involved).

      Some useful references are:

      Ware JH. Linear models for the analysis of longitudinal studies. The American Statistician 1985;39:95-101.

      Davis CS. A computer program for non-parametric analysis of incomplete repeated measures for two samples. Computer Methods and Programs in Biomedicine 1994;42:39-52.

      Davis CS, Hall DB. A computer program for the regression analysis of ordered categorical repeated measurements. Computer Methods and Programs in Biomedicine 1995;51:153-169.

       

  • Precision, truncation and rounding
    • Will I get rounding errors in StatsDirect like I do in Excel (e.g. mod(11.2932, 0.3137) = 1.33227E-15 when it should be equal to zero since 11.2932 = 36*0.3137)?

      StatsDirect uses very precise calculation methods in order to keep calculation error to a minimum.

      In previous discussion archives, Dr Barry Tennison gave the following easy-to-understand explanation of rounding error:

      "Inside all (normal) computers, numbers are represented in binary (in various forms like fixed point or floating point). Since one cannot include an infinite number of bits (binary digits) after the decimal point, the only numbers that are represented exactly are those that can be expressed as a fraction with denominator a power of two (just as the only terminating decimals are those expressible as a fraction with denominator a power of ten). For example, one third (1/3) cannot be expressed as a terminating (finite) decimal or binary number. Therefore the INTERNAL forms of numbers like this represent the intended (exact) numbers only approximately. The apparent rounding errors result from this, rather than from any inaccuracy of calculation."


      The following is a more detailed introduction to this subject:

      Numerical precision and error

      "Although this may seem a paradox, all exact science is dominated by the idea of approximation."

      Russell, Bertrand (1872-1970)

      Numbers with fractional parts (real/floating-point as opposed to integer/fixed-point numbers) cannot all be fully represented in binary computers because computers cannot hold an infinite number of bits (binary digits) after the binary point. The only real numbers that are represented exactly are those that can be expressed as a fraction whose denominator is a power of two (e.g. 0.25 = 1/4), just as the only terminating (finite) decimals are those expressible as a fraction whose denominator is a power of ten (e.g. 0.1 = 1/10). Many real numbers, one third for example, cannot be expressed as a terminating decimal or binary number. Binary computers therefore represent many real numbers in approximate form only; the global standard for doing this is the IEEE Standard Floating-Point Representation (IEEE, 1985).
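      Python's standard fractions module can display the exact binary value that a double-precision number actually stores, which makes this point concrete: 0.25 (a power-of-two fraction) is held exactly, while 0.1 and 1/3 are held only as the nearest fraction with a power-of-two denominator. This is a generic illustration, not StatsDirect code.

```python
from fractions import Fraction

# Fraction(x) converts a float to the exact rational value it stores.
quarter = Fraction(0.25)
tenth = Fraction(0.1)
third = Fraction(1 / 3)

print(quarter, quarter == Fraction(1, 4))   # stored exactly
print(tenth, tenth == Fraction(1, 10))      # not exactly one tenth
print(third == Fraction(1, 3))              # not exactly one third
print(tenth.denominator == 2 ** 55)         # the stored denominator is a power of two
```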

      Numerical algorithms written in Microsoft Visual Basic, Microsoft Visual C++ and Compaq Visual FORTRAN comply with both single (32 bit) and double (64 bit) precision IEEE Standard Floating-Point Representation (Microsoft Corporation, 1998; Compaq Corporation, 2000; IEEE, 1985).  All real numbers in StatsDirect are handled in double precision.

      Arithmetic among floating-point numbers is subject to error. The smallest floating-point number which, when added to 1.0, produces a floating-point number different from 1.0 is termed the machine accuracy εm (Press et al., 1992). In IEEE double precision εm is approximately 2.22 × 10^-16. Most arithmetic operations among floating-point numbers produce a so-called round-off error of at least εm. Some round-off errors are characteristically large, for example those from subtracting two almost equal numbers. Round-off errors in a series of arithmetic operations seldom occur randomly up and down. Large round-off error at the beginning of a series of calculations can become magnified such that the result of the series is substantially imprecise, a condition known as instability. Algorithms in StatsDirect were assessed for likely causes of instability, and common stabilising techniques, such as leaving division as late as possible in calculations, were employed.
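      Both effects can be observed in any language that uses IEEE double precision. The Python sketch below (illustrative only) reads the machine accuracy from sys.float_info.epsilon, confirms that adding it to 1.0 changes the result while adding half of it does not, and shows how subtracting two nearly equal numbers turns a tiny representation error into a large relative error.

```python
import sys

eps = sys.float_info.epsilon   # machine accuracy for IEEE double precision, 2**-52
print(eps)
print(1.0 + eps != 1.0)        # True: eps is large enough to change 1.0
print(1.0 + eps / 2 == 1.0)    # True: half of eps is rounded away

# Near 1.0, doubles are spaced eps apart, so 1 + 1e-15 is stored as the
# nearest value 1 + 5*eps; the difference therefore comes out as 5*eps.
diff = (1.0 + 1e-15) - 1.0
relative_error = abs(diff - 1e-15) / 1e-15
print(diff, relative_error)    # relative error of about 11%
```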

      Another error inherent to numerical algorithms is the error associated with the approximation of functions; this is termed truncation error (Press et al., 1992). For example, integration is usually performed by evaluating a function at a large but finite number of discrete points; the difference between the solution obtained in this practical manner and the true solution obtained by considering every possible point is the truncation error. Most of the literature on numerical algorithms is concerned with minimisation of truncation error. For each function approximation in StatsDirect, the most precise algorithms practicable were written in the light of relevant, current literature.
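      Truncation error can be seen by approximating a known integral with an increasing number of points. The Python sketch below applies the composite trapezoidal rule (chosen only as a simple example; it is not necessarily a method StatsDirect uses) to the integral of exp(x) from 0 to 1, whose exact value is e - 1. For this rule the error shrinks roughly 100-fold for each 10-fold increase in the number of panels.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal panels on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

exact = math.e - 1  # integral of exp(x) from 0 to 1
for n in (10, 100, 1000):
    approx = trapezoid(math.exp, 0.0, 1.0, n)
    print(n, approx, abs(approx - exact))  # truncation error for n panels
```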

      References:

      IEEE Standard for Binary Floating Point Numbers, ANSI/IEEE Std 754. New York: Institute of Electrical and Electronics Engineers (IEEE) 1985.

      Press WH, et al. Numerical Recipes, The Art of Scientific Computing (2nd edition). Cambridge University Press 1992.

      Microsoft Corporation. Microsoft Visual Studio. 1998: www.microsoft.com.

      Compaq Corporation. Compaq Visual FORTRAN. 2000: www.digital.com/fortran.
       

  • Null, zero and missing data
    • What is the difference between a null and a zero value?

      A null value is a missing or excluded observation, recorded as a gap in a worksheet or as the asterisk * symbol. The internal code for a missing value is 3E+300, which would also be treated as missing/null if you entered it as an observed value. Zero must always be entered as 0 or 0.0 or 0.0e0 in order for it to be treated as an observation of zero - but remember that in categorical analysis, such as counts in a contingency table, some researchers may refer to zero as null response.
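      The distinction matters when summarising data. The hypothetical Python sketch below treats the 3E+300 code described above (or an empty cell, represented here as None) as missing, while keeping genuine zeros as observations; it only illustrates the rule and is not how StatsDirect stores worksheets.

```python
MISSING_CODE = 3e300  # internal missing-value code quoted in the answer above

def observed_values(column):
    """Return the observations in a column, dropping missing entries
    (None or the missing-value code) but keeping zeros."""
    return [v for v in column if v is not None and v != MISSING_CODE]

column = [0.0, 2.5, MISSING_CODE, 0, None, 4.0]
obs = observed_values(column)
print(len(obs), sum(obs) / len(obs))  # the two zeros count; the two missing cells do not
```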
       
  • Ratios
    • Q: What is the best way of correlating 2 ratios?
      I am postulating that low carotid bifurcations have a longer length of disease beyond the bifurcation, and have measured and created two ratios: a bifurcation ratio (length from clavicle to bifurcation divided by total carotid length) and a disease ratio (length from bifurcation to end of disease divided by length from clavicle to end of disease). Is it valid to use simple linear regression/correlation with bifurcation ratio as the independent variable and disease ratio as the dependent variable?

      A: Ratio measurement scales possess all properties of interval scales plus an absolute zero point. You might want to look at limits of agreement rather than correlation.
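      If limits of agreement are used, the standard Bland-Altman calculation is the mean of the paired differences plus or minus 1.96 standard deviations of those differences. A minimal Python sketch follows; it assumes the two ratios are paired by patient and expressed on the same scale, and the function name and data are hypothetical.

```python
import statistics

def limits_of_agreement(x, y, z=1.96):
    """Bland-Altman bias and 95% limits of agreement for paired values x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd

bifurcation_ratio = [0.42, 0.55, 0.38, 0.61, 0.47]  # hypothetical patients
disease_ratio = [0.40, 0.58, 0.35, 0.66, 0.45]
print(limits_of_agreement(bifurcation_ratio, disease_ratio))
```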
       
  • Excel
    • If when you select Tools-Add-ins from the Excel menu you get the error message "Microsoft Excel can't run this add-in. Microsoft Excel cannot install the necessary files due to Windows Installer error 1605..."

      Go to Start > Run, type regedit and press Enter to start the registry editor, then select Edit > Find and search for mpds.xla. If this is present under HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Installer\Components, delete the mpds.xla value. Be very careful when editing the registry that you do not delete anything accidentally, as this can cause serious harm to an installation of Windows.
       
    • Removing add-in

      There is a tool for removing persistent links from StatsDirect to Excel at: http://www.statsdirect.com/download/support/linkremover.zip
       
  • Feature requests
    • Multiple single proportions analysed at once (mainly confidence intervals)

      Done: see meta-analysis of proportions
       
    • Command line scripting

      Due in StatsDirect 3
       
    • Weighted crosstabs

      Considering
       
    • Prediction from logistic regression

      Q: Is there a possibility of adding a facility for creating a column in the worksheet in which the logistic "equation" is applied to all rows, thus giving the probability of a poor outcome for the binary dependent variable for each row? I realize that this can be done manually by using the function tool, but can this be incorporated in a future update?

      A: Due soon
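      Until then, the calculation the function tool would perform is short: each row's predicted probability is 1/(1 + exp(-(b0 + b1x1 + ... + bkxk))), using the fitted intercept b0 and coefficients b1 to bk. The Python sketch below (the coefficients and rows are hypothetical) applies it to several rows, using a numerically stable form that avoids overflow for large negative linear predictors.

```python
import math

def predicted_probability(intercept, coefficients, row):
    """Probability of the outcome coded 1 for one row of predictor values."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, row))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)  # eta < 0, so this cannot overflow
    return e / (1.0 + e)

b0, b = -2.0, [0.05, 1.2]            # hypothetical fitted intercept and coefficients
rows = [[40, 0], [60, 1], [75, 1]]   # hypothetical worksheet rows (x1, x2)
print([round(predicted_probability(b0, b, r), 3) for r in rows])
```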
       
    • Random effects and GEE

      A complex area that we do not want to over-simplify.  As a result we are planning an expression builder and export function to R, with help prompts to frame the problem for consultation with a statistician.
       
    • Complex generalised regression and ANOVA expressions/designs

      The basic components for user-defined GLM are already in StatsDirect. We debated adding a regression expression parsing interface and decided that this was more the domain of statistical modelling software designed for exploratory work of statisticians. Some of us were concerned that the focus of StatsDirect upon supporting the statistical appreciation and practice of the non-Statistician would be lost by leading its users into areas where support needs would be difficult to predict. Instead we advise people to do GLM with a Statistician using their preferred modelling software, e.g. R, S Plus, SAS, Stata.
       
    • Vertical box and whisker plots

      Q: Could there be an option to plot box and whisker plots vertically instead of horizontally?

      A: Horizontal makes labels easy to read, but a vertical option is planned.
       
    • Comparison of k independent proportions

      Q: In StatsDirect under analysis_proportions, there is a facility for comparing 2 independent proportions. What if we had k independent proportions, say the proportion of males who are HIV positive in 5 independent regions of a country (20/220, 15/500, n/N, etc.)? Secondly, and naturally following from my first question, what about post-hoc comparisons for such data, given that chi-square (just like ANOVA for continuous data) will only tell you that at least one proportion is different from one other? Can we extrapolate the usual Bonferroni correction to categorical data, and if so are there any references to support that? I was once advised to do a series of 2x2 tables for the problem I am describing and adjust alpha downwards just like in Bonferroni, but I am not too sure about it.

      A: Considering
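      Pending a dedicated function, the overall test the questioner describes is the chi-square test of homogeneity for a 2 × k table, and a Bonferroni adjustment divides the significance level by the number of pairwise 2 × 2 comparisons, k(k - 1)/2. The Python sketch below is a generic illustration, not StatsDirect functionality; it returns the statistic and its k - 1 degrees of freedom, leaving the P value to chi-square tables or a statistics library. The counts other than 20/220 and 15/500 are hypothetical.

```python
def chi_square_k_proportions(successes, totals):
    """Pearson chi-square statistic (k - 1 df) for equality of k independent proportions."""
    pooled = sum(successes) / sum(totals)
    stat = 0.0
    for r, n in zip(successes, totals):
        expected_yes = n * pooled        # expected positives under a common proportion
        expected_no = n * (1 - pooled)   # expected negatives
        stat += (r - expected_yes) ** 2 / expected_yes
        stat += ((n - r) - expected_no) ** 2 / expected_no
    return stat, len(totals) - 1

def bonferroni_alpha(k, alpha=0.05):
    """Per-comparison significance level for all k(k - 1)/2 pairwise comparisons."""
    return alpha / (k * (k - 1) // 2)

positives = [20, 15, 31, 12, 26]   # 20/220 and 15/500 from the question; the rest hypothetical
totals = [220, 500, 410, 180, 300]
print(chi_square_k_proportions(positives, totals))
print(bonferroni_alpha(len(totals)))  # 0.005 for 10 pairwise comparisons
```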