Quotes on Statistics, Data Visualization and Science (2024)

  • You can see a lot, just by looking.

    Yogi Berra (#datavisualization,vision)

  • Did you ever see such a thing as a drawing of a muchness?

    Lewis Carroll, Alice in Wonderland (#datavisualization)

  • The critical requirement of an effective graphical display is that it stimulate spontaneous perceptions of structure in data.

    S. Smith et al., 1990 (#datavisualization)

  • Like good writing, producing an effective graphical display requires an understanding of purpose: what is to be communicated, and to whom.

    Michael Friendly, Gallery of Data Visualization, 1991 (#data visualization)

  • Have you ever seen voice mail?

    The Hackers Test (#datavisualization,vision)

  • Graphics is the visual means of resolving logical problems.

    Jacques Bertin, Graphics and Graphic Information Processing, 2011, p.16. (#datavisualization,vision)

  • The greatest value of a picture is when it forces us to notice what we never expected to see.

    John W. Tukey, Exploratory Data Analysis, 1977 (#data visualization,pictures,eda)

  • If data analysis is to be well done, much of it must be a matter of judgment, and ‘theory’, whether statistical or non-statistical, will have to guide, not command.

    John W. Tukey, The Future of Data Analysis, Annals of Mathematical Statistics, Vol. 33 (1), 1962.

  • The physical sciences are used to ‘praying over’ their data, examining the same data from a variety of points of view. This process has been very rewarding, and has led to many extremely valuable insights. Without this sort of flexibility, progress in physical science would have been much slower. Flexibility in analysis is often to be had honestly at the price of a willingness not to demand that what has already been observed shall establish, or prove, what analysis suggests. In physical science generally, the results of praying over the data are thought of as something to be put to further test in another experiment, as indications rather than conclusions.

    John W. Tukey, The Future of Data Analysis, Annals of Mathematical Statistics, Vol. 33 (1), 1962.

  • If one technique of data analysis were to be exalted above all others for its ability to be revealing to the mind in connection with each of many different models, there is little doubt which one would be chosen. The simple graph has brought more information to the data analyst’s mind than any other device. It specializes in providing indications of unexpected phenomena.

    John W. Tukey, The Future of Data Analysis, The Annals of Mathematical Statistics, Vol. 33, No.1 (Mar., 1962), pp.1-67. (#data visualization)

  • Genius seems to consist merely in trueness of sight.

    Ralph Waldo Emerson, Journals of Ralph Waldo Emerson, Entry dated 1835, May 11 (#datavisualization,vision)

  • The eye obeys exactly the action of the mind.

    Ralph Waldo Emerson, Representative men. English traits. Conduct of life, p.409 (#datavisualization)

  • Vision is the art of seeing things invisible.

    Jonathan Swift, 1711 (#datavisualization,vision)

  • When there is no vision, the people perish.

    Proverbs 29:18 (#datavisualization,vision)

  • If I can’t picture it, I can’t understand it.

    Albert Einstein (#datavisualization,pictures)

  • And those who have insight will shine brightly like the brightness of the expanse of Heaven.

    Daniel 12:3 (#datavisualization)

  • The one thing that marks the true artist is a clear perception and a firm, bold hand, in distinction from that imperfect mental vision and uncertain truth which give up the feeble pictures and the lumpy statues of the mere artisans on canvas or in stone.

    Oliver Wendell Holmes (1860), The Professor at the Breakfast Table, Ticknor and Fields, Boston, MA (#datavisualization)

  • I like your motto: One picture is worth 1,000 denials.

    Ronald Reagan to White House News Photographers Assn, 18 May 1983 (#data visualization,pictures)

  • With brush you paint the possibilities, with pens you scribe the probabilities, for in pictures we find insight, while in numbers find we strength.

    Forrest W. Young (#datavisualization,eda,pictures)

  • A graphic should not only show the leaves; it should show the branches as well as the entire tree.

    Jacques Bertin, The Semiology of Graphics, 1983. Translated by W. J. Berg. University of Wisconsin Press: Wisconsin. (#data visualization,eda)

  • Tables are like cobwebs, like the sieve of Danaides; beautifully reticulated, orderly to look upon, but which will hold no conclusion. Tables are abstractions, and the object a most concrete one, so difficult to read the essence of.

    Thomas Carlyle, Chartism, 1840, Chapter II, Statistics (#data visualization,tables)

  • A judicious man looks at Statistics, not to get knowledge, but to save himself from having ignorance foisted on him.

    Thomas Carlyle, Chartism, 1840, Chapter II, Statistics (#data visualization)

  • Although geometrical representations of propositions in the thermodynamics of fluids are in general use and have done good service in disseminating clear notions in this science, yet they have by no means received the extension in respect to variety and generality of which they are capable.

    J. Willard Gibbs, Graphical Methods in the Thermodynamics of Fluids, 1873 (#datavisualization,geometry)

  • Although we often hear that data speak for themselves, their voices can be soft and sly.

    Frederick Mosteller, Stephen Fienberg and Robert E. Rourke, Beginning Statistics with Data Analysis, 1983, Reading MA, p.234 (#data visualization)

  • Nocturne, of Chopin, so beautiful music. But few people will appreciate the music if I just show them the notes. Most of us need to listen to the music to understand how beautiful it is. But often that’s how we present statistics; we just show the notes, we don’t play the music.

    Hans Rosling, OECD World Forum, Istanbul, June 2007 (#data visualization,statistics)

  • The greatest possibilities of visual display lie in vividness and inescapability of the intended message. A visual display can stop your mental flow in its tracks and make you think. A visual display can force you to notice what you never expected to see.

    John W. Tukey (#datavisualization,vision)

  • The purpose of [data] display is comparison (recognition of phenomena), not numbers … The phenomena are the main actors, numbers are the supporting cast.

    John W. Tukey (#datavisualization)

  • If an editor should print bad English he would lose his position. Many editors are using and printing bad methods of graphic presentation, but they hold their jobs just the same.

    W. C. Brinton, Graphic methods of presenting facts, 1914, p.3. (#data visualization)

  • Around the turn of the century, Karl Pearson, an almost elemental force for more and better statistical thought in all areas of life, including, with gusto, matters of social policy, was thinking and lecturing about graphical methods. But later in Pearson’s life, and certainly in the careers of R. A. Fisher and the other great statistical minds of the first half of the century, there was a falling away of interest in graphics and an efflorescence of devotion to analytical mathematical methods. Indeed, for many years there was a contagious snobbery against so unpopular, vulgar and elementary a topic as graphics among academic statisticians and their students.

    William Kruskal (#datavisualization,statistics)

  • If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at a glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws.

    Emile Cheysson, c.1877 (#datavisualization,tables)

  • Numbers have an important story to tell. They rely on you to give them a clear and convincing voice.

    Stephen Few (#datavisualization)

  • The purpose of visualization is insight, not pictures.

    Ben Shneiderman, Extreme visualization: squeezing a billion records into a million pixels. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 3-12, New York, NY, USA, 2008. ACM. (#datavisualization,pictures)

  • I love taxonomies, categories, ways of dividing people into groups.

    Gretchen Rubin (#datavisualization)

  • Ballerinas are often divided into three categories: jumpers, turners and balancers.

    Robert Gottlieb (#datavisualization)

  • Mr. Funkhouser has made an extremely interesting and valuable contribution to the history of statistical method. I wish, however, that he could have added a warning, supported by horrid examples, of the evils of the graphical method unsupported by tables of figures. Both for accurate understanding, and particularly to facilitate the use of the same material by other people, it is essential that graphs should not be published by themselves, but only when supported by the tables which lead up to them. It would be an exceedingly good rule to forbid in any scientific periodical the publication of graphs unsupported by tables.

    John Maynard Keynes, Review of Funkhouser for The Economic Journal (#data visualization)

  • Without data you are just another person with an opinion.

    W. Edwards Deming (#datavisualization,data)

  • Without a plot you are just a person missing a convincing argument.

    Di Cook, 2016 (#datavisualization)

  • Whatever relates to extent and quantity may be represented by geometrical figures. Statistical projections which speak to the senses without fatiguing the mind, possess the advantage of fixing the attention on a great number of important facts.

    Alexander von Humboldt (#datavisualization,vision)

  • Segnius irritant animos demissa per aures, Quam quae sunt oculis subjecta fidelibus (Roughly: What we hear excites the mind less than what we see).

    Horace (#datavisualization,vision)

  • You see, but you do not observe. The distinction is clear.

    Sherlock Holmes, The Adventures of Sherlock Holmes (1890), “A Scandal in Bohemia”, p.162 (#datavisualization,vision)

  • Every picture tells a story.

    Rod Stewart, 1971 (#datavisualization,pictures)

  • A picture is worth ten thousand words.

    Fred R. Barnard, advertising trade journal Printers Ink, March 10, 1927. (#datavisualization,pictures)

  • …But it is not always clear which 1000 words.

    John W. Tukey, 1973 (#datavisualization,pictures)

  • Un croquis vaut mieux qu’un long discours. Tr.: A good sketch is better than a long speech.

    Napoleon Bonaparte (#datavisualization,pictures)

  • A picture is worth a thousand numbers.

    Anon (#datavisualization,pictures)

  • Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

    Fred Brooks, The Mythical Man-Month (#datavisualization,pictures,tables)

  • Look here, upon this picture, and on this.

    Shakespeare, Hamlet (#datavisualization,pictures)

  • I am only a picture-taster, the way others are wine- or tea-tasters.

    Bernard Berenson, Sunset and Twilight, Harcourt, Brace & World, 1963 (#datavisualization,pictures)

  • Getting information from a table is like extracting sunlight from a cucumber.

    Arthur B. Farquhar & Henry Farquhar, Economic and Industrial Delusions, 1891. (#datavisualization,pictures)

  • When a law is contained in figures, it is buried like metal in an ore; it is necessary to extract it. This is the work of graphical representation. It points out the coincidences, the relationships between phenomena, their anomalies, and we have seen what a powerful means of control it puts in the hands of the statistician to verify new data, discover and correct errors with which they have been stained.

    Emile Cheysson, Les methodes de la statistique (1890), 34-35. (#data visualization,pictures)

  • Good design is obvious. Great design is transparent.

    Joe Sparano, graphic designer for Oxide Design Co. (#data visualization,design)

  • Content precedes design. Design in the absence of content is not design, it’s decoration.

    Jeffrey Zeldman, web designer and entrepreneur (#data visualization,design)

  • Mankind is not a circle with a single center but an ellipse with two focal points of which facts are one and ideas the other.

    Victor Hugo (#datavisualization,ellipses,geometry)

  • So, Fabricius, I already have this: that the most true path of the planet [Mars] is an ellipse, which Durer also calls an oval, or certainly so close to an ellipse that the difference is insensible.

    Johannes Kepler, 1605 (#datavisualization,ellipses)

  • Programming graphics in X is like finding the square root of pi using Roman numerals.

    Henry Spencer (#computing)

  • The purpose of computing is insight, not numbers.

    Richard Hamming, Introduction To Applied Numerical Analysis (#computing)

  • … to be a good theoretical statistician one must also compute, and must therefore have the best computing aids.

    Frank Yates, Sampling Methods for Censuses and Surveys, 1949 (#computing)

  • We [he and Halmos] share a philosophy about linear algebra: we think basis-free, we write basis-free, but when the chips are down we close the office door and compute with matrices like fury.

    Irving Kaplansky, Paul Halmos: Celebrating 50 Years of Mathematics (#computing)

  • Seek computer programs that allow you to do the thinking.

    George E. P. Box (#computing)

  • If you only know how to use a hammer, every problem starts to look like a nail. Stay away from that trap.

    Richard B. Johnson (#computing)

  • [It is] best to confuse only one issue at a time.

    Kernighan & Ritchie (#computing)

  • The nice thing about standards is that there are so many of them to choose from.

    Andrew Tanenbaum, Computer Networks (#computing)

  • There are no routine statistical questions, only questionable statistical routines.

    David R. Cox (#computing,statistics)

  • Be careful the environment you choose, for it will shape you; be careful the friends you choose, for you will become like them.

    W. Clement Stone (#computing,tidydata)

  • Be careless in your dress if you must, but keep a tidy soul.

    Mark Twain (#computing,tidydata)

  • I’m a tidy sort of bloke. I don’t like chaos. I kept records in the record rack, tea in the tea caddy, and pot in the pot box.

    George Harrison (#computing,tidydata)

  • Thou shalt not sit with statisticians nor commit a Social Science.

    W.H. Auden (#statistics)

  • There are two kinds of statistics, the kind you look up and the kind you make up.

    Rex Stout (#statistics)

  • Statistics are like alienists – they will testify for either side.

    Fiorello H. La Guardia (#statistics)

  • You may prove anything by figures.

    Thomas Carlyle (#statistics)

  • To understand God’s thoughts we must study statistics, for these are the measure of His purpose.

    Florence Nightingale (#statistics)

  • You cannot feed the hungry on statistics.

    David Lloyd George (#statistics)

  • A single death is a tragedy, a million deaths is a statistic.

    Kurt Tucholsky, mis-attributed to Joseph Stalin, Franzosischer Witz, 1925 (#statistics)

  • Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.

    Aaron Levenstein (#statistics)

  • Do not put faith in what statistics say until you have carefully considered what they do not say.

    William W. Watt (#statistics)

  • Facts are stubborn things, but statistics are more pliable.

    Mark Twain (#statistics)

  • Statistics are figures used as arguments.

    Leonard L. Levison (#statistics)

  • Figures won’t lie, but liars will figure.

    Unknown (though often misattributed to Mark Twain) (#statistics)

  • I always find that statistics are hard to swallow and impossible to digest. The only one I can remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.

    Mrs Robert A. Taft (#statistics)

  • Statistician: Delphic figure who lacks the necessary vocabulary to converse with mere mortals.

    Rod Nicolson, Psychology Software News (#statistics)

  • Get the facts first, and then you can distort them as much as you please.

    Mark Twain (#statistics)

  • If you want to inspire confidence, give plenty of statistics. It does not matter that they should be accurate, or even intelligible, as long as there is enough of them.

    Lewis Carroll (#statistics)

  • It is a truth very certain that when it is not in our power to determine what is true we ought to follow what is most probable.

    Rene Descartes (#statistics)

  • Models are to be used, but not to be believed.

    Henri Theil (#statistics)

  • The deepest sin of the human mind is to believe things without evidence.

    Thomas H. Huxley (#statistics)

  • Man must learn to simplify, but not to the point of falsification.

    Aldous Huxley (#statistics)

  • Since small differences in probability cannot be appreciated by the human mind, there seems little point in being excessively precise about uncertainty.

    George E. P. Box & G. C. Tiao, Bayesian inference in statistical analysis, 1973. Addison-Wesley, Reading, MA, p.65. (#statistics)

  • Some people hate the very name of statistics but I find them full of beauty and interest. Whenever they are not brutalized, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary.

    Francis Galton, Natural Inheritance, 1889, p.62 (#statistics)

  • [Statistics are] the only tools by which an opening may be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of Man.

    Francis Galton, Natural Inheritance, 1889. (#statistics)

  • Data analysis is an aid to thinking and not a replacement for.

    Richard Shillington (#statistics)

  • Sometimes the only thing you can do with a poorly designed experiment is to try to find out what it died of.

    Ronald A. Fisher (#statistics,experimentaldesign)

  • The best time to plan an experiment is after you’ve done it.

    Ronald A. Fisher (#statistics,experimentaldesign)

  • [The War Office kept three sets of figures:] one to mislead the public, another to mislead the Cabinet, and the third to mislead itself.

    Herbert Asquith, Alistair Horne, Price of Glory (#statistics)

  • Why are you testing your data for normality? For large sample sizes the normality tests often give a meaningful answer to a meaningless question (for small samples they give a meaningless answer to a meaningful question).

    Greg Snow, R-Help, 21 Feb 2014 (#statistics,normality,nhst)

  • The relevant question is not whether ANOVA assumptions are met exactly, but rather whether the plausible violations of the assumptions have serious consequences on the validity of probability statements based on the standard assumptions.

    Gene V. Glass & Percy D. Peckham & James R. Sanders, Consequences of Failure to Meet Assumptions Underlying the Fixed Effects Analyses of Variance and Covariance, Review of Educational Research Vol. 42, No.3 (Summer, 1972), pp.237-288, p.237. (#statistics,anova)

  • Exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone – as the first step.

    John W. Tukey, Exploratory Data Analysis, 1977, p.3. (#statistics,data,data analysis)

  • The best thing about being a statistician is that you get to play in everyone’s backyard.

    John W. Tukey (#statistics)

  • Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.

    John W. Tukey, The Future of Data Analysis, The Annals of Mathematical Statistics, Vol. 33, No.1 (Mar., 1962), pp.1-67. (#statistics)

  • The worst, i.e., most dangerous, feature of ‘accepting the null hypothesis’ is the giving up of explicit uncertainty … Mathematics can sometimes be put in such black-and-white terms, but our knowledge or belief about the external world never can.

    John W. Tukey, The Philosophy of Multiple Comparisons, Statist. Sci. 6 (1) 100 - 116, February, 1991. (#statistics,nhst)

  • Better to have an approximate answer to the right question than a precise answer to the wrong question.

    John W. Tukey, Quoted by John Chambers (#statistics)

  • All models are wrong, but some are useful.

    George E. P. Box (#statistics)

  • Every model is an approximation.

    George E. P. Box (#statistics)

  • The business of the statistician is to catalyze the scientific learning process.

    George E. P. Box (#statistics)

  • Statisticians, like artists, have the bad habit of falling in love with their models.

    George E. P. Box (#statistics)

  • If there were a probability of only p = 0.04 of finding a crock of gold behind the next tree, wouldn’t you go and look?

    George E. P. Box (#statistics)

  • When the ratio of the largest to smallest observation is large you should question whether the data are being analyzed in the right metric (transformation).

    George E. P. Box (#statistics)

  • A useful type of time series model is a recipe for transforming serial data into white noise.

    George E. P. Box (#statistics,timeseries)

  • It is the data that are real (they actually happened!) The model is a hypothetical conjecture that might or might not summarize and/or explain important features of the data.

    George E. P. Box (#statistics)

  • It is not unusual for a well-designed experiment to analyze itself.

    George E. P. Box (#statistics)

  • Discovering the unexpected is more important than confirming the known.

    George E. P. Box (#statistics)

  • We are drowning in information and starving for knowledge.

    John Naisbitt, Megatrends (#statistics)

  • ‘Data! data!’ he cried impatiently. ‘I can’t make bricks without clay.’

    Arthur Conan-Doyle, Adventures of Sherlock Holmes “The Copper Beeches” (#data)

  • I have no data yet. It is a capital mistake to theorize before one has data.

    Arthur Conan-Doyle, Adventures of Sherlock Holmes “A Scandal in Bohemia” (#data)

  • This was an unexpected piece of luck. My data were coming morequickly than I could have reasonably hoped.

    Arthur Conan-Doyle, Memoirs of Sherlock Holmes, The Musgrave Ritual (#data)

  • I have not all my facts yet, but I do not think there are any insuperable difficulties. Still, it is an error to argue in front of your data. You find yourself insensibly twisting them round to fit your theories.

    Arthur Conan-Doyle, His Last Bow, Wisteria Lodge (#data)

  • The only thing we know for sure about a missing data point is that it is not there, and there is nothing that the magic of statistics can do to change that. The best that can be managed is to estimate the extent to which missing data have influenced the inferences we wish to draw.

    Howard Wainer (#data)

  • Big data can change the way social science is performed, but will not replace statistical common sense.

    Thomas Lansdall-Welfare, Nowcasting the mood of the nation, Significance v. 9(4), August 12, 2012, p.28. (#data)

  • Baseball is ninety percent mental and the other half is physical.

    Yogi Berra (#data)

  • Whenever I see an outlier, I never know whether to throw it away or patent it.

    Bert Gunter, R-Help, 9/14/2015 (#data,outliers)

  • In God we trust. All others must bring data.

    W. Edwards Deming (#data)

  • The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

    John W. Tukey, Sunset Salvo, The American Statistician Vol. 40(1), 1986. (#data)

  • How do I love thee? Let me count the ways.

    Elizabeth Barrett Browning, Sonnets from the Portuguese (#data,counts)

  • Not everything that counts can be counted, and not everything that can be counted counts.

    William Bruce Cameron (#data,counts)

  • Whenever you can, count.

    Francis Galton (#data,counts)

  • It is difficult to understand why statisticians commonly limit their inquiries to Averages, and do not revel in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into its lakes, two nuisances would be got rid of at once.

    Francis Galton, Natural Inheritance (#data,averages)

  • While the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician.

    Arthur Conan-Doyle, Sign of the Four (#data,averages)

  • If you put a buttock on a hot plate and another one on an ice cube, the average is good, but in reality your bottom is in trouble.

    Grigore Moisil (#data,averages)

  • The graphical method has considerable superiority for the exposition of statistical facts over the tabular. A heavy bank of figures is grievously wearisome to the eye, and the popular mind is as incapable of drawing any useful lessons from it as of extracting sunbeams from cucumbers.

    Arthur B. Farquhar & Henry Farquhar, Economic and Industrial Delusions, 1891. (#data,tables,vision)

  • Let it serve for table-talk.

    William Shakespeare, The Merchant of Venice, Act III, Sc. 5. (#data,tables)

  • I drink to the general joy o’ the whole table.

    William Shakespeare, Macbeth, Act III, Sc. 4. (#data,tables)

  • Isolated facts, those that can only be obtained by rough estimate and that require development, can only be presented in memoires; but those that can be presented in a body, with details, and on whose accuracy one can rely, may be expounded in tables.

    E. Duvillard, Memoire sur le travail du Bureau de statistique, 1806. (#data,tables)

  • Study without reflection is a waste of time; reflection without study is dangerous.

    Confucius, Analects (551-479 BC) (#science)

  • Things should be made as simple as possible, but not any simpler.

    Albert Einstein (#science)

  • So much has already been written about everything that you can’t find out anything about it.

    James Thurber, 1961 (#science)

  • The practical power of a statistical test is the product of its statistical power and the probability of use.

    John W. Tukey, A Quick, Compact, Two Sample Test to Duckworth’s Specifications (#science,power)

  • Theory into Practice.

    Mao Tse-Tung, The Little Red Book (#science)

  • Beauty is truth; truth, beauty. That is all ye know on Earth, and all ye need to know.

    John Keats, Ode on a Grecian Urn (#science)

  • They consider me to have sharp and penetrating vision because I see them through the mesh of a sieve.

    Kahlil Gibran (#science,vision)

  • The journalistic vision sharpens to the point of maximum impact every event, every individual and social configuration; but the honing is uniform.

    George Steiner (#science,vision)

  • Some people weave burlap into the fabric of our lives, and some weave gold thread. Both contribute to make the whole picture beautiful and unique.

    Anon. (#science,pictures)

  • Time extracts various values from a painter’s work. When these values are exhausted the pictures are forgotten, and the more a picture has to give, the greater it is.

    Henri Matisse (#science,pictures)

  • God is in the details.

    Mies van der Rohe, New York Times, August 19, 1969 (#science)

  • The devil is in the details.

    George Shultz (#science)

  • One has to be able to count if only so that at fifty one doesn’t marry a girl of twenty.

    Maxim Gorky, The Zykovs, 1914 (#science,counts)

  • A man has one hundred dollars and you leave him with two dollars,that’s subtraction.

    Mae West, My Little Chickadee, 1940 (#science)

  • In the fields of observation chance favors only the prepared mind.

    Louis Pasteur (#science,data)

  • The eye of a human being is a microscope, which makes the world seem bigger than it really is.

    Kahlil Gibran, A Handful of Sand on the Shore (#science,vision)

  • To the man who only has a hammer in the toolkit, every problem looks like a nail.

    Abraham Maslow (#science)

  • Four hostile newspapers are more to be feared than a thousand bayonets.

    Napoleon Bonaparte, Maxims (#science)

  • When I’m working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong.

    Richard Buckminster Fuller (#science)

  • He who asks a question is a fool for five minutes; he who does not ask a question remains a fool forever.

    Chinese Proverb (#science)

  • The great tragedy of science – the slaying of a beautiful hypothesis by an ugly fact.

    Thomas Huxley (#science,nhst)

  • Give a man a fish and he will eat for a day. Teach a man to fish and he will eat for the rest of his life.

    Chinese Proverb (#science)

  • Give a man a fish and he will eat for a day. Teach a man to fish and you lose a consulting job forever.

    Howard Wainer, 2016 (#science)

  • When you have eliminated the impossible, whatever remains, however improbable, must be the truth.

    Arthur Conan Doyle, The Sign of the Four (1890), Ch. 6 (#science,probability)

  • If you choose to represent the various parts in life by holes upon a table, of different shapes—some circular, some triangular, some square, some oblong—we shall generally find that the triangular person has got into the square hole, the oblong into the triangular, and a square person has squeezed himself into the round hole.

    Sydney Smith, 1769-1845 (#science,geometry)

  • I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error.” The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.

    Sir Francis Galton, Natural Inheritance, London: Macmillan, 1889. Quoted in J. R. Newman (ed.) The World of Mathematics, New York: Simon and Schuster, 1956. p.1482. (#science,normality)

  • In scientific thought we adopt the simplest theory which will explain all the facts under consideration and enable us to predict new facts of the same kind. The catch in this criterion lies in the word “simplest.” It is really an aesthetic canon such as we find implicit in our criticisms of poetry or painting. The layman finds such a law as dx/dt = K(d²x/dy²) much less simple than “it oozes,” of which it is the mathematical statement. The physicist reverses this judgment, and his statement is certainly the more fruitful of the two, so far as prediction is concerned. It is, however, a statement about something very unfamiliar to the plain man, namely, the rate of change of a rate of change.

    John Burdon Sanderson Haldane, Possible Worlds, 1927. (#science)

  • Oh, what a tangled web we weave, When first we practice to deceive!

    Sir Walter Scott (#science)

  • Practice is the best of all instructors.

    Publilius Syrus (#science)

  • We should go to the masses and learn from them, synthesize their experience into better, articulated principles and methods, then do propaganda among the masses, and call upon them to put these principles and methods into practice so as to solve their problems and help them achieve liberation and happiness.

    Chairman Mao Zedong, “Get Organized!” (November 29, 1943), Selected Works, Vol. III, p.158. (#science)

  • An elementary demonstration is one that requires no knowledge—just an infinite amount of intelligence.

    Richard Feynman (#science)

  • Science may be described as the art of systematic over-simplification.

    Karl Popper (#science)

  • Science is like sex: sometimes something useful comes out, but that is not the reason we are doing it.

    Richard Feynman (#science)

  • Humanists believe that the world has a fixed number of mysteries, so that when one is solved, our sense of wonder is diminished. Scientists believe that the world has endless mysteries, so that when one is solved, there are always new ones to ponder.

    D. O. Hebb, quoted by Steven Pinker (#science)

  • Art and science encounter each other when they seek exactitude.

    Etienne-Jules Marey (#science)

  • Circumstantial evidence is a very tricky thing. It may seem to point very straight to one thing, but if you shift your own point of view a little, you may find it pointing in an equally uncompromising manner to something entirely different.

    Sherlock Holmes, The Adventures of Sherlock Holmes (1892) “The Boscombe Valley Mystery” (#science)

  • To find out what happens when you change something, it is necessary to change it.

    Box, Hunter, and Hunter, Statistics for Experimenters (1978) (#science)

  • He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may be cast.

    Leonardo da Vinci (#science)

  • Science is built up of facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house.

    Henri Poincare (#science)

  • The museum spreads its surfaces everywhere, and becomes an untitled collection of generalizations that mobilize the eye.

    Robert Smithson (#science,generalizations)

  • In one word, to draw the rule from experience, one must generalize; this is a necessity that imposes itself on the most circumspect observer.

    Henri Poincare, The Value of Science: Essential Writings of Henri Poincare (#science,generalizations)

  • If I have seen further, it is by standing on the shoulders of giants.

    Sir Isaac Newton, Letter to Robert Hooke, Feb. 5, 1676 (#science,generalizations)

  • Few intellectual pleasures are more keen than those enjoyed by a person who, while he is occupied in some special inquiry, suddenly perceives that it admits of a wide generalization, and that his results hold good in previously-unsuspected directions.

    Francis Galton, North American Review, 150 419-431 (1890) (#science,generalizations)

  • The elegance of a theorem is directly proportional to the number of ideas you can see in it and inversely proportional to the effort it takes to see them.

    George Polya (#science,generalizations)

  • A mathematician, like a painter or a poet, is a master of pattern. The mathematician’s patterns, like the painter’s or the poet’s, must be beautiful; the ideas, like the colors or the words, must fit together in a harmonious way. … There is no permanent place in the world for ugly mathematics.

    G. H. Hardy (#science,generalizations)

  • An idea which can be used once is a trick. If it can be used more than once it becomes a method.

    George Polya and Gabor Szego (#science,generalizations)

  • When the time is ripe for certain things, these things appear in different places in the manner of violets coming to light in early spring.

    Farkas Bolyai, To his son Janos Bolyai, urging him to claim the invention of non-Euclidean geometry without delay, quoted in Ming Li and Paul Vitanyi, An introduction to Kolmogorov Complexity and Its Applications, 1st ed., 1993, p.83. (#science,generalizations)

  • The only new thing in the world is the history you don’t know.

    Harry S. Truman, Quoted by David McCullough (#history)

  • So we beat on, boats against the current, borne back ceaselessly into the past.

    F. Scott Fitzgerald, The Great Gatsby (1925) (#history,time)

  • Euclid alone has looked on beauty bare.

    Edna St Vincent Millay (#history,geometry)

  • The past only exists insofar as it is present in the records of today. And what those records are is determined by what questions we ask. There is no other history than that.

    Wheeler, 1982:24 (#history)

  • A generation which ignores history has no past and no future

    Robert Heinlein (#history)

  • For my part, I consider that it will be found much better by all parties to leave the past to history, especially as I propose to write that history myself.

    Winston Churchill (#history)

  • If you would understand anything, observe its beginning and its development.

    Aristotle (#history)

  • God alone knows the future, but only an historian can alter the past.

    Ambrose Bierce (#history)

  • Since God himself cannot change the past, he is obliged to tolerate the existence of historians.

    Attributed to Samuel Butler (#history)

  • At the heart of good history is a naughty little secret: good storytelling.

    Stephen Schiff (#history)

  • It has been said that though God cannot alter the past, historians can; it is perhaps because they can be useful to Him in this respect that He tolerates their existence.

    Samuel Butler, Erewhon Revisited (#history)

  • History is moving statistics and statistics is frozen history.

    A. L. Schlozer, Theorie der Statistik 1804, p.86 (#history)

  • Time flies like an arrow; fruit flies like a banana.

    Anthony G. Oettinger, Often mis-attributed to Groucho Marx (#history,time)

  • Time is the longest distance between two places.

    Tennessee Williams, The Glass Menagerie (#history,time)

  • Those who make the worst use of their time are the first to complain of its brevity.

    Jean de La Bruyere, Les Caracteres (#history,time)

  • The past is a foreign country: they do things differently there.

    L. P. Hartley, The Go-Between (#history,time)

  • I never think of the future - it comes soon enough.

    Albert Einstein (#history,time)

  • The best way to predict the future is to invent it

    Alan Kay (#history,time)

  • The future ain’t what it used to be

    Yogi Berra (#history,time)

  • Look not mournfully into the past. It comes not back again. Wisely improve the present. It is thine. Go forth to meet the shadowy future, without fear.

    Henry Wadsworth Longfellow (#history,time)

  • Let him who would enjoy a good future waste none of his present.

    Roger Babson (#history,time)

  • When in doubt, predict that the present trend will continue.

    Merkin’s Maxim (#history,time)

  • The only use of a knowledge of the past is to equip us for the present. The present contains all that there is. It is holy ground; for it is the past, and it is the future.

    Alfred North Whitehead (#history,time)

  • My past is my wisdom to use today. … my future is my wisdom yet to experience. Be in the present because that is where life resides.

    Gene Oliver, Life and the Artistry of Change (#history,time)

  • I have realized that the past and future are real illusions, that they exist in the present, which is what there is and all there is.

    Alan Watts (#history,time)

  • The future is uncertain but the end is always near.

    Jim Morrison (#history,time)

  • Time has no divisions to mark its passage, there is never a thunder-storm or blare of trumpets to announce the beginning of a new month or year. Even when a new century begins, it is only we mortals who ring bells and fire off pistols.

    Thomas Mann, The Magic Mountain (1924) (#history,time)

  • Direction is more important than speed. We are so busy looking at our speedometers that we forget the milestone.

    Anonymous (#history,milestones)

  • Only sixteen players have hit fifty or more homers in a season. To me, that’s a very special milestone.

    Mark McGwire (#history,milestones)

  • As life runs on, the road grows strange with faces new – and near the end. The milestones into headstones change, Neath every one a friend.

    James Russell Lowell (#history,milestones)

  • This paper contains much that is new and much that is true. Unfortunately, that which is true is not new and that which is new is not true.

    attributed to Wolfgang Pauli (#reviews)

  • This book fills a much-needed gap.

    Attributed to Moses Hadas (#reviews)

  • Russell left the vast darkness of the topic unobscured

    Alfred North Whitehead, Referring to Bertrand Russell (#reviews)

  • Mathematicians have always been rather of a jealous nature, and undoubtedly jealousy was a family characteristic of the Bernoullis. There is some excuse for mathematicians, for their reputation stands for posterity largely not on what they did, but on what their contemporaries attributed to them.

    Karl Pearson, The History of Statistics in the 17th and 18th Centuries. (#history)

  • When choosing between two evils, I always like to take the one I’ve never tried before.

    Mae West, 1941 (#ethics)

  • To err is human—but it feels divine!

    Mae West (paraphrase of Alexander Pope, “To err is human, to forgive divine”) (#data,averages)

  • Good judgment comes from experience; experience comes from bad judgment.

    Fred Brooks (#history)

  • One must learn by doing the thing; for though you think you know it, you have no certainty until you try.

    Sophocles

  • There is a magic in graphs. The profile of a curve reveals in a flash a whole situation—the life history of an epidemic, a panic, or an era of prosperity. The curve informs the mind, awakens the imagination, convinces.

    Henry D. Hubbard, Foreword to Brinton (1939), Graphic Presentation (#data visualization,pictures)

  • Graphs carry the message home. A universal language, graphs convey information directly to the mind. Without complexity there is imaged to the eye a magnitude to be remembered. Words have wings, but graphs interpret. Graphs are pure quantity, stripped of verbal sham, reduced to dimension, vivid, unescapable.

    Henry D. Hubbard, Foreword to Brinton (1939), Graphic Presentation (#data visualization,pictures)

  • Graphs are all inclusive. No fact is too slight or too great to plot to a scale suited to the eye. Graphs may record the path of an ion or the orbit of the sun, the rise of a civilization, or the acceleration of a bullet, the climate of a century or the varying pressure of a heartbeat, the growth of a business, or the nerve reactions of a child.

    Henry D. Hubbard, Foreword to Brinton (1939), Graphic Presentation (#data visualization,pictures)

  • The graphic art depicts magnitudes to the eye. It does more. It compels the seeing of relations. We may portray by simple graphic methods whole masses of intricate routine, the organization of an enterprise, or the plan of a campaign. Graphs serve as storm signals for the manager, statesman, engineer; as potent narratives for the actuary, statist, naturalist; and as forceful engines of research for science, technology and industry. They display results. They disclose new facts and laws. They reveal discoveries as the bud unfolds the flower.

    Henry D. Hubbard, Foreword to Brinton (1939), Graphic Presentation (#data visualization,pictures,vision)

  • The graphic language is modern. We are learning its alphabet. That it will develop a lexicon and a literature marvelous for its vividness and the variety of application is inevitable. Graphs are dynamic, dramatic. They may epitomize an epoch, each dot a fact, each slope an event, each curve a history. Wherever there are data to record, inferences to draw, or facts to tell, graphs furnish the unrivalled means whose power we are just beginning to realize and to apply.

    Henry D. Hubbard, Foreword to Brinton (1939), Graphic Presentation (#data visualization,pictures)

  • In One Dimension, did not a moving Point produce a Line with two terminal points? In Two Dimensions, did not a moving Line produce a Square with four terminal points? In Three Dimensions, did not a moving Square produce - did not the eyes of mine behold it - that blessed being, a Cube, with eight terminal points? And in Four Dimensions, shall not a moving Cube - alas, for Analogy, and alas for the Progress of Truth if it be not so - shall not, I say the motion of a divine Cube result in a still more divine organization with sixteen terminal points?

    Edwin A. Abbott, Flatland: A Romance of Many Dimensions (#data visualization,geometry)

  • To comport oneself with perfect propriety in Polygonal society, one ought to be a Polygon oneself.

    Edwin A. Abbott, Flatland: A Romance of Many Dimensions (#data visualization,geometry)

  • True, said the Sphere; it appears to you a Plane, because you are not accustomed to light and shade and perspective; just as in Flatland a Hexagon would appear a Straight Line to one who has not the Art of Sight Recognition. But in reality it is a Solid, as you shall learn by the sense of Feeling.

    Edwin A. Abbott, Flatland: A Romance of Many Dimensions (#data visualization,geometry)

  • There are twenty one mystical dimensions of consciousness. Enlightenment is abiding in the highest three dimensions of consciousness.

    Amit Ray, Enlightenment Step by Step (#data visualization,geometry)

  • I would say time is definitely one of my top three favorite dimensions.

    Randall Munroe, xkcd (#history,time)

  • Poetry is when an emotion has found its thought and the thought has found its words.

    Robert Frost (#science)

  • I don’t like to commit myself about heaven and hell - you see, I have friends in both places.

    Mark Twain

  • Errors using inadequate data are much less than those using no data at all.

    Charles Babbage, Circa 1850 (#data)

  • From carefully compiled statistical facts more may be learned [about] the moral nature of Man than can be gathered from all the accumulated experiences of the preceding ages.

    Henry Thomas Buckle, A History of Civilization in England,1857/1898, p.17 (#statistics)

  • Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.

    H.G. Wells, Mankind in the Making, 1903 (#statistics)

  • Statistical accounts are to be referred to as a dictionary by men of riper years, and by young men as a grammar, to teach them the relations and proportions of different statistical subjects, and to imprint them on the mind at a time when the memory is capable of being impressed in a lasting and durable manner, thereby laying the foundation for accurate and valuable knowledge.

    William Playfair, The Statistical Breviary (1801), 5-6. (#statistics)

  • Geography is only a branch of statistics, a knowledge of which is necessary to the well-understanding of the history of nations, as well as their situations relative to each other.

    William Playfair, The Commercial and Political Atlas, p.29 (#statistics)

  • No study is less alluring or more dry and tedious than statistics, unless the mind and imagination are set to work, or that the person studying is particularly interested in the subject; which last can seldom be the case with young men in any rank of life.

    William Playfair, The Statistical Breviary (1801), p.16 (#statistics)

  • DIAGRAMS are of great utility for illustrating certain questions of vital statistics by conveying ideas on the subject through the eye, which cannot be so readily grasped when contained in figures.

    Florence Nightingale, Mortality of the British Army, 1857 (#data visualization,pictures,vision)

  • To give insight to statistical information it occurred to me, that making an appeal to the eye when proportion and magnitude are concerned, is the best and readiest method of conveying a distinct idea.

    William Playfair, The Statistical Breviary (1801), p.2 (#data visualization,pictures,vision)

  • Regarding numbers and proportions, the best way to catch the imagination is to speak to the eyes.

    William Playfair, Elemens de statistique, Paris, 1802, p.XX. (#data visualization,pictures,vision)

  • The aim of my carte figurative is to convey promptly to the eye the relation not given quickly by numbers requiring mental calculation.

    Charles Joseph Minard (#data visualization,pictures)

  • Information that is imperfectly acquired, is generally as imperfectly retained; and a man who has carefully investigated a printed table, finds, when done, that he has only a very faint and partial idea of what he has read; and that like a figure imprinted on sand, is soon totally erased and defaced.

    William Playfair, The Commercial and Political Atlas (p.3), 1786 (#data visualization,pictures,tables)

  • Since the aim of exploratory data analysis is to learn what seems to be, it should be no surprise that pictures play a vital role in doing it well.

    John W. Tukey, John W. Tukey’s Works on Interactive Graphics. The Annals of Statistics Vol. 30, No.6 (Dec., 2002), pp.1629-1639 (#data visualization,pictures,eda)

  • There is nothing better than a picture for making you think of questions you had forgotten to ask (even mentally).

    John W. Tukey & Paul Tukey, John W. Tukey’s Works on Interactive Graphics. The Annals of Statistics Vol. 30, No.6 (Dec., 2002), pp.1629-1639 (#data visualization,pictures)

  • Functional visualizations are more than innovative statistical analyses and computational algorithms. They must make sense to the user and require a visual language system that uses colour, shape, line, hierarchy and composition to communicate clearly and appropriately, much like the alphabetic and character-based languages used worldwide between humans.

    Matt Woolman, Digital Information Graphics (#data visualization,pictures)

  • A people without the knowledge of their past history, origin and culture is like a tree without roots

    Marcus Garvey (#history)

  • The bushels of rings taken from the fingers of the slain at the battle of Cannae, above two thousand years ago, are recorded; … but the bushels of corn produced in England at this day, or the number of the inhabitants of the country, are unknown, at the very time that we are debating that most important question, whether or not there is sufficient substance for those who live in the kingdom.

    William Playfair, The Statistical Breviary (1801), p.7-8 (#history)

  • How can the past and future be, when the past no longer is, and the future is not yet? As for the present, if it were always present and never moved on to become the past, it would not be time, but eternity.

    St. Augustine of Hippo, Confessions (#history,time)

  • Every measurable thing, except numbers, is imagined in the manner of continuous quantity. Therefore, for the mensuration of such a thing, it is necessary that points, lines and surfaces, or their properties be imagined. For in them, as the Philosopher has it, measure or ratio is initially found, while in other things it is recognized by similarity as they are being referred to by the intellect to the geometrical entities.

    Nicole Oresme, The Latitude of Forms (#data visualization,geometry)

  • Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.

    John von Neumann, Various techniques used in connection with random digits, Applied Mathematics Series, 1951, no 12, 36-38. (#computing,random numbers)

  • With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.

    John von Neumann, Quoted by Freeman Dyson (#science)

  • The punishment of every disordered mind is its own disorder.

    St. Augustine of Hippo, Confessions (#science)

  • It is strange that only extraordinary men make the discoveries, which later appear so easy and simple.

    Georg C. Lichtenberg (#science)

  • Two things are infinite: the universe and human stupidity; and I’m not sure about the universe.

    Albert Einstein (#science)

  • Facts, however numerous, do not constitute a science. Like innumerable grains of sand on the sea shore, single facts appear isolated, useless, shapeless; it is only when compared, when arranged in their natural relations, when crystallised by the intellect, that they constitute the eternal truths of science

    William Farr, “Observation,” Br. Ann. Med. 1 (1837): 693 (#science)

  • Tidy datasets are all alike, but every messy dataset is messy in its own way

    Hadley Wickham (#computing,tidydata)

  • We cannot understand the world without numbers, and we cannot understand it with numbers alone.

    Hans Rosling (#data)

  • Bad data makes bad models. Bad models instruct people to make ineffective or harmful interventions. Those bad interventions produce more bad data, which is fed into more bad models.

    Cory Doctorow, Machine Learning’s Crumbling Foundations, Aug 2021. (#data)

  • Whenever I am infuriated, I revenge myself with a new Diagram

    Florence Nightingale, Letter 1857.8.19 to Sidney Herbert (#data visualization,pictures)

  • Again I must repeat my objections to intermingling causation with statistics. It might be to a certain extent admissible if you had no sanitary head. But you have one, & his report should be quite separate. The statistician has nothing to do with causation: he is almost certain in the present state of knowledge to err.

    Florence Nightingale, Letter, March 1861 (#statistics)

  • A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.

    Stephen John Senn (#statistics)

  • What nature hath joined together, multiple regression cannot put asunder.

    Richard Nisbett (#statistics)

  • Stepwise regression is probably the most abused computerized statistical technique ever devised. If you think you need stepwise regression to solve a particular problem you have, it is almost certain that you do not. Professional statisticians rarely use automated stepwise regression.

    Leland Wilkinson, SYSTAT (1984). P. 196. (#computing)

  • It would help if the standard statistical programs did not generate t statistics in such profusion. The programs might be written to ask, “Do you really have a probability sample?”, “By what standard would you judge a fitted coefficient large or small?” Or perhaps they could merely say, printed in bold capitals beside each equation, “So What Else Is New?”

    Donald M. McCloskey, The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests, American Economic Review, Vol 75, #2. (#computing)

  • The documentation level of R is already much higher than average for open source software and even than some commercial packages (esp. SPSS is notorious for its attitude of “You want to do one of these things. If you don’t understand what the output means, click help and we’ll pop up five lines of mumbo-jumbo that you’re not going to understand either.”)

    Peter Dalgaard, R-help mailing list 4.2.2002 (#computing)

  • S has forever altered the way people analyze, visualize, and manipulate data …. S is an elegant, widely accepted, and enduring software system, with conceptual integrity, thanks to the insight, taste, and effort of John Chambers.

    Association for Computing Machinery Software System Award (#computing)

  • Tradition among experienced S programmers has always been that loops (typically ‘for’ loops) are intrinsically inefficient: expressing computations without loops has provided a measure of entry into the inner circle of S programming.

    John Chambers, Programming With Data, p.173. (#computing)

  • While the distribution and publication of Version 2 [of S] was still evolving, parallel research work was starting to shape the next major version. At first this seemed to be a move away from S altogether: something called the “Quantitative Programming Environment” was initially a separate research project, aimed more explicitly at programming and trying to emphasize that users need not be statistically sophisticated. By 1986, however, the decision was made to merge this work with the facilities (especially the graphics) underlying S, to produce a new version of S. (This explains, by the way, why the main program for S is called Sqpe, another one of those little puzzles for users.)

    Unknown (#computing)

  • This computationally intensive operation [bootstrapping] is not one calculated to endear a user to a database administrator.

    Leland Wilkinson, The Grammar of Graphics, p.49. (#computing)

  • Most computer software is not yet intelligent enough to stop the user doing something stupid. The old adage ‘Garbage In -> Garbage Out’ still holds good, and it must be realized that careful thought and close inspection of the data are vital preliminaries to complicated computer analysis.

    Christopher Chatfield, Problem solving : a statistician’s guide, 1988, p.59. (#computing)

  • Today…we have high-speed computers and prepackaged statistical routines to perform the necessary calculations…statistical software will no more make one a statistician than would a scalpel turn one into a neurosurgeon. Allowing these tools to do our thinking for us is a sure recipe for disaster.

    Good & Hardin, Common Errors in Statistics and How to Avoid Them, p.ix (#computing)

  • The generation of random numbers is too important to be left to chance.

    Robert R. Coveyou, Oak Ridge National Laboratory, 1969 (#computing,random numbers)

  • Landon Noll…has been tinkering with random number generators for nearly a decade–an exercise in bringing order to chaos. “There’s a lot of beauty in chaos,” Noll says. “The Grand Canyon wouldn’t be so popular if it was just a uniform trench. The trick is controlling and managing chaos and turning it into something useful.”

    Tom McNichol, Wired, August, 2003, page 088. (#computing,random numbers)

  • Someone has characterized the user of stepwise regression as a person who checks his or her brain at the entrance of the computer center.

    D. R. Wittink, The application of regression analysis. Needham Heights, MA: Allyn and Bacon. p.259. (#computing)

  • The idea of optimization transfer is very appealing to me, especially since I have never succeeded in fully understanding the EM algorithm.

    Andrew Gelman, Discussion, Journal of Computational and Graphical Statistics, vol 9, p 49. (#computing)

  • This reminds me of the duality in statistics between computation and model fit: better-fitting models tend to be easier to compute, and computational problems often signal modeling problems.

    Andrew Gelman, Thoughts inspired by Nassim Taleb’s ‘Fooled by Randomness’ and ‘The Black Swan’. Law, Probability and Risk (2008) 7, 151-163. (#computing)

  • It is a nontrivial exercise to correctly program even the simplest split-plot model using PROC MIXED.

    Jeremy Aldworth & Wherly P Hoffman, Split-Plot Model With Covariate: A Cautionary Tale, The American Statistician, 56, 284–289. (#computing)

  • Sometimes the most important fit statistic you can get is ‘convergence not met’–it can tell you something is wrong with your model.

    Oliver Schabenberger, 2006 Applied Statistics in Agriculture Conference. (#computing)

  • It is obviously pointless to report or quote results to more digits than is warranted. In fact, it is misleading or at the very least unhelpful, because it fails to communicate to the reader another important aspect of the result–namely its reliability! A good rule (sometimes known as Ehrenberg’s rule) is to quote all digits up to and including the first two variable digits.

    Philipp K. Janert, Data Analysis with Open Source Tools, O’Reilly, 2010. (#computing)

  • Doubt is not a pleasant mental state, but certainty is a ridiculous one.

    Voltaire (1694-1778) (#probability)

  • It is a part of probability that many improbable things will happen.

    Agathon, 445 - 400 BC, Chance News 7.02 (#probability)

  • Statistics make it clear the fact that one’s chances of being hurt by a bear are far, far fewer than being struck by an auto almost anywhere, or being mugged on a city street, for that matter. We pursue our automotive, urban lives undaunted, often indifferent amid the police and ambulance sirens, but in the Alaskan wilderness we lie awake worrying about bears.

    John Kauffmann, Alaska’s Brooks Range (#probability)

  • Cougars can be dangerous, especially to unsupervised children, but the chances of becoming a cougar victim are far less than becoming a victim of lightning, honeybees, moose, deer, pit bulls, football, snow-shoveling, or crossing the street in front of your house. For some reason, we fear the true risks of being killed far less than the remote risk of becoming prey.

    Dennis L Olsen, Cougars, page 46. (#probability)

  • Things which ought to be expected can seem quite extraordinary if you’ve got the wrong model.

    David Hand, Significance, 2014, 11, 36-39. (#data,models)

  • In many applications, the data analyst has a dilemma: Should an effect be classified as [fixed] and a BLUE obtained, or as [random] and a BLUP obtained? The traditional distinction between fixed and random effects is not helpful; it may, in fact, lead the data analyst to choose the less efficient route.

    Walter Stroup and D K Mulitze, Nearest Neighbor Adjusted Best Linear Unbiased Prediction, 1991, The American Statistician, 45, 194–200. (#data,models)

  • What should be the distribution of random effects in a mixed model? I think Gaussian is just fine, unless you are trying to write a journal paper.

    Terry Therneau, Speaking at useR 2007. (#data,models)

  • Competent scientists do not believe their own models or theories, but rather treat them as convenient fictions. …The issue to a scientist is not whether a model is true, but rather whether there is another whose predictive power is enough better to justify movement from today’s fiction to a new one.

    Steve Vardeman, Comment, 1987, Journal of the American Statistical Association, 82 : 130-131. (#data,models)

  • If you just rely on one model, you tend to amputate reality to make it fit your model.

    David Brooks (#data,models)

  • Statistical models are sometimes misunderstood in epidemiology. Statistical models for data are never true. The question whether a model is true is irrelevant. A more appropriate question is whether we obtain the correct scientific conclusion if we pretend that the process under study behaves according to a particular statistical model.

    Scott Zeger, Statistical reasoning in epidemiology, American Journal of Epidemiology, 1991 (#data,models)

  • It is not always convenient to remember that the right model for a population can fit a sample of data worse than a wrong model – even a wrong model with fewer parameters. We cannot rely on statistical diagnostics to save us, especially with small samples. We must think about what our models mean, regardless of fit, or we will promulgate nonsense.

    Leland Wilkinson, The Grammar of Graphics, p.335. (#data,models)

  • Fitting models to data is a bit like designing shirts to fit people. If you fit a shirt too closely to one particular person, it will fit other people poorly. Likewise, a model that fits a particular data set too well might not fit other data sets well.

    Rahul Parsa, Speaking to the Iowa SAS User’s Group (#data,models)

  • You might say that there’s no reason to bother with model checking since all models are false anyway. I do believe that all models are false, but for me the purpose of model checking is not to accept or reject a model, but to reveal aspects of the data that are not captured by the fitted model.

    Andrew Gelman, Some thoughts on the sociology of statistics,2007. (#data,models)

  • When evaluating a model, at least two broad standards are relevant. One is whether the model is consistent with the data. The other is whether the model is consistent with the ‘real world.’

    Kenneth Bollen, Structural Equations with Latent Variables (#data,models)

  • The point of a model is to get useful information about the relation between the response and predictor variables. Interpretability is a way of getting information. But a model does not have to be simple to provide reliable information about the relation between predictor and response variables; neither does it have to be a data model. The goal is not interpretability, but accurate information.

    Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, Vol 16, p.210. (#data,models)

  • The goals in statistics are to use data to predict and to get information about the underlying data mechanism. Nowhere is it written on a stone tablet what kind of model should be used to solve problems involving data. To make my position clear, I am not against models per se. In some situations they are the most appropriate way to solve the problem. But the emphasis needs to be on the problem and on the data. Unfortunately, our field has a vested interest in models, come hell or high water.

    Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, Vol 16, p.214. (#data,models,statistics)

  • Unless exploratory data analysis uncovers indications, usually quantitative ones, there is likely to be nothing for confirmatory data analysis to consider.

    John Tukey, Exploratory Data Analysis, p.3. (#data,data analysis)

  • One thing the data analyst has to learn is how to expose himself to what his data are willing–or even anxious–to tell him. Finding clues requires looking in the right places and with the right magnifying glass.

    John Tukey, Exploratory Data Analysis, p.21. (#data,data analysis)

  • In data analysis, a plot of y against x may help us when we know nothing about the logical connection from x to y–even when we do not know whether or not there is one–even when we know that such a connection is impossible.

    John Tukey, Exploratory Data Analysis, p.131. (#data,data analysis)

  • Whatever the data, we can try to gain understanding by straightening or by flattening. When we succeed in doing one or both, we almost always see more clearly what is going on.

    John Tukey, Exploratory Data Analysis, p.148. (#data,data analysis)

  • When nearest neighbor effects exist, the randomized complete block analysis [can be] so poor as to deserve to be called catastrophic. It [can not] even be considered a serious form of analysis. It is extremely important to make this clear to the vast number of researchers who have near religious faith in the randomized complete block design.

    Walt Stroup & D Mulitze, Nearest Neighbor Adjusted Best Linear Unbiased Prediction, The American Statistician, 45, 194–200. (#data,data analysis)

  • There are two books devoted solely to principal components, Jackson (1991) and Jolliffe (1986), which we think overstates its value as a technique.

    Venables & Ripley, Modern Applied Statistics with S, 4th ed., page 305. (#data,data analysis)

  • Understanding the split-plot isn’t everything. It’s the only thing.

    Oliver Schabenberger, Speaking at JSM 2008. (#data,data analysis)

  • Residual analysis is similarly unreliable. In a discussion after a presentation of residual analysis in a seminar at Berkeley in 1993, William Cleveland, one of the fathers of residual analysis, admitted that it could not uncover lack of fit in more than four to five dimensions. The papers I have read on using residual analysis to check lack of fit are confined to data sets with two or three variables. With higher dimensions, the interactions between the variables can produce passable residual plots for a variety of models. A residual plot is a goodness-of-fit test, and lacks power in more than a few dimensions. An acceptable residual plot does not imply that the model is a good fit to the data.

    Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, Vol 16, p.203. (#data,data analysis)

  • I was profoundly disappointed when I saw that S-PLUS 4.5 now provides Type III sums of squares as a routine option for the summary method for aov objects. I note that it is not yet available for multistratum models, although this has all the hallmarks of an oversight (that is, a bug) rather than common sense seeing the light of day. When the decision was being taken of whether to include this feature, because the FDA requires it a few of my colleagues and I were consulted and our reply was unhesitatingly a clear and unequivocal No, but it seems the FDA and SAS speak louder and we were clearly outvoted.

    Bill Venables, Exegeses on Linear Models (#data,data analysis,lsmeans)

  • Some of us feel that type III sum of squares and so-called LS-means are statistical nonsense which should have been left in SAS.

    Brian Ripley, Discussing features of S-Plus, S-news 5.29.1999 (#data,data analysis,lsmeans)

  • I think it would be interesting to ask people who use the results from LSMEANS to explain what the results represent. My guess is that less than 1% of the people who use LSMEANS know what they in fact are.

    Doug Bates, R-help mailing list, 16 Oct 2003 (#data,data analysis,lsmeans)

  • Doing applied statistics is never easy, especially if you want to get it right.

    Xiao-Li Meng, 2005 Joint Statistical Meetings (#data,data analysis)

  • I agree with the general message: “The right variables make a big difference for accuracy. Complex statistical methods, not so much.” This is similar to something Hal Stern told me once: the most important aspect of a statistical analysis is not what you do with the data, it’s what data you use.

    Andrew Gelman, The most important aspect of a statistical analysis is not what you do with the data, it’s what data you use, 2018. (#data,data analysis)

  • Once upon a time, the phrase ‘statistical reduction of data’ was used as a synonym for statistical analysis; it implied refining and concentrating the data so as eventually to express the main features in a much smaller number of means, indices, coefficients. … Today some statisticians and some computer programs seem more disposed to undertake ‘statistical expansion of data’, perhaps with an original 96 observations leading to 25 pages of output.

    D. J. Finney, Was This In Your Statistics Textbook: 1. Agricultural Scientist And Statistician, Experimental Agriculture, 24, 153-161. (#data,data analysis)

  • Data analysis is a tricky business – a trickier business than even tricky data analysts sometimes think.

    Bert Gunter, S-news mailing list, 6 Jun 2002 (#data,data analysis)

  • A first analysis of experimental results should, I believe, invariably be conducted using flexible data analytical techniques–looking at graphs and simple statistics–that so far as possible allow the data to ‘speak for themselves’. The unexpected phenomena that such an approach often uncovers can be of the greatest importance in shaping and sometimes redirecting the course of an ongoing investigation.

    George Box, Signal to Noise Ratios, Performance Criteria, and Transformations, Technometrics, 30, 1–17 (#data,data analysis,eda)

  • When I was in graduate school, a fellow student who was writing his dissertation with the late William G. Cochran passed along some of Cochran’s advice: You make a bigger contribution to statistics if you find a workable solution to an important unsolved problem than if you find an optimal solution where a good one already exists.

    Fred L. Ramsey and Daniel W. Schafer, The American Statistician, 54, 78. (#data,data analysis)

  • The six degrees of freedom for error provided by the 4x4 Latin square have long been recognized as inadequate, at least by Fisher. Something of the order of 12 error degrees of freedom would appear desirable…unless the effects under investigation are large in comparison with their experimental errors.

    Frank Yates, Complex Experiments, Supplement to the Journal of the Royal Statistical Society, 1935, Vol 2, No.1. (#data,data analysis)

  • The old rule of trusting the Central Limit Theorem if the sample size is larger than 30 is just that–old. Bootstrap and permutation testing let us more easily do inferences for a wider variety of statistics.

    Tim Hesterberg (#data,data analysis)

  • A competent data analysis of an even moderately complex set of data is a thing of trials and retreats, of dead ends and branches.

    John Tukey, Computer Science and Statistics: Proceedings of the 14th Symposium on the Interface, p.4. (#data,data analysis)

  • Scrutiny [of data] should take in the names of variates. Analysis of variates y1 to y5 is not statistics; analysis of plant height in centimeters, root weight in grams, etc., may be.

    D. A. Preece, In discussion of C. Chatfield, “The initial examination of data”, Journal of the Royal Statistical Society. Series A (1985), p.234. (#data,data analysis)

  • To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.

    R.A. Fisher, Sankhya, Indian Statistical Congress, Vol 4, p.17. (#data,data analysis)

  • It is clear that a statistician who is involved at the start of an investigation, advises on data collection, and who knows the background and objectives, will generally make a better job of the analysis than a statistician who was called in later on.

    Christopher Chatfield, Problem solving: a statistician’s guide, 1988, p.12. (#data,data analysis)

  • We really haven’t got any great amount of data on the subject, and without data how can we reach any definite conclusions?

    Thomas Alva Edison (1847-1931) (#data)

  • Small data…fits in memory on a laptop: <10 GB. Medium data…fits in memory on a server: 10 GB-1 TB. Big data…can’t fit in memory on one computer: >1 TB.

    Hadley Wickham, Big Data Pipelines, 2015. (#data)

  • A massive data set is one for which the size, heterogeneity, and general complexity cause serious pain for the analyst(s).

    J. Kettenring, Massive data sets…reflections on a workshop, Computing Science and Statistics, Proceedings of the 33rd Symposium on the Interface, Vol 33, 2001. (#data)

  • The Dirty Data Theorem states that “real world” data tends to come from bizarre and unspecifiable distributions of highly correlated variables and have unequal sample sizes, missing data points, non-independent observations, and an indeterminate number of inaccurately recorded values.

    Unknown, Statistically Speaking, p.282. (#data)

  • The Titanic survival data seem to become to categorical data analysis what Fisher’s Iris data are to discriminant analysis.

    Andreas Buja, A Word from the Editor of JCGS, Statistical Computing & Graphics Newsletter, V10, N1, p 32. (#data)

  • Consideration needs to be given to the most appropriate data to be collected. Often the temptation is to collect too much data and not give appropriate attention to the most important. Filing cabinets and computer files world-wide are filled with data that have been collected because they may be of interest to someone in future. Most is never of interest to anyone and if it is, its existence is unknown to those seeking the information, who will set out to collect the data again, probably in a trial better designed for the purpose. In general, it is best to collect only the data required to answer the questions posed, when setting up the trial, and plan another trial for other data in the future, if necessary.

    P. Portmann & H. Ketata, Statistical Methods for Plant Variety Evaluation, p.15. (#data)

  • We have found that some of the hardest errors to detect by traditional methods are unsuspected gaps in the data collection (we usually discovered them serendipitously in the course of graphical checking).

    Peter Huber, Huge data sets, Compstat ’94: Proceedings, 1994. (#data)

  • Every messy data is messy in its own way - it’s easy to define the characteristics of a clean dataset (rows are observations, columns are variables, columns contain values of consistent types). If you start to look at real life data you’ll see every way you can imagine data being messy (and many that you can’t)!

    Hadley Wickham, R-help mailing list, 17 Jan 2008 (#data)

  • What all practicing data analysts agree on is that the proportion of project time spent on data cleaning is huge. Estimates of 75-90 percent have been suggested.

    Unknown, Graphics of Large Datasets, p.20. (#data)

  • That the ten digits do not occur with equal frequency must be evident to any one making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones.

    Simon Newcomb, Note on the frequencies of the different digits in natural numbers, Amer. J. Math, 4, 39-40, 1881. (#data)

  • For a hundred years or so, mathematical statisticians have been in love with the fact that the probability distribution of the sum of a very large number of very small random deviations always converges to a normal distribution. This infatuation tended to focus interest away from the fact that, for real data, the normal distribution is often rather poorly realized, if it is realized at all.

    Unknown, Numerical Recipes in C, p 520. (#data,normality)

  • I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live.

    Louis Brandeis (#data,averages)

  • The per capita gross national product of a nation…as a measure of the comfort of individual lives is about as apt, say, as deciding how to dress in the morning according to the mean annual temperature of the region in which one lives. If one lives in the tropics this would work well. But if one lives in Minnesota, where the temperature might be thirty degrees below zero one morning and one hundred degrees above zero another morning, one would be in danger of dying of exposure or of prostration most of the time. The problem with aggregate statistics is that they obscure both the extremes and patterns of distribution.

    Paul Gruchow, Grass Roots, p.44. (#data,averages)

  • In former times, when the hazards of sea voyages were much more serious than they are today, when ships buffeted by storms threw a portion of their cargo overboard, it was recognized that those whose goods were sacrificed had a claim in equity to indemnification at the expense of those whose goods were safely delivered. The value of the lost goods was paid for by agreement between all of those whose merchandise had been in the same ship. This sea damage to cargo in transit was known as ‘havaria’ and the word came naturally to be applied to the compensation money which each individual was called upon to pay. From this Latin word derives our modern word ‘average’.

    M. J. Moroney, Facts from Figures, p.34. (#data,averages)

  • Some people hate the very name of statistics, but I find them full of beauty and interest. Whenever they are not brutalised, but delicately handled by the higher methods, and are warily interpreted, their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the Science of man.

    Francis Galton, Natural Inheritance. (#statistics,science)

  • The central limit theorem is often used to justify the assumption of normality when using the sample mean and the sample standard deviation. But it is inevitable that real data contain gross errors. Five to ten percent unusual values in a dataset seem to be the rule rather than the exception (Hampel 1973). The distribution of such data is no longer Normal.

    A. S. Hedayat and Guoqin Su, Robustness of the Simultaneous Estimators of Location and Scale From Approximating a Histogram by a Normal Density Curve, The American Statistician, 2012, 66, p.25. (#data,outliers,normality)

  • Why is a particular record or measurement classed as an outlier? Among all who handle and interpret statistical data, the word has long been in common use as an epithet for any item among a dataset of N that departs markedly from the broad pattern of the set.

    David Finney, Calibration Guidelines Challenge Outlier Practices, The American Statistician, 2006, Vol 60, No 4, p.310. (#data,outliers)

  • Dodge (2003) provided a definition of ‘outlier’ that is helpful but far from complete: In a sample of N observations, it is possible for a limited number to be so far separated in value from the remainder that they give rise to the question whether they are not from a different population, or that the sampling technique is at fault. Such values are called outliers.

    David Finney, Calibration Guidelines Challenge Outlier Practices, The American Statistician, 2006, Vol 60, No 4, p.310. (#data,outliers)

  • The finding of an outlier is not necessarily a discovery of a bad or misleading datum that may contaminate the data, but it may amount to a comment on the validity of distributional assumptions inherent in the form of analysis that is contemplated.

    David Finney, Calibration Guidelines Challenge Outlier Practices, The American Statistician, 2006, Vol 60, No 4, p.312. (#data,outliers)

  • If any observation has been classed as an outlier, the next step should be if possible to infer the cause…attention should be given to the possibility that laboratory and data management techniques have been imperfect: improvements and safeguards for the future should be considered.

    David Finney, Calibration Guidelines Challenge Outlier Practices, The American Statistician, 2006, Vol 60, No 4, p.312. (#data,outliers)

  • The motivation for any action on outliers must be to improve interpretation of data without ignoring unwelcome truth. To remove bad and untrustworthy data is a laudable ambition, but naive and untested rules may bring harm rather than benefit.

    David Finney, Calibration Guidelines Challenge Outlier Practices, The American Statistician, 2006, Vol 60, No 4, p.312. (#data,outliers)

  • One cautious approach is represented by Bernoulli’s more conservative outlook. If there are very strong reasons for believing that an observation has suffered an accident that made the value in the data-file thoroughly untrustworthy, then reject it; in the absence of clear evidence that an observation, identified by formal rule as an outlier, is unacceptable then retain it unless there is lack of trust that the laboratory obtaining it is conscientiously operated by able persons who have “…taken every care.”

    David Finney, Calibration Guidelines Challenge Outlier Practices, The American Statistician, 2006, Vol 60, No 4, p.313. (#data,outliers)

  • Treat outliers like children. Correct them when necessary, but never throw them out.

    Unknown, Top 12 Tip #2. Practical Stats’ Applied Environmental Statistics course (#data,outliers)

  • There are a lot of statistical methods looking at whether an outlier should be deleted…I don’t endorse any of them.

    Barry Nussbaum, Significance, Apr 2017. (#data,outliers)

  • All this discussion of deleting the outliers is completely backwards. In my work, I usually throw away all the good data, and just analyze the outliers.

    Unknown pharmaceutical statistician, The American Statistician, Vol 61, No 3, page 193. (#data,outliers)

  • I have often thought that outliers contain more information than the model.

    Arnold Goodman, 2005 Joint Statistical Meetings (#data,outliers)

  • Whatever actually happened, outliers need to be investigated not omitted. Try to understand what caused some observations to be different from the bulk of the observations. If you understand the reasons, you are then in a better position to judge whether the points can legitimately be removed from the data set, or whether you’ve just discovered something new and interesting. Never remove a point just because it is weird.

    Rob J. Hyndman, Omitting outliers, 2016 (#data,outliers)

  • Scholars feel the need to present tables of model parameters in academic articles (perhaps just as evidence that they ran the analysis they claimed to have run), but these tables are rarely interpreted other than for their sign and statistical significance. Most of the numbers in these tables are never even discussed in the text. From the perspective of the applied data analyst, R packages without procedures to compute quantities of scientific interest are woefully incomplete. A better approach focuses on quantities of direct scientific interest rather than uninterpretable model parameters. … For each quantity of interest, the user needs some summary that includes a point estimate and a measure of uncertainty such as a standard error, confidence interval, or a distribution. The methods of calculating these differ greatly across theories of inference and methods of analysis. However, from the user’s perspective, the result is almost always the same: the point estimate and uncertainty of some quantity of interest.

    Kosuke Imai, Gary King, Olivia Lau, Toward a Common Framework for Statistical Analysis and Development, Journal of Computational and Graphical Statistics, 2008, v 17. (#data,tables,uncertainty)

  • The purpose of plotting is to convey phenomena to the viewer’s cortex, not to provide a place to lookup observed numbers.

    Kaye Basford, John Tukey, Graphical Analysis of Multi-Response Data, p.373. (#data visualization)

  • Had we started with this [quantile] plot, noticed that it looks straight and not looked further, we would have missed the important features of the data. The general lesson is important. Theoretical quantile-quantile plots are not a panacea and must be used in conjunction with other displays and analyses to get a full picture of the behavior of the data.

    John M. Chambers, William S. Cleveland, Beat Kleiner, Paul A. Tukey, Graphical Methods for Data Analysis, p.212. (#data visualization)

  • Visualization for large data is an oxymoron–the art is to reduce size before one visualizes. The contradiction (and challenge) is that we may need to visualize first in order to find out how to reduce size.

    Peter Huber, Massive datasets workshop: Four years after, Journal of Computational and Graphical Statistics, Vol 8, 635–652. (#data visualization)

  • Pie charts have severe perceptual problems… If you want to display some data, and perceiving the information is not so important, then a pie chart is fine.

    Unknown, S-Plus 2000 Programmer’s Guide, p.349. (#data visualization)

  • Merely drawing a plot does not constitute visualization. Visualization is about conveying important information to the reader accurately. It should reveal information that is in the data and should not impose structure on the data.

    W. Huber, X. Li, and R. Gentleman, Bioinformatics and Computational Biology Solutions using R and Bioconductor, p.162. (#data visualization)

  • While the dendrogram has been widely used to represent distances between objects, it cannot really be considered to be a visualization method. Dendrograms do not necessarily expose structure that exists in the data. In many cases they impose structure on the data, and when that is the case it is dangerous to interpret the observed structure.

    W. Huber, X. Li, and R. Gentleman, Bioinformatics and Computational Biology Solutions using R and Bioconductor, p.170. (#data visualization)

  • [When] you see excellent graphics, find out how they were done. Borrow strength from demonstrated excellence. The idea for information design is: Don’t get it original, get it right.

    Edward Tufte (#data visualization,design)

  • Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.

    Edward Tufte, The Visual Display of Quantitative Information, 1983. (#data visualization)

  • Chartjunk does not achieve the goals of its propagators. The overwhelming fact of data graphics is that they stand or fall on their content, gracefully displayed. Graphics do not become attractive and interesting through the addition of ornamental hatching and false perspective to a few bars.

    Edward Tufte, The Visual Display of Quantitative Information, 1983, p.121. (#data visualization)

  • A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies…Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used. Above all else show the data.

    Edward Tufte, The Visual Display of Quantitative Information, 1983, p.178. (#data visualization)

  • In medical research, too often the first published study testing a new treatment provides the strongest evidence that will ever be found for that treatment. As better controlled studies–less vulnerable to the enthusiasms of researchers and their sponsors–are then conducted, the treatment’s reported efficacy declines. Years after the initial study […] sometimes the only remaining issue is whether the treatment is in fact harmful.

    Edward Tufte, Beautiful Evidence, p.144. (#statistics,significance)

  • The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them.

    Ronald A Fisher, Statistical Methods for Research Workers, p.27. (#data visualization)

  • Our statistical puritanism may incline us not to use shadows, but we confess that a little bit of shadow is fun.

    Dan Carr, Using Layering and Perceptual Grouping in Statistical Graphics, Statistical Computing & Graphics Newsletter, V. 10, N. 1, p.25. (#data visualization)

  • We are not saying that the primary purpose of a graph is to convey numbers with as many decimal places as possible. We agree with Ehrenberg (1975) that if this were the only goal, tables would be better. The power of a graph is its ability to enable one to take in the quantitative information, organize it, and see patterns and structure not readily revealed by other means of studying the data.

    William Cleveland & Robert McGill, Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods, Journal of the American Statistical Association, 79, 531–554, 1984. (#data visualization)

  • There was a controversy [in the 1920s]…about whether the divided bar chart or the pie chart was superior for portraying the parts of a whole. The contest appears to have ended in a draw. We conclude that neither graphical form should be used because other methods are demonstrably better.

    William Cleveland & Robert McGill, Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods, Journal of the American Statistical Association, 79, 531–554, 1984. (#data visualization)

  • Our conclusion about [choropleth] patch maps agrees with Tukey’s (1979), who left little doubt about his opinions by stating, ‘I am coming to be less and less satisfied with the set of maps that some dignify by the name statistical map and that I would gladly revile with the name patch map’.

    William Cleveland & Robert McGill, Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods, Journal of the American Statistical Association, 79, 531–554, 1984. (#data visualization)

  • There is no more reason to expect one graph to ‘tell all’ than to expect one number to do the same.

    John Tukey, Exploratory Data Analysis. (#data visualization)

  • There is no excuse for failing to plot and look.

    John Tukey, Exploratory Data Analysis (#data visualization)

  • It’s generally considered bad practice to use more than six colors in a single display.

    Ross Ihaka, R-help mailing list, 2004 (#data visualization)

  • The mere multiplicity of the attempts to deal with more than three continuous dimensions by encoding additional variables into glyphs, Chernoff faces, stars, Kleiner-Hartigan trees, and so on indicates that each of them has met only with rather limited success.

    Peter Huber, Statistical graphics: history and overview, Proceedings of the Fourth Annual Conference and Exposition, p.674. (#data visualization)

  • Spatial patterns may be due to many sources of variation. In the context of seeking explanations, John Tukey said that, “the unadjusted plot should not be made.” In other words, our perceptual/cognitive abilities are poor in terms of adjusting for known source of variations and envisioning the resulting map. A better strategy is to control for known sources of variation and/or adjust the estimates before making the map.

    Dan Carr, Survey Research Methods Section newsletter, July 2002. (#data visualization)

  • It’s not easy to select more than a few clearly distinct colors. Also, “distinct” is context-dependent, because: What will be the spatial relationships of the different colors in your output? You can successfully have fairly similar colors adjacent to each other, since the contrast is more obvious when they’re adjacent. However, if you want to use colors to track identity and difference across scattered points or patches, then you need bigger separations between colors, since you want to be able to see easily that patch “A” here is of the same kind as patch “A” there and different from patch “B” somewhere else, when mingled with patches of other kinds. And size matters. Big patches of similar color (as on a map) can look quite distinct, while the same colors used to plot filled circular blobs on a graph might be barely distinguishable, and totally indistinguishable if used to plot colored “.”s or “+”s. … It’s all very psycho-visual and success usually requires experimentation!

    Ted Harding, R-help mailing list, 2004 (#data visualization)

  • The concept of randomness arises partly from games of chance. The word ‘chance’ derives from the Latin cadentia signifying the fall of a die. The word ‘random’ itself comes from the French randir meaning to run fast or gallop.

    G. Spencer Brown, Probability and Scientific Inference, Chapter VII, p.35. (#history)

  • Statistics derives from a German term, ‘Statistik’, first used as a substantive by the Gottingen professor Gottfried Achenwall in 1749.

    Theodore M. Porter, The Rise of Statistical Thinking 1820-1900. (#history)

  • Strangely, the motto chosen by the founders of the Statistical Society in 1834 was ‘Aliis exterendum’, which means ‘Let others thrash it out.’ William Cochran confessed that ‘it is a little embarrassing that statisticians started out by proclaiming what they will not do’.

    Edmund A. Gehan and Noreen A. Lemak, Statistics in Medical Research: Developments in Clinical Trials (#history)

  • What accounts for the success of the [Iowa State] Stat Lab? I believe that it is because it was not driven by the mathematics, but by actual problems in biology, genetics, demography, economics, psychology, and so on. To be sure, real problems give rise to abstract problems in statistical inference which have a fascination of their own. However, for statistics to remain viable, statistical problems should have their genesis in real, data-related problems.

    Oscar Kempthorne, A conversation with Oscar Kempthorne, Statistical Science, 1995, V 10, p.335. (#history)

  • You prepare yourself to win. You prepare yourself for the possibility that you won’t win. You don’t really prepare yourself for the possibility that you flip the coin in the air and it lands on its edge and you get neither outcome.

    Al Gore, On the 2000 presidential election, Chance News 10.01. (#history)

  • The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.

    Stephen Jay Gould, The Mismeasure of Man (#statistics)

  • When noise is correlated it becomes music.

    Anindya Roy, Personal communication (#statistics)

  • As I left consulting to go back to the university, these were the perceptions I had about working with data to find answers to problems: (a) Focus on finding a good solution–that’s what consultants get paid for. (b) Live with the data before you plunge into modelling. (c) Search for a model that gives a good solution, either algorithmic or data. (d) Predictive accuracy on test sets is the criterion for how good the model is. (e) Computers are an indispensable partner.

    Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, Vol. 16, p.201. (#statistics)

  • I have always thought that statistical design and sampling from populations should be the first courses taught, but all elementary courses I know of start with statistical methods or probability. To me, this is putting the cart before the horse!

    Walter Federer, A Conversation with Walter T Federer, Statistical Science, 2005, Vol 20, p.312. (#statistics)

  • Bill Hunter told me that their editor wanted a title for their book with sex appeal. Thus, “Statistics for Experimenters”, which is pretty subliminal but it’s there.

    Robert Easterling, The American Statistician, v 58, p 248. (#statistics)

  • The only useful function of a statistician is to make predictions, and thus to provide a basis for action.

    W. Edwards Deming, W. A. Wallis, 1980. The Statistical Research Group. Journal of the American Statistical Association, 75, 321. (#statistics)

  • We must watch our own language. For example, “Type I error” and “Type II error” are meaningless and misleading terms. Instead, try “chance of a false alarm” and “a missed opportunity”.

    Deborah J. Rumsey, Assessing Student Retention of Essential Statistical Ideas: Perspectives, Priorities, and Possibilities, American Statistician, Vol 62, No 1, p.58. (#statistics)

  • Statistics prove near & far that folks who drive like crazy…are.

    Anon, Burma Shave sign in the Advertising Museum in Portland (#statistics)

  • It’s a random pattern. That’s the pattern.

    Ed Chigliak, TV series “Northern Exposure” (#statistics,random numbers)

  • Randomness is NOT the absence of a pattern.

    Bill Venables, 1999 S-Plus User’s Conference (#statistics,random numbers)

  • The statistics on sanity are that one out of every four Americans is suffering from some form of mental illness. Think of your three best friends. If they are okay, then it’s you.

    Rita Mae Brown

  • It is proven that the celebration of birthdays is healthy. Statistics show that those people who celebrate the most birthdays become the oldest.

    S. den Hartog, Ph D. Thesis University of Groningen (#statistics)

  • Because no one becomes statistically self-sufficient after one semester of study, I try to prepare students to become intelligent consumers of the assistance that they will inevitably seek. Service courses train future clients, not future statisticians.

    Michael W. Trosset, “Statistical Science”, Feb 98, p.24. (#statistics)

  • If there was ever an idea in statistics which evokes the reaction, “Why the hell didn’t I think of that,” it has to be the bootstrap.

    James R. Thompson, 1997 Interface Proceedings (#statistics)

  • The sensible statistician should be wary of other people’s statistics. In particular, it is unwise to believe all official statistics, such as government statements that the probability of a nuclear meltdown is only one in 10,000 years (remember Chernobyl!).

    Christopher Chatfield, Problem solving: a statistician’s guide, 1988, p 73. (#statistics)

  • The government are very keen on amassing statistics. They collect them, add them, raise them to the n-th power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases.

    English judge on the subject of Indian statistics, Quoted in Sir Josiah Stamp in Some Economic Matters in Modern Life, London: King and Sons, 1929, pp.258-259. (#statistics)

  • It was always important for the biometrician to take part in the field-work for most kinds of trials. A willingness to get our hands dirty did much to dispel the distrust of the theoretician from Head Office, as well as giving us an appreciation of the practical problems. Trials were always carried out to simulate farming conditions as much as possible. We had once advocated a change in wheat plot lengths from 2 chains to 3, based on results from uniformity trials. It seemed a very good idea until a biometrician went to help harvest a very good crop and found he had to lift and carry bags of over 100 lb, this being the yield from each plot. But the local agriculturist in charge of the trial would never have reported this; he would only have gone on grumbling about those ‘theory guys’ in Head Office forever.

    Jean Heywood (nee Miller), A History of Statistics in New Zealand, edited by H.S. Roberts, p.23-24. (#statistics,biometry,expt design)

  • Econometrics has successfully predicted 14 of the last 3 economic depressions.

    David Hand, Speaking at Interface 2000. (#statistics)

  • We feel that nothing can replace the value to a [corn] breeder of careful study and understanding of his plants…More and more, we feel that grave danger exists of statistics being used as a substitute for critical observation and thought…Statistics have their place, a very important one, but they can never serve as a substitute for close association with plants. Their real value, it seems to us, is in measuring precisely what we already know in a general way. Statistics tends to be an office art based on machines and figures rather than a field art based on living things.

    Henry A. Wallace and William L. Brown, Corn and Its Early Fathers, 1956, p.123. (#statistics)

  • The great scientific weakness of America today is that she tends to emphasize quantity at the expense of quality–statistics instead of genuine insight–immediate utilitarian application instead of genuine thought about fundamentals.

    Henry A. Wallace and William L. Brown, Corn and Its Early Fathers, 1956, p.124. (#statistics)

  • It is easy to lie with statistics, but it is easier to lie without them.

    Frederick Mosteller (#statistics)

  • There are aspects of statistics other than it being intellectually difficult that are barriers to learning. For one thing, statistics does not benefit from a glamorous image that motivates students to persist through tedious and frustrating lessons…there are no TV dramas with a good-looking statistician playing the lead, and few mothers’ chests swell with pride as they introduce their son or daughter as “the statistician.”

    Chap T. Le and James R. Boen, Health and Numbers: Basic Statistical Methods (#statistics)

  • At its core statistics is not about cleverness and technique, but rather about honesty. Its real contribution to society is primarily moral, not technical. It is about doing the right thing when interpreting empirical information. Statisticians are not the world’s best computer scientists, mathematicians, or scientific subject matter specialists. We are (potentially, at least) the best at the principled collection, summarization, and analysis of data.

    Stephen B. Vardeman and Max D. Morris, Statistics and Ethics: Some Advice for Young Statisticians, The American Statistician, vol 57, p.21. (#statistics,ethics)

  • Statistical analysis of data can only be performed within the context of selected assumptions, models, and/or prior distributions. A statistical analysis is actually the extraction of substantive information from data and assumptions. And herein lies the rub, understood well by Disraeli and others skeptical of our work: For given data, an analysis can usually be selected which will result in “information” more favorable to the owner of the analysis than is objectively warranted.

    Stephen B. Vardeman and Max D. Morris, Statistics and Ethics: Some Advice for Young Statisticians, The American Statistician, vol 57, p.25. (#statistics,ethics)

  • Too much of what all statisticians do … is blatantly subjective for any of us to kid ourselves or the users of our technology into believing that we have operated ‘impartially’ in any true sense. … We can do what seems to us most appropriate, but we can not be objective and would do well to avoid language that hints to the contrary.

    Steve Vardeman, Comment, 1987, Journal of the American Statistical Association, 82, 130-131. (#statistics)

  • The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it.

    Howard Wainer, Improving Tabular Displays, With NAEP Tables as Examples and Inspirations, Journal of Educational and Behavioral Statistics, Vol 22, No.1, pp.1-30. (#statistics)
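
A minimal simulation sketch of Wainer's point, assuming standard normal data and the sample mean as the statistic (the function name `se_of_mean`, the replication count, and the seed are illustrative choices, not from the source). It compares the simulated standard error with 1/sqrt(n):

```python
import random
import statistics


def se_of_mean(n, reps=2000, seed=0):
    """Estimate the standard error of the mean of n standard normal draws.

    The estimate is the standard deviation of `reps` simulated sample means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


if __name__ == "__main__":
    # Quadrupling n should roughly halve the standard error.
    for n in (25, 100, 400):
        print(n, round(se_of_mean(n), 3), round(1 / n ** 0.5, 3))
```

Under these assumptions, each fourfold increase in sample size buys only a twofold gain in precision.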

  • Suppose that Sir R. A. Fisher–a master of public relations–had not taken over from ordinary English such evocative words as “sufficient”, “efficient”, and “consistent” and made them into precisely defined terms of statistical theory. He might, after all, have used utterly dull terms for those properties of estimators, calling them characteristics A, B, and C. … Would his work have had the same smashing influence that it did? I think not, or at least not as rapidly.

    William H. Kruskal, Formulas, Numbers, Words: Statistics in Prose, The American Scholar, 1978 (#statistics)

  • Statistics state the status of the state.

    Leland Wilkinson, The Grammar of Graphics, p.165. (#statistics)

  • My philosophy on lotteries is that while you actually have to buy a ticket in order to win the lottery, buying a ticket does not significantly increase your odds of winning.

    Howie Smith, Personal communication (#statistics)

  • If you show your friends your confidence interval for the standard error of the estimated length of the confidence interval of your confidence about yourself, I guess one nice thing to ask to freak them out is: “Can you construct a confidence interval for the confidence level of my confidence?”

    Tony Baiching, Personal communication (#statistics)

  • To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port.

    George E. P. Box, Non-normality and Tests on Variances, Biometrika, 40, 318-335. (#statistics,nhst)

  • Statistics is, or should be, about scientific investigation and how to do it better, but many statisticians believe it is a branch of mathematics.

    George Box, AmStat News, Oct 2000, page 11. (#statistics,Box quotes)

  • These days the statistician is often asked such questions as “Are you a Bayesian?” “Are you a frequentist?” “Are you a data analyst?” “Are you a designer of experiments?”. I will argue that the appropriate answer to ALL of these questions can be (and preferably should be) “yes”, and that we can see why this is so if we consider the scientific context for what statisticians do.

    George E.P. Box (#statistics)

  • One is so much less than two. [John Tukey’s eulogy of his wife.]

    John Tukey, The life and professional contributions of John W. Tukey, The Annals of Statistics, 2001, Vol 30, p.46. (#statistics)

  • Statisticians classically asked the wrong question–and were willing to answer with a lie, one that was often a downright lie. They asked “Are the effects of A and B different?” and they were willing to answer “no”. All we know about the world teaches us that the effects of A and B are always different–in some decimal place–for every A and B. Thus asking “Are the effects different?” is foolish. What we should be answering first is “Can we tell the direction in which the effects of A differ from the effects of B?” In other words, can we be confident about the direction from A to B? Is it “up”, “down” or “uncertain”?

    John Tukey, The Philosophy of Multiple Comparisons, Statistical Science, 6, 100-116. (#statistics)

  • No one has ever shown that he or she had a free lunch. Here, of course, “free lunch” means “usefulness of a model that is locally easy to make inferences from”.

    John Tukey, Issues relevant to an honest account of data-based inference, partially in the light of Laurie Davies’ paper. (#statistics)

  • If asymptotics are of any real value, it must be because they teach us something useful in finite samples. I wish I knew how to be sure when this happens.

    John Tukey, Issues relevant to an honest account of data-based inference, partially in the light of Laurie Davies’ paper. (#statistics)

  • George Box: We don’t need robust methods. A good statistician (particularly a Bayesian one) will model the data well and find the outliers. John Tukey: They ran over 2000 statistical analyses at Rothamsted last week and nobody noticed anything. A red light warning would be most helpful.

    George Box vs. John Tukey, Douglas Martin, 1999 S-Plus Conference Proceedings. (#statistics)

  • Statistics is a science in my opinion, and it is no more a branch of mathematics than are physics, chemistry, and economics; for if its methods fail the test of experience–not the test of logic–they will be discarded.

    John Tukey, The life and professional contributions of John W. Tukey, by David Brillinger, The Annals of Statistics, 2001, Vol 30. (#statistics)

  • One Christmas Tukey gave his students books of crossword puzzles as presents. Upon examining the books the students found that Tukey had removed the puzzle answers and had replaced them with words of the sense: “Doing statistics is like doing crosswords except that one cannot know for sure whether one has found the solution.”

    John Tukey, The life and professional contributions of John W. Tukey, by David Brillinger, The Annals of Statistics, 2001, Vol 30, p.22. (#statistics)

  • A sort of question that is inevitable is: “Someone taught my students exploratory, and now (boo hoo) they want me to tell them how to assess significance or confidence for all these unusual functions of the data. Oh, what can we do?” To this there is an easy answer: TEACH them the JACKKNIFE.

    John Tukey, We Need Both Exploratory and Confirmatory, The American Statistician, Vol 34, No 1, p.25. (#statistics)
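
For readers who have not met the jackknife, here is a minimal sketch of the delete-one jackknife standard error for an arbitrary statistic. The function name `jackknife_se` and the sample data are illustrative assumptions, and this is the textbook delete-one form rather than anything specific to Tukey's article:

```python
import statistics


def jackknife_se(data, stat):
    """Delete-one jackknife standard error of stat(data).

    Recomputes the statistic with each observation left out in turn and
    scales the spread of those leave-one-out values by (n - 1) / n.
    """
    n = len(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return ((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out)) ** 0.5


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up illustrative values
    # For the mean, the jackknife reproduces the usual s / sqrt(n).
    print(jackknife_se(data, statistics.fmean))
    # The same call works for an "unusual function of the data".
    print(jackknife_se(data, lambda xs: statistics.fmean(xs) ** 2))
```

The appeal is that the same few lines apply to any statistic that can be recomputed on a subsample, although the jackknife is known to behave poorly for non-smooth statistics such as the median.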

  • John Tukey’s eye for detail was amazing. When we were preparing some of the material for our book (which was published last year), it was most disconcerting to have him glance at the data and question one value out of several thousand points. Of course, he was correct and I had missed identifying this anomaly.

    Kaye Basford (#statistics)

  • Many students are curious about the ‘1.5 x IQR Rule’, i.e., why do we use Q1 - 1.5 x IQR (or Q3 + 1.5 x IQR) as the value for deciding if a data value is classified as an outlier? Paul Velleman, a statistician at Cornell University, was a student of John Tukey, who invented the boxplot and the 1.5 x IQR Rule. When he asked Tukey, ‘Why 1.5?’, Tukey answered, ‘Because 1 is too small and 2 is too large.’ [Assuming a Gaussian distribution, about 1 value in 100 would be an outlier. Using 2 x IQR would lead to 1 value in 1000 being an outlier.]

    Unknown (#statistics)
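
The bracketed arithmetic can be checked with a short sketch, assuming an exactly Gaussian population (the function name `outlier_fraction` is illustrative). It computes the probability of falling outside the fences Q1 - k x IQR and Q3 + k x IQR:

```python
from statistics import NormalDist


def outlier_fraction(k):
    """Fraction of a Gaussian population outside Q1 - k*IQR and Q3 + k*IQR."""
    z = NormalDist()  # standard normal; the result does not depend on scale
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    iqr = q3 - q1
    upper_fence = q3 + k * iqr
    # By symmetry, the two tails beyond the fences have equal probability.
    return 2 * (1 - z.cdf(upper_fence))


if __name__ == "__main__":
    for k in (1.0, 1.5, 2.0):
        print(k, outlier_fraction(k))
```

Under this assumption the fraction is roughly 0.7% for k = 1.5 and roughly 0.07% for k = 2, so the quote's "1 in 100" and "1 in 1000" are order-of-magnitude figures.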

  • It is a rare thing that a specific body of data tells us as clearly as we would wish how it itself should be analyzed.

    John Tukey, Exploratory Data Analysis, p.397. (#statistics)

  • Just which robust/resistant methods you use is not important–what is important is that you use some. It is perfectly proper to use both classical and robust/resistant methods routinely, and only worry when they differ enough to matter. But, when they differ, you should think hard.

    John Tukey, Quoted by Doug Martin (#statistics)

  • We thus echo the classical Bayesian literature in concluding that ‘noninformative prior information’ is a contradiction in terms. The flat prior carries information just like any other; it represents the assumption that the effect is likely to be large. This is often not true. Indeed, the signal-to-noise ratio s is often very low and then it is necessary to shrink the unbiased estimate. Failure to do so by inappropriately using the flat prior causes overestimation of effects and subsequent failure to replicate them.

    Erik van Zwet & Andrew Gelman, A proposal for informative default priors scaled by the standard error of estimates, The American Statistician, 76, p.7. (#statistics,bayesian)

  • Another reason for the applied statistician to care about Bayesian inference is that consumers of statistical answers, at least interval estimates, commonly interpret them as probability statements about the possible values of parameters. Consequently, the answers statisticians provide to consumers should be capable of being interpreted as approximate Bayesian statements.

    Donald B. Rubin, Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12(4):1151-1172, 1984. (#statistics,bayesian)

  • In most cases the frequentist adopts numerical values because they are convenient in that the calculations can be easily performed. For instance, a reliability engineer will use an exponential distribution or, if that is too gross, a Weibull. In the majority of frequentist analyses there is little justification for the assumed likelihood, and it is as subjective as any prior.

    D. V. Lindley, Discussion, The American Statistician, August 1997, Vol. 51, page 265. (#statistics,bayesian)

  • In contrast to the logical development and intuitive interpretations of the Bayesian approach, frequentist methods are nearly impossible to understand, even for the best students. Consider confidence intervals. Many instructors err in describing confidence intervals and even some texts err. But whether texts or instructors err in explaining them, students do not understand them. And they carry this misunderstanding with them into later life. Calculating a confidence interval is easy. But everyone except the cognoscenti believes that when one calculates 95% confidence limits of 2.6 and 7.9, say, the probability is 95% that the parameter in question lies in the interval from 2.6 to 7.9. P values are nearly as obscure as confidence intervals. … Students in frequentist courses may learn very well how to calculate confidence intervals and P values, but they cannot give them correct interpretations. I stopped teaching frequentist methods when I decided that they could not be learned.

    Donald A. Berry, Teaching Elementary Bayesian Statistics with Real Applications in Science, The American Statistician, 51, p 242. (#statistics,bayesian)

  • Uniform priors on probabilities are ubiquitous. I agree that they can be useful. However, if the probability in question is the prevalence of HIV in California, it is ridiculous to assert that a prevalence of 100% is equally plausible with 0%, or that the chance that it is above 50% is the same as the chance that it is below 50%. It is particularly noxious to call such a prior noninformative. Instead, it is disinformative. The likelihood notwithstanding, positing a uniform prior for the probability that the sun will rise tomorrow is equally ridiculous, given that we are all confident that it has risen all the days of our lives.

    Wesley O. Johnson, Comment: Bayesian Statistics in the Twenty First Century, The American Statistician, Feb 2013, 67, p 10. (#statistics,bayesian)

  • The best way to convey to the experimenter what the data tell him about theta is to show him a picture of the posterior distribution.

    George E. P. Box & G. C. Tiao, Bayesian Inference in Statistical Analysis (1973) (#statistics,bayesian)

  • If one could get some rational basis for obtaining the prior, then there would be no problem. But people have seminars these days about something where someone says, ‘I am going to use such and such a prior’. Where does he get the prior? It is not data based. It is a mathematical convenience or something like that. It is not even obtained by using Bayes’ theorem. Why one should believe the outcome of using this seems to be a very moot point.

    Oscar Kempthorne, A conversation with Oscar Kempthorne, Statistical Science, 1995, V 10, p.333. (#statistics,bayesian)

  • In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain ‘exact’ posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don’t see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make.

    Oscar Kempthorne, “A conversation with Oscar Kempthorne”, Statistical Science, 1995, V 10, p.334. (#statistics,bayesian)

  • I sometimes think that the only real difference between Bayesian and non-Bayesian hierarchical modelling is whether random effects are labeled with Greek or Roman letters.

    Peter Diggle, Comment on Bayesian analysis of agricultural field experiments, 1999, J. Royal Statistical Society B, 61, 691–746. (#statistics,bayesian)

  • The practicing Bayesian is well advised to become friends with as many numerical analysts as possible.

    James Berger, Statistical Decision Theory and Bayesian Analysis, p.202. (#statistics,bayesian)

  • You just say “Bayesian,” and people think you are some kind of genius.

    Gary Churchill, Bayes offers a new way to make sense of numbers, Science, 19 Nov 1999. (#statistics,bayesian)

  • Bayesian computations give you a straightforward answer you can understand and use. It says there is an X% probability that your hypothesis is true–not that there is some convoluted chance that if you assume the null hypothesis is true, you’ll get a similar or more extreme result if you repeated your experiment thousands of times. How does one interpret THAT!

    Steven Goodman, Bayes offers a new way to make sense of numbers, Science, 19 Nov 1999. (#statistics,bayesian)

  • Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won’t get anything out of it regardless of your political bent.

    Brad Carlin, Bayes offers a new way to make sense of numbers, Science, 19 Nov 1999. (#statistics,bayesian)

  • If a study, even a statistically significant one, suggests that pigs can fly, Bayes’s theorem allows researchers to combine the study’s results mathematically with hundreds of years of knowledge about the travel habits of swine.

    David Leonhardt, New York Times, April 28, 2001 (#statistics,bayesian)

  • If the prior distribution, at which I am frankly guessing, has little or no effect on the result, then why bother; and if it has a large effect, then since I do not know what I am doing how would I dare act on the conclusions drawn?

    Richard W Hamming, The Art of Probability for Scientists and Engineers, 1991, p.298. (#statistics,bayesian)

  • I believe that there are many classes of problems where Bayesian analyses are reasonable, mainly classes with which I have little acquaintance.

    John Tukey, The life and professional contributions of John W. Tukey, The Annals of Statistics, 2001, Vol 30, p.45. (#statistics,bayesian)

  • If you read Bayesian polemics from the 1970s and 1980s, including my own, it’s usually arrogant and even insulting. Some of the terms were excessively pointed. For example, Bayesians identified which frequentist methods were “incoherent”, or more accurately, lamented that none seemed to be coherent. On the other hand, Bayesians were accused of being “biased”. The rhetoric was not all that different from that of the Fisher/Pearson duels. But we Bayesians have stopped saying derogatory things, partly because we have changed and partly because frequentists have been listening. When you’re walking beside someone you tend to be cordial; when you’re trying to catch up to tell them something and they are ignoring what you say, you sometimes yell.

    Don Berry, Celebrating 70: An Interview with Don Berry, Statistical Science, 2012, Vol. 27, No.1, 144-159. (#statistics,bayesian)

  • The traditional methods design of experiments are taught and/or discussed in textbooks are not the ways design of experiments are or should be used for real-world applications.

    George Milliken, Applied Statistics in Agriculture Conference, 2009 (#statistics,experimental design)

  • The statistician who supposes that his main contribution to the planning of an experiment will involve statistical theory, finds repeatedly that he makes his most valuable contribution simply by persuading the investigator to explain why he wishes to do the experiment, by persuading him to justify the experimental treatments, and to explain why it is that the experiment, when completed, will assist him in his research.

    Gertrude Cox, Lecture in Washington 11 January 1951 (#statistics,experimental design)

  • An important distinction needs to be made between experimental designs using complete blocks and those using incomplete blocks as regards to three functions: 1. reducing the error mean square 2. adjusting estimates closer to true values, and 3. refining rankings. Complete blocks include all treatments whereas incomplete blocks include a subset of the treatments. Both can reduce the residual error, but only incomplete blocks can also adjust estimates of treatment effects closer to the true values and thereby refine rankings among the treatments. These adjusted estimates are usually more accurate than the raw averages over replicates, but not always (exactly as is the case for accuracy gain through more replication). Likewise, these adjusted rankings are more likely to identify correctly the best treatments. By declaring a smaller error but doing nothing to sharpen estimates or refine rankings, complete block designs are rather impotent. It is ironic that scientists rarely understand this huge difference between getting one benefit or three from blocking.

    Hugh G Gauch, Three Strategies for Gaining Accuracy, American Scientist. (#statistics,experimental design)

  • On a final note, we would like to stress the importance of design, which often does not receive the attention it deserves. Sometimes, the large number of modeling options for spatial analysis may raise the false impression that design does not matter, and that a sophisticated analysis takes care of everything. Nothing could be further from the truth.

    Hans-Peter Piepho, Martin P. Boer, Emlyn R. Williams, Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials, “Biometrical Journal”, Feb 2022. (#statistics,experimental design)

  • At this meeting for the hybrid corn industry, we have no reservation about recommending a design which consists of a single replicate of treatments at a given location. For some audiences, such a statement can severely damage the reputation of the person making the statement. University experiment station personnel in particular regard replications within an environment as a necessary part of good research. They are not. Sprague (1955) and many others have shown most researchers otherwise.

    R. E. Stucker & D. R. Hicks, Experimental Design and Plot Size Considerations for On-Farm Research, Proceedings of the 46th Annual Corn and Sorghum Industry Research Conference, 1991, p.60. (#statistics,experimental design)

  • The message from a statistician’s point of view is very clear. Replicate over environments, do not replicate within environments. This is not news. At North Carolina State in the early 60s, any graduate student interested in quantitative aspects of plant breeding and genetics had a standard answer for the number of replicates needed in an experiment: use one replicate if you’re estimating means, and use two replicates if you’re estimating variances. Implicit in the answer was, “the experiment will be evaluated in more than one environment”.

    R. E. Stucker & D. R. Hicks, Experimental Design and Plot Size Considerations for On-Farm Research, Proceedings of the 46th Annual Corn and Sorghum Industry Research Conference, 1991, p.62. (#statistics,experimental design)

  • Which I would like to stress are: (1) A significant effect is not necessarily the same thing as an interesting effect. (2) A non-significant effect is not necessarily the same thing as no difference.

    Christopher Chatfield, Problem solving: a statistician’s guide, p.51. (#statistics,significance)

  • Rejection of a true null hypothesis at the 0.05 level will occur only one in 20 times. The overwhelming majority of these false rejections will be based on test statistics close to the borderline value. If the null hypothesis is false, the inter-ocular traumatic test [“hit between the eyes”] will often suffice to reject it; calculation will serve only to verify clear intuition.

    W. Edwards, Harold Lindman, Leonard J. Savage, Bayesian Statistical Inference for Psychological Research, University of Michigan (#statistics,significance,nhst)

  • When statistical inferences, such as p-values, follow extensive looks at the data, they no longer have their usual interpretation. Ignoring this reality is dishonest: it is like painting a bull’s eye around the landing spot of your arrow. This is known in some circles as p-hacking, and much has been written about its perils and pitfalls.

    Robert E Kass, Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, Nancy Reid, Ten Simple Rules for Effective Statistical Practice, PLoS Comput Biol 12(6):e1004961. (#statistics,significance,nhst)

  • The difference between “statistically significant” and “not statistically significant” is not in itself necessarily statistically significant. By this, I mean more than the obvious point about arbitrary divisions, that there is essentially no difference between something significant at the 0.049 level or the 0.051 level. I have a bigger point to make. It is common in applied research–in the last couple of weeks, I have seen this mistake made in a talk by a leading political scientist and a paper by a psychologist–to compare two effects, from two different analyses, one of which is statistically significant and one which is not, and then to try to interpret/explain the difference. Without any recognition that the difference itself was not statistically significant.

    Andrew Gelman, The difference between ‘statistically significant’ and ‘not statistically significant’ is not in itself necessarily statistically significant, 2005 (#statistics,significance)

  • The p-value is a concept so misaligned with intuition that no civilian can hold it firmly in mind. Nor can many statisticians.

    Matt Briggs, Why do statisticians answer silly questions that no one ever asks?, Significance, Vol 9, No 1, p.30. (#statistics,significance,nhst)

  • A quotation of a p-value is part of the ritual of science, a sprinkling of the holy waters in an effort to sanctify the data analysis and turn consumers of the results into true believers.

    William Cleveland, Visualizing Data, p.177. (#statistics,significance,nhst)

  • We should push for de-emphasizing some topics, such as statistical significance tests–an unfortunate carry-over from the traditional elementary statistics course. We would suggest a greater focus on confidence intervals–these achieve the aim of formal hypothesis testing, often provide additional useful information, and are not as easily misinterpreted.

    Gerry Hahn et al., The Impact of Six Sigma Improvement–A Glimpse Into the Future of Statistics, The American Statistician, August 1999. (#statistics,significance,nhst)

  • We statisticians must accept much of the blame for cavalier attitudes toward Type I errors. When we teach practitioners in other scientific fields that multiplicity is not important, they believe us, and feel free to thrash their data set mercilessly, until it finally screams “uncle” and relinquishes significance. The recent conversion of the term “data mining” to mean a statistical good rather than a statistical evil also contributes to the problem.

    Peter Westfall, Applied Statistics in Agriculture (Proceedings of the 13th annual conference), page 5. (#statistics,significance,nhst)

  • While the main emphasis in the development of power analysis has been to provide methods for assessing and increasing power, it should also be noted that it is possible to have too much power. If your sample is too large, nearly any difference, no matter how small or meaningless from a practical standpoint, will be ‘statistically significant’.

    Clay Helberg (#statistics,significance,power,nhst)

  • Remember that a p-value merely indicates the probability of a particular set of data being generated by the null model–it has little to say about the size of a deviation from that model (especially in the tails of the distribution, where large changes in effect size cause only small changes in p-values).

    Clay Helberg (#statistics,significance,nhst)

  • Given what I know about data, models, and assumptions, I find more than 2 significant digits of printout for a p-value to be indefensible. (I actually think 1 digit is about the max).

    Terry Therneau, S-news mailing list, 8 Nov 2000 (#statistics,significance)

  • In the calculus of real statistical inference, and by that I mean actual data problems (which S was designed for), all p-values < 10^-6 or so are identical. This is one of the few areas in fact where I like SAS better: the creators of their PROCs are smart enough to print these numbers as zero and leave it at that. There are no Gaussian distributions in the real world, and the central limit theorem has failed long, long before 10^-17.

    Terry Therneau, S-news mailing list, 4 Apr 2002 (#statistics,significance,computing)

  • It’s a commonplace among statisticians that a chi-squared test (and, really, any p-value) can be viewed as a crude measure of sample size: When sample size is small, it’s very difficult to get a rejection (that is, a p-value below 0.05), whereas when sample size is huge, just about anything will bag you a rejection. With large n, a smaller signal can be found amid the noise. In general: small n, unlikely to get small p-values. Large n, likely to find something. Huge n, almost certain to find lots of small p-values.

    Andrew Gelman, The sample size is huge, so a p-value of 0.007 is not that impressive, 2009. (#statistics,significance,nhst)
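
The dependence of a p-value on sample size described above can be illustrated with a minimal sketch. It assumes a one-sample, two-sided z-test with known standard deviation; the effect size of 0.02 standard deviations and the three sample sizes are hypothetical values chosen only for illustration, and the helper name `two_sided_p` does not come from any of the quoted sources.

```python
import math

def two_sided_p(effect, n, sd=1.0):
    """Two-sided p-value of a one-sample z-test when the observed mean
    difference equals `effect` (an assumed, fixed value)."""
    z = effect * math.sqrt(n) / sd
    # For a standard normal Z, P(|Z| > z) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical setup: the same tiny effect observed at growing sample sizes.
sample_sizes = [100, 10_000, 1_000_000]
p_values = [two_sided_p(0.02, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n={n:>9,d}  p={p:.3g}")
```

With the effect held fixed, only n changes between the three lines of output, so the shrinking p-value reflects sample size rather than any change in the size of the effect.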

  • Work by Bickel, Ritov, and Stoker (2001) shows that goodness-of-fit tests have very little power unless the direction of the alternative is precisely specified. The implication is that omnibus goodness-of-fit tests, which test in many directions simultaneously, have little power, and will not reject until the lack of fit is extreme.

    Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science, Vol 16, p.203. (#statistics,significance,nhst)

  • Visualizations act as a campfire around which we gather to tell stories.

    Al Shalloway, 2011 (#eda,datavisualization)

  • If students have no experience with hands-on [telescope] observing, they may take all data as ‘truth’ without having an understanding of how the data are obtained and what could potentially go wrong in that process, so I think it becomes crucially important to give a glimpse of what’s happening behind the scenes at telescopes, so they can be appropriately skeptical users of data in the future.

    Colette Salyk, Sky & Telescope, Apr 2022 p.31. (#data)

  • Normality is a myth; there never was, and never will be, a normal distribution. This is an over-statement from the practical point of view, but it represents a safer initial mental attitude than any in fashion during the past two decades.

    R. C. Geary, Testing for normality, 1947. Biometrika 34: 209-242. (#nhst,normality)

  • Furthermore, the mere declaration that the interaction is or is not significant is far too coarse a result to give agronomists or plant breeders effective insight into their research material.

    Hugh G. Gauch Jr., Model selection and validation for yield trials with interaction, 1988. Biometrics 44: 705-715.

  • Analysis of variance … stems from a hypothesis-testing formulation that is difficult to take seriously and would be of limited value for making final conclusions.

    Herman Chernoff, Comment, 1986. The American Statistician 40(1): 5-6. (#nhst)

  • The peculiarity of … statistical hypotheses is that they are not conclusively refutable by any experience.

    Richard B. Braithwaite, Scientific Explanation. A Study of the Function of Theory, Probability and Law in Science (p.151), 1953. Cambridge University Press. (#nhst)

  • …no batch of observations, however large, either definitively rejects or definitively fails to reject the hypothesis H0.

    Richard B. Braithwaite, Scientific Explanation. A Study of the Function of Theory, Probability and Law in Science (p.160), 1953. Cambridge University Press. (#nhst)

  • what John Dewey called ‘the quest for certainty’ is, in the case of empirical knowledge, a snare and a delusion.

    Richard B. Braithwaite, Scientific Explanation. A Study of the Function of Theory, Probability and Law in Science (p.163), 1953. Cambridge University Press. (#knowledge,uncertainty)

  • The ultimate justification for any scientific belief will depend upon the main purpose for which we think scientifically–that of predicting and thereby controlling the future.

    Richard B. Braithwaite, Scientific Explanation. A Study of the Function of Theory, Probability and Law in Science (p.174), 1953. Cambridge University Press. (#science)

  • Most readers of The American Statistician will recognize the limited value of hypothesis testing in the science of statistics. I am not sure that they all realize the extent to which it has become the primary tool in the religion of Statistics.

    David Salsburg, The Religion of Statistics as Practiced in Medical Journals, 1985. The American Statistician, 39, 220-223. (#nhst)

  • We are better off abandoning the use of hypothesis tests entirely and concentrating on developing continuous measures of toxicity which can be used for estimation.

    David Salsburg, Statistics for Toxicologists, 1986. New York, Marcel Dekker, Inc. (#nhst)

  • I do not think that significance testing should be completely abandoned … and I don’t expect that it will be. But I urge researchers to provide estimates, with confidence intervals: scientific advance requires parameters with known reliability estimates. Classical confidence intervals are formally equivalent to a significance test, but they convey more information.

    Nigel G. Yoccoz, Use, Overuse, and Misuse of Significance Tests in Evolutionary Biology and Ecology. Bulletin of the Ecological Society of America, Vol. 72, No.2 (Jun., 1991), pp.106-111. (#nhst,significance,uncertainty)

  • In marked contrast to what is advocated by most statisticians, most evolutionary biologists and ecologists overemphasize the potential role of significance testing in their scientific practice. Biological significance should be emphasized rather than statistical significance. Furthermore, a survey of papers showed that the literature is infiltrated by an array of misconceptions about the use and interpretation of significance tests. … By far the most common error is to confound statistical significance with biological, scientific significance… Statements like ‘the two populations are significantly different relative to parameter X (P=.004)’ are found with no mention of the estimated difference. … Most biologists and other users of statistical methods still seem to be unaware that significance testing by itself sheds little light on the questions they are posing.

    Nigel G. Yoccoz, Use, Overuse, and Misuse of Significance Tests in Evolutionary Biology and Ecology. Bulletin of the Ecological Society of America, Vol. 72, No.2 (Jun., 1991), pp.106-111. (#nhst,significance)

  • Tests appear to many users to be a simple way to discharge the obligation to provide some statistical treatment of the data.

    H. V. Roberts, For what use are tests of hypotheses and tests of significance, 1976. Communications in Statistics, Series A, 5:753-761. (#nhst)

  • We shall marshal arguments against [significance] testing, leading to the conclusion that it be abandoned by all substantive science and not just by educational research and other social sciences which have begun to raise voices against the virtual tyranny of this branch of inference in the academic world.

    Louis Guttman, The illogic of statistical inference for cumulative science, 1985. Applied Stochastic Models and Data Analysis 1:3-9. (#nhst)

  • In practice, of course, tests of significance are not taken seriously.

    Louis Guttman, The illogic of statistical inference for cumulative science, 1985. Applied Stochastic Models and Data Analysis 1:3-9. (#nhst)

  • Since a point hypothesis is not to be expected in practice to be exactly true, but only approximate, a proper test of significance should almost always show significance for large enough samples. So the whole game of testing point hypotheses, power analysis notwithstanding, is but a mathematical game without empirical importance.

    Louis Guttman, The illogic of statistical inference for cumulative science, 1985. Applied Stochastic Models and Data Analysis 1:3-9. (#nhst)

  • …lack of interaction in analysis of variance and … lack of correlation in bivariate distributions–such nullities would be quite surprising phenomena in the usual interactive complexities of social life.

    Louis Guttman, What is not what in statistics, 1977. The Statistician, 26:81-107. (#nhst,correlation)

  • Estimation and approximation may be more fruitful than significance in developing science, never forgetting replication.

    Louis Guttman, What is not what in statistics, 1977. The Statistician, 26:81-107. (#nhst)

  • [the normal distribution] is seldom, if ever, observed in nature.

    Louis Guttman, What is not what in statistics, 1977. The Statistician, 26:81-107. (#normality)

  • The test of statistical significance in psychological research may be taken as an instance of a kind of essential mindlessness in the conduct of research.

    D. Bakan, The test of significance in psychological research, 1966. Psychological Bulletin 66: 423-437. (#nhst)

  • …the test of significance has been carrying too much of the burden of scientific inference. It may well be the case that wise and ingenious investigators can find their way to reasonable conclusions from data because and in spite of their procedures. Too often, however, even wise and ingenious investigators…tend to credit the test of significance with properties it does not have.

    D. Bakan, The test of significance in psychological research, 1966. Psychological Bulletin 66: 423-437. (#nhst)

  • …a priori reasons for believing that the null hypothesis is generally false anyway. One of the common experiences of research workers is the very high frequency with which significant results are obtained with large samples.

    D. Bakan, The test of significance in psychological research, 1966. Psychological Bulletin 66: 423-437. (#nhst)

  • …there is really no good reason to expect the null hypothesis to be true in any population … Why should any correlation coefficient be exactly .00 in the population? … why should different drugs have exactly the same effect on any population parameter?

    D. Bakan, The test of significance in psychological research, 1966. Psychological Bulletin 66: 423-437. (#nhst)

  • …we need to get on with the business of generating … hypotheses and proceed to do investigations and make inferences which bear on them, instead of … testing the statistical null hypothesis in any number of contexts in which we have every reason to suppose that it is false in the first place.

    D. Bakan, The test of significance in psychological research, 1966. Psychological Bulletin 66: 423-437. (#nhst)

  • the tests of null hypotheses of zero differences, of no relationships, are frequently weak, perhaps trivial statements of the researcher’s aims … in many cases, instead of the tests of significance it would be more to the point to measure the magnitudes of the relationships, attaching proper statements of their sampling variation. The magnitudes of relationships cannot be measured in terms of levels of significance.

    Leslie Kish, Some statistical problems in research design, 1959. American Sociological Review 24: 328-338. (#nhst)

  • There are instances of research results presented in terms of probability values of ‘statistical significance’ alone, without noting the magnitude and importance of the relationships found. These attempts to use the probability levels of significance tests as measures of the strengths of relationships are very common and very mistaken.

    Leslie Kish, Some statistical problems in research design, 1959. American Sociological Review 24: 328-338. (#nhst)

  • One reason for preferring to present a confidence interval statement (where possible) is that the confidence interval, by its width, tells more about the reliance that can be placed on the results of the experiment than does a YES-NO test of significance.

    Mary G. Natrella, The relation between confidence intervals and tests of significance, 1960. American Statistician 14: 20-22, 33. (#nhst)

  • Confidence intervals give a feeling of the uncertainty of experimental evidence, and (very important) give it in the same units … as the original observations.

    Mary G. Natrella, The relation between confidence intervals and tests of significance, 1960. American Statistician 14: 20-22, 33. (#nhst)

  • The current obsession with .05 … has the consequence of differentiating significant research findings and those best forgotten, published studies from unpublished ones, and renewal of grants from termination. It would not be difficult to document the joy experienced by a social scientist when his F ratio or t value yields significance at .05, nor his horror when the table reads ‘only’ .10 or .06. One comes to internalize the difference between .05 and .06 as ‘right’ vs. ‘wrong,’ ‘creditable’ vs. ‘embarrassing,’ ‘success’ vs. ‘failure’.

    James K. Skipper Jr., Anthony L. Guenther and Gilbert Nass, The sacredness of .05: A note concerning the uses of statistical levels of significance in social science. The American Sociologist 2: 16-18. (#nhst)

  • …blind adherence to the .05 level denies any consideration of alternative strategies, and it is a serious impediment to the interpretation of data.

    James K. Skipper Jr., Anthony L. Guenther and Gilbert Nass, The sacredness of .05: A note concerning the uses of statistical levels of significance in social science. The American Sociologist 2: 16-18. (#nhst)

  • … surely, God loves the .06 nearly as much as the .05.

    R. L. Rosnow and R. Rosenthal, Statistical procedures and the justification of knowledge and psychological science, 1989. American Psychologist 44: 1276-1284. (#nhst)

  • How has the virtually barren technique of hypothesis testing come to assume such importance in the process by which we arrive at our conclusions from our data?

    G. R. Loftus, On the tyranny of hypothesis testing in the social sciences, 1991. Contemporary Psychology 36: 102-105. (#nhst)

  • Despite the stranglehold that hypothesis testing has on experimental psychology, I find it difficult to imagine a less insightful means of transiting from data to conclusions.

    G. R. Loftus, On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology 36: 102-105. (#nhst)

  • Whereas hypothesis testing emphasizes a very narrow question (‘Do the population means fail to conform to a specific pattern?’), the use of confidence intervals emphasizes a much broader question (‘What are the population means?’). Knowing what the means are, of course, implies knowing whether they fail to conform to a specific pattern, although the reverse is not true. In this sense, use of confidence intervals subsumes the process of hypothesis testing.

    G. R. Loftus, On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology 36: 102-105. (#nhst)

  • This remarkable state of affairs [overuse of significance testing] is analogous to engineers’ teaching (and believing) that light consists only of waves while ignoring its particle characteristics—and losing in the process, of course, any motivation to pursue the most interesting puzzles and paradoxes in the field.

    G. R. Loftus, On the tyranny of hypothesis testing in the social sciences, 1991. Contemporary Psychology 36: 102-105. (#nhst)

  • The result is that non-statisticians tend to place undue reliance on single ‘cookbook’ techniques, and it has for example become impossible to get results published in some medical, psychological and biological journals without reporting significance values even if of doubtful validity. It is sad that students may actually be more confused and less numerate at the end of a ‘service course’ than they were at the beginning, and more likely to overlook a descriptive approach in favour of some inferential method which may be inappropriate or incorrectly executed.

    C. Chatfield, The initial examination of data, 1985. Journal of the Royal Statistical Society, Series A 148: 214-253. (#nhst)

  • ‘Common sense’ is not common but needs to be learnt systematically… A ‘simple analysis’ can be harder than it looks…. All statistical techniques, however sophisticated, should be subordinate to subjective judgement.

    C. Chatfield, The initial examination of data, 1985. Journal of the Royal Statistical Society, Series A 148: 214-253. (#nhst)

  • Thus statistics should generally be taught more as a practical subject with analyses of real data. Of course some theory and an appropriate range of statistical tools need to be learnt, but students should be taught that Statistics is much more than a collection of standard prescriptions.

    C. Chatfield, The initial examination of data, 1985. Journal of the Royal Statistical Society, Series A 148: 214-253. (#data)

  • More fundamentally students should be taught that instead of asking ‘What techniques shall I use here?,’ they should ask ‘How can I summarize and understand the main features of this set of data?’

    C. Chatfield, The initial examination of data, 1985. Journal of the Royal Statistical Society, Series A 148: 214-253. (#data)

  • All statistical techniques, however sophisticated, should be subordinate to subjective judgement.

    C. Chatfield, The initial examination of data, 1985. Journal of the Royal Statistical Society, Series A 148: 214-253.

  • …it has … become impossible to get results published in some medical, psychological and biological journals without reporting significance values even when of doubtful validity.

    C. Chatfield, The initial examination of data, 1985. Journal of the Royal Statistical Society, Series A 148: 214-253. (#nhst)

  • …to make measurements and then ignore their magnitude would ordinarily be pointless. Exclusive reliance on tests of significance obscures the fact that statistical significance does not imply substantive significance.

    I. R. Savage, Nonparametric Statistics. Journal of the American Statistical Association, 52, 331-344. (#nhst)

  • Null hypotheses of no difference are usually known to be false before the data are collected … when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science.

    I. R. Savage, Nonparametric Statistics. Journal of the American Statistical Association, 52, 331-344. (#nhst)

  • too many users of the analysis of variance seem to regard the reaching of a mediocre level of significance as more important than any descriptive specification of the underlying averages. Our thesis is that people have strong intuitions about random sampling; that these intuitions are wrong in fundamental respects; that these intuitions are shared by naive subjects and by trained scientists; and that they are applied with unfortunate consequences in the course of scientific inquiry. We submit that people view a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. Consequently, they expect any two samples drawn from a particular population to be more similar to one another and to the population than sampling theory predicts, at least for small samples.

    Amos Tversky & Daniel Kahneman, Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110. (#sampling)

  • People have erroneous intuitions about the laws of chance. In particular, they regard a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. The prevalence of the belief and its unfortunate consequences for psychological research are illustrated by the responses of professional psychologists to a questionnaire concerning research decisions.

    Amos Tversky & Daniel Kahneman, Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110. (#sampling)

  • the statistical power of many psychological studies is ridiculously low. This is a self-defeating practice: it makes for frustrated scientists and inefficient research. The investigator who tests a valid hypothesis but fails to obtain significant results cannot help but regard nature as untrustworthy or even hostile.

    Amos Tversky & Daniel Kahneman, Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110. (#nhst,power)

  • Significance levels are usually computed and reported, but power and confidence limits are not. Perhaps they should be.

    Amos Tversky & Daniel Kahneman, Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110. (#nhst)

  • The emphasis on significance levels tends to obscure a fundamental distinction between the size of an effect and its statistical significance.

    Amos Tversky & Daniel Kahneman, Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110. (#nhst)

  • Statistical hypothesis testing is commonly used inappropriately to analyze data, determine causality, and make decisions about significance in ecological risk assessment,… It discourages good toxicity testing and field studies, it provides less protection to ecosystems or their components that are difficult to sample or replicate, and it provides less protection when more treatments or responses are used. It provides a poor basis for decision-making because it does not generate a conclusion of no effect, it does not indicate the nature or magnitude of effects, it does not address effects at untested exposure levels, and it confounds effects and uncertainty…. Risk assessors should focus on analyzing the relationship between exposure and effects….

    Glenn W. Suter, Abuse of hypothesis testing statistics in ecological risk assessment, 1996. Human and Ecological Risk Assessment 2: 331-347. (#nhst)

  • I argued that hypothesis testing is fundamentally inappropriate for ecological risk assessment, that its use has undesirable consequences for environmental protection, and that preferable alternatives exist for statistical analysis of data in ecological risk assessment. The conclusion of this paper is that ecological risk assessors should estimate risks rather than test hypotheses.

    Glenn W. Suter, Abuse of hypothesis testing statistics in ecological risk assessment, 1996. Human and Ecological Risk Assessment 2: 331-347. (#nhst)

  • The purpose of an experiment is to answer questions. The truth of this seems so obvious, that it would not be worth emphasizing were it not for the fact that the results of many experiments are interpreted and presented with little or no reference to the questions that were asked in the first place.

    T. M. Little, Interpretation and presentation of results, 1981. Hortscience 16: 637-640.

  • The idea that one should proceed no further with an analysis, once a non-significant F-value for treatments is found, has led many experimenters to overlook important information in the interpretation of their data.

    T. M. Little, Interpretation and presentation of results, 1981. Hortscience 16: 637-640. (#nhst,anova)

  • the null-hypothesis models … share a crippling flaw: in the real world the null hypothesis is almost never true, and it is usually nonsensical to perform an experiment with the sole aim of rejecting the null hypothesis.

    Jum Nunnally, The place of statistics in psychology, 1960. Educational and Psychological Measurement 20: 641-650. (#nhst)

  • If rejection of the null hypothesis were the real intention in psychological experiments, there usually would be no need to gather data.

    Jum Nunnally, The place of statistics in psychology, 1960. Educational and Psychological Measurement 20: 641-650. (#nhst)

  • Closely related to the null hypothesis is the notion that only enough subjects need be used in psychological experiments to obtain ‘significant’ results. This often encourages experimenters to be content with very imprecise estimates of effects.

    Jum Nunnally, The place of statistics in psychology, 1960. Educational and Psychological Measurement 20: 641-650. (#nhst)

  • We should not feel proud when we see the psychologist smile and say ‘the correlation is significant beyond the .01 level.’ Perhaps that is the most that he can say, but he has no reason to smile.

    Jum Nunnally, The place of statistics in psychology, 1960. Educational and Psychological Measurement 20: 641-650. (#nhst)

  • the finding of statistical significance is perhaps the least important attribute of a good experiment.

    D. T. Lykken, Statistical significance in psychological research, 1968. Psychological Bulletin 70: 151-159. (#nhst)

  • Editors must be bold enough to take responsibility for deciding which studies are good and which are not, without resorting to letting the p value of the significance tests determine this decision.

    D. T. Lykken, Statistical significance in psychological research, 1968. Psychological Bulletin 70: 151-159. (#nhst)

  • Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance.

    R. P. Carver, The case against statistical testing. Harvard Educational Review 48: 378-399. (#nhst)

  • Statistical significance ordinarily depends upon how many subjects are used in the research. The more subjects the researcher uses, the more likely the researcher will be to get statistically significant results.

    R. P. Carver, The case against statistical testing. Harvard Educational Review 48: 378-399. (#nhst)

  • What is the probability of obtaining a dead person (D) given that the person was hanged (H); that is, in symbol form, what is p(D|H)? Obviously, it will be very high, perhaps .97 or higher. Now, let us reverse the question: What is the probability that a person has been hanged (H) given that the person is dead (D); that is, what is p(H|D)? This time the probability will undoubtedly be very low, perhaps .01 or lower. No one would be likely to make the mistake of substituting the first estimate (.97) for the second (.01); that is, to accept .97 as the probability that a person has been hanged given that the person is dead. Even though this seems to be an unlikely mistake, it is exactly the kind of mistake that is made with the interpretation of statistical significance testing—by analogy, calculated estimates of p(D|H) are interpreted as if they were estimates of p(H|D), when they are clearly not the same.

    R. P. Carver, The case against statistical testing. Harvard Educational Review 48: 378-399. (#nhst,probability)
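
The gap between the two conditional probabilities in the hanging example follows from Bayes' rule, p(H|D) = p(D|H) p(H) / p(D). The sketch below is only an illustration: the value .97 comes from the quotation, while the base rates `p_h` and `p_d` are hypothetical numbers assumed here so that the arithmetic can be carried out.

```python
def posterior(p_d_given_h, p_h, p_d):
    """Bayes' rule: p(H|D) = p(D|H) * p(H) / p(D)."""
    return p_d_given_h * p_h / p_d

p_d_given_h = 0.97   # from the quotation: probability of death given hanging
p_h = 0.0001         # assumed base rate of hanging (hypothetical)
p_d = 0.01           # assumed base rate of death (hypothetical)

p_h_given_d = posterior(p_d_given_h, p_h, p_d)
print(f"p(D|H) = {p_d_given_h:.2f}")
print(f"p(H|D) = {p_h_given_d:.4f}")
```

Under these assumed base rates the reversed probability is roughly a hundredth of the original one, which is the size of gap the quotation describes.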

  • The author recommends abandoning all statistical significance testing and suggests other ways of evaluating research results. … Another reason for the popularity of statistical significance testing is probably that the complicated mathematical procedures lend an air of scientific objectivity to conclusions. … Given that statistical significance testing usually involves a corrupt form of the scientific method and, at best, is of trivial scientific importance, journal editors should not require it as a necessary part of a publishable research article.

    R. P. Carver, The case against statistical testing. Harvard Educational Review 48: 378-399. (#nhst)

  • Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends, are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data.

    W. Edwards Deming, On probability as a basis for action, 1975. American Statistician 29: 146-152. (#eda)

  • We admit with Sir Winston Churchill that it sometimes pays to admit the obvious: we do not perform an experiment to find out if two varieties of wheat or two drugs are equal. We know in advance without spending a dollar on an experiment that they are not equal. The difference between two treatments or between two areas or two groups of people, will show up as ‘significantly different’ if the experiment be conducted through a sufficient number of trials, even though the difference be so small that it is of no scientific or economic consequence. Likewise tests of whether the data of a survey or an experiment fit some particular curve is of no scientific or economic consequence…. With enough data no curve will fit the results of an experiment. The question that one faces in using any curve or any relationship is this: how robust are the conclusions? Would some other curve make safer predictions? Statistical significance of B/A thus conveys no knowledge, no basis for action.

    W. Edwards Deming, On probability as a basis for action, 1975. American Statistician 29: 146-152. (#significance)

  • Under the usual teaching, the trusting student, to pass the course must forsake all the scientific sense that he has accumulated so far, and learn the book, mistakes and all.

    W. Edwards Deming, On probability as a basis for action, 1975. American Statistician 29: 146-152. (#science)

  • While [Edward C. Bryant] was at the University of Wyoming, someone came in from the Department of Animal Husbandry to announce to him an astounding scientific discovery—the fibres on the left side of the sheep and those on the right side are of different diameter. Dr. Bryant asked him how many fibres he had in the sample: answer, 50,000. This was a number big enough to establish significance. But what of it? Anyone would know in advance, without spending a dollar, that there is a difference between fibres of the left side and the right side of any sheep, or of n sheep combined. The question is whether the difference is of scientific importance.

    W. Edwards Deming, On probability as a basis for action, 1975. American Statistician 29: 146-152. (#nhst,science)

  • Small wonder that students have trouble [with statistical hypothesis testing]. They may be trying to think.

    W. Edwards Deming, On probability as a basis for action, 1975. American Statistician 29: 146-152. (#nhst,significance)

  • Data analysis methods in psychology still emphasize statistical significance testing, despite numerous articles demonstrating its severe deficiencies. It is now possible to use meta-analysis to show that reliance on significance testing retards the development of cumulative knowledge. The reform of teaching and practice will also require that researchers learn that the benefits that they believe flow from use of significance testing are illusory. Teachers must re-vamp their courses to bring students to understand that a) reliance on significance testing retards the growth of cumulative research knowledge; b) benefits widely believed to flow from significance testing do not in fact exist; c) significance testing methods must be replaced with point estimates and confidence intervals in individual studies and with meta-analyses and the integration of multiple studies. This reform is essential to the future progress of cumulative knowledge and psychological research.

    Frank L. Schmidt, Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods 1(2), Jun 1996, 115-129. (#nhst,significance,knowledge)

  • If the null hypothesis is not rejected, Fisher’s position was that nothing could be concluded. But researchers find it hard to go to all the trouble of conducting a study only to conclude that nothing can be concluded.

    Frank L. Schmidt, Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods 1(2), Jun 1996, 115-129. (#nhst,significance)

  • Many researchers believe that statistical significance testing confers important benefits that are in fact completely imaginary.

    Frank L. Schmidt, Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods 1(2), Jun 1996, 115-129. (#nhst,significance)

  • An important part of the explanation [of continued use of significance testing] is that researchers hold false beliefs about significance testing, beliefs that tell them that significance testing offers important benefits to researchers that it in fact does not. Three of these beliefs are particularly important. The first is the false belief that the significance level of a study indicates the probability of successful replications of the study…. A second false belief widely held by researchers is that statistical significance level provides an index of the importance or size of a difference or relation…. The third false belief held by many researchers is the most devastating of all to the research enterprise. This is the belief that if a difference or relation is not statistically significant, then it is zero, or at least so small that it can safely be considered to be zero. This is the belief that if the null hypothesis is not rejected then it is to be accepted. This is the belief that a major benefit from significance tests is that they tell us whether a difference or effect is real or ‘probably just occurred by chance’.

    Frank L. Schmidt, Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods 1(2), Jun 1996, 115-129. (#nhst,significance)

  • We can no longer tolerate a situation in which our upcoming generation of researchers are being trained to use discredited data analysis methods while the broader research enterprise of which they are to become a part has moved toward improved methods.

    Frank L. Schmidt, Statistical significance testing and cumulative knowledge in psychology: implications for training of researchers. Psychological Methods 1(2), Jun 1996, 115-129. (#nhst)

  • I believe … that hypothesis testing has been greatly overemphasized in psychology and in the other disciplines that use it. It has diverted our attention from crucial issues. Mesmerized by a single all-purpose, mechanized, ‘objective’ ritual in which we convert numbers into other numbers and get a yes-no answer, we have come to neglect close scrutiny of where the numbers come from.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst,significance)

  • … the primary product of a research inquiry is one or more measures of effect size, not p values.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst,significance)

  • The prevailing yes-no decision at the magic .05 level from a single research is a far cry from the use of informed judgment. Science simply doesn’t work that way. A successful piece of research doesn’t conclusively settle an issue, it just makes some theoretical proposition to some degree more [or less] likely.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst)

  • One of the things I learned early on was that some things you learn aren’t so.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312.

  • When a Fisherian null hypothesis is rejected with an associated probability of, for example, .026, it is not the case that the probability that the null hypothesis is true is .026 (or less than .05, or any other value we can specify). Given our framework of probability as long-run relative frequency–as much as we might wish it to be otherwise–this result does not tell us about the truth of the null hypothesis, given the data. (For this we have to go to Bayesian or likelihood statistics, in which probability is not relative frequency but degree of belief.)

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst,significance)

  • Despite widespread misconceptions to the contrary, the rejection of a given null hypothesis gives us no basis for estimating the probability that a replication of the research will again result in rejecting that null hypothesis.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst,significance)

  • Of course, everyone knows that failure to reject the Fisherian null hypothesis does not warrant the conclusion that it is true. Fisher certainly knew and emphasized it, and our textbooks duly so instruct us. Yet how often do we read in the discussion and conclusions of articles now appearing in our most prestigious journals that ‘there is no difference’ or ‘no relationship’.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst)

  • A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world…. If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#nhst)

  • I am, however, appalled by the fact that some publishers of statistics packages successfully hawk their wares with the pitch that it isn’t necessary to understand statistics to use them.

    Jacob Cohen, Things I have learned (so far), 1990. American Psychologist 45: 1304-1312. (#computing)

  • I argue herein that NHST [null hypothesis significance testing] has not only failed to support the advance of psychology as a science but also has seriously impeded it.

    Jacob Cohen, The earth is round (p<.05). 1994. American Psychologist 49: 997-1003. (#nhst)

  • they [confidence limits] are rarely to be found in the literature. I suspect that the main reason they are not reported is that they are so embarrassingly large!

    Jacob Cohen, The earth is round (p<.05). 1994. American Psychologist 49: 997-1003. (#nhst)

  • After four decades of severe criticism, the ritual of null hypothesis significance testing—mechanical dichotomous decisions around a sacred .05 criterion—still persists. This article reviews the problems with this practice… What’s wrong with [null hypothesis significance testing]? Well, among many other things, it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!

    Jacob Cohen, The earth is round (p<.05). 1994. American Psychologist 49: 997-1003. (#nhst)

  • Tests of the null hypothesis that there is no difference between certain treatments are often made in the analysis of agricultural or industrial experiments in which alternative methods or processes are compared. Such tests are … totally irrelevant. What are needed are estimates of magnitudes of effects, with standard errors.

    F. J. Anscombe, Discussion on Dr. David’s and Dr. Johnson’s Paper. 1956. Journal of the Royal Statistical Society B 18: 24-27. (#nhst)

  • statistical significance is not the same as scientific significance.

    Norman S. Matloff, Statistical hypothesis testing: problems and alternatives. 1991. Environmental Entomology 20: 1246-1250. (#nhst)

  • the number of stars by itself is relevant only to the question of whether H0 is exactly true–a question which is almost always not of interest to us, especially because we usually know a priori that H0 cannot be exactly true.

    Norman S. Matloff, Statistical hypothesis testing: problems and alternatives. 1991. Environmental Entomology 20: 1246-1250. (#nhst,anova)

  • no population has an exact normal distribution, nor are variances exactly homogeneous, and independence assumptions are often violated to at least some degree.

    Norman S. Matloff, Statistical hypothesis testing: problems and alternatives. 1991. Environmental Entomology 20: 1246-1250. (#normality)

  • Exact truth of a null hypothesis is very unlikely except in a genuine uniformity trial.

    David R. Cox, Some problems connected with statistical inference. 1958. Annals of Mathematical Statistics 29: 357-372. (#nhst)

  • Assumptions that we make, such as those concerning the form of the population sampled, are always untrue.

    David R. Cox, Some problems connected with statistical inference. 1958. Annals of Mathematical Statistics 29: 357-372. (#sampling)

  • Overemphasis on tests of significance at the expense especially of interval estimation has long been condemned.

    David R. Cox, The role of significance tests. 1977. Scandinavian Journal of Statistics 4: 49-70. (#nhst)

  • …There are considerable dangers in overemphasizing the role of significance tests in the interpretation of data.

    David R. Cox, The role of significance tests. 1977. Scandinavian Journal of Statistics 4: 49-70. (#nhst)

  • In any particular application, graphical or other informal analysis may show that consistency or inconsistency with H0 is so clear cut that explicit calculation of p is unnecessary.

    David R. Cox, The role of significance tests. 1977. Scandinavian Journal of Statistics 4: 49-70. (#nhst)

  • The central point is that statistical significance is quite different from scientific significance and that therefore estimation … of the magnitude of effects is in general essential regardless of whether statistically significant departure from the null hypothesis is achieved.

    David R. Cox, The role of significance tests. 1977. Scandinavian Journal of Statistics 4: 49-70. (#nhst)

  • It is very bad practice to summarise an important investigation solely by a value of P.

    David R. Cox, Statistical significance tests. 1982. British Journal of Clinical Pharmacology 14: 325-331. (#nhst)

  • The criterion for publication should be the achievement of reasonable precision and not whether a significant effect has been found.

    David R. Cox, Statistical significance tests. 1982. British Journal of Clinical Pharmacology 14: 325-331. (#nhst)

  • The continued very extensive use of significance tests is alarming.

    David R. Cox, Some general aspects of the theory of statistics. 1986. International Statistical Review 54: 117-126. (#nhst)

  • It has been widely felt, probably for thirty years and more, that significance tests are overemphasized and often misused and that more emphasis should be put on estimation and prediction. While such a shift of emphasis does seem to be occurring, for example in medical statistics, the continued very extensive use of significance tests is on the one hand alarming and on the other evidence that they are aimed, even if imperfectly, at some widely felt need.

    David R. Cox, Some general aspects of the theory of statistics. 1986. International Statistical Review 54: 117-126. (#nhst)

  • the emphasis given to formal tests of significance … has resulted in … an undue concentration of effort by mathematical statisticians on investigations of tests of significance applicable to problems which are of little or no practical importance … and … it has caused scientific research workers to pay undue attention to the results of the tests of significance … and too little to the estimates of the magnitude of the effects they are investigating.

    Frank Yates, The influence of Statistical Methods for Research Workers on the development of the science of statistics. 1951. Journal of the American Statistical Association 46: 19-34. (#nhst)

  • …the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective.

    Frank Yates, The influence of Statistical Methods for Research Workers on the development of the science of statistics. 1951. Journal of the American Statistical Association 46: 19-34. (#nhst)

  • [Researchers] pay undue attention to the results of tests of significance they perform on their data, particularly data derived from experiments, and too little to the estimates of the magnitude of the effects which they are investigating…. The emphasis on tests of significance, and the consideration of the results of each experiment in isolation, have had the unfortunate consequence that scientific workers have often regarded the execution of a test of significance on an experiment as the ultimate objective. Results are significant or not and that is the end to it.

    Frank Yates, The influence of Statistical Methods for Research Workers on the development of the science of statistics. 1951. Journal of the American Statistical Association 46: 19-34. (#nhst)

  • The most commonly occurring weakness … is … undue emphasis on tests of significance, and failure to recognise that in many types of experimental work estimates of treatment effects, together with estimates of the errors to which they are subject, are the quantities of primary interest.

    Frank Yates, Sir Ronald Fisher and the design of experiments. 1964. Biometrics 20: 307-321. (#nhst)

  • In many experiments … it is known that the null hypothesis … is certainly untrue.

    Frank Yates, Sir Ronald Fisher and the design of experiments. 1964. Biometrics 20: 307-321. (#nhst)

  • A common misconception is that an effect exists only if it is statistically significant and that it does not exist if it is not [statistically significant].

    Jonas Ranstam, A common misconception about p-value and its consequences. 1996. Acta Orthopaedica Scandinavica 67: 505-507. (#nhst)

  • I contend that the general acceptance of statistical hypothesis testing is one of the most unfortunate aspects of 20th century applied science. Tests for the identity of population distributions, for equality of treatment means, for presence of interactions, for the nullity of a correlation coefficient, and so on, have been responsible for much bad science, much lazy science, and much silly science. A good scientist can manage with, and will not be misled by, parameter estimates and their associated standard errors or confidence limits.

    Marks Nester, A Myopic View and History of Hypothesis Testing. (#nhst)

  • The scientist must always give due thought to the statistical analysis, but must never let statistical analysis be a substitute for thinking!

    Marks Nester, A Myopic View and History of Hypothesis Testing. (#science,significance,statistics)

  • The purpose of this paper is severalfold. First, we attempt to convince the reader that at its worst, the results of statistical hypothesis testing can be seriously misleading, and at its best it offers no informational advantage over its alternatives; in fact it offers less.

    D. Jones and N. Matloff, Statistical hypothesis testing in biology: a contradiction in terms. 1986. Journal of Economic Entomology 79: 1156-1160. (#nhst,significance)

  • In view of our long-term strategy of improving our theories, our statistical tactics can be greatly improved by shifting emphasis away from over-all hypothesis testing in the direction of statistical estimation. This always holds true when we are concerned with the actual size of one or more differences rather than simply in the existence of differences.

    David A. Grant, Testing the null hypothesis and the strategy and tactics of investigating theoretical models. 1962. Psychological Review 69: 54-61. (#nhst)

  • The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation… Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data.

    Cherry Ann Clark, Hypothesis testing in relation to statistical methodology. 1963. Review of Educational Research 33: 455-473. (#nhst,significance)

  • There is nothing wrong with the t-test; it has merely been used to give an answer that was never asked for. The Student t-test answers the question: ‘Is there any real difference between the means of the measurement by the old and the new method, or could the apparent difference have arisen from random variation?’ We already know that there is a real difference, so the question is pointless. The question we should have answered is: ‘How big is the difference between the two sets of measurements, and how precisely have we determined it?’

    L. Sayn-Wittgenstein, Statistics - salvation or slavery? 1965. Forestry Chronicle 41: 103-105. (#nhst,significance)
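
The question posed in the quote above ("how big is the difference, and how precisely have we determined it?") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paired measurements are hypothetical numbers invented for the example, the helper name `mean_difference_with_ci` is not from any source, and the interval uses a large-sample normal multiplier (1.96) where a t quantile would give a somewhat wider interval for so few pairs.

```python
import math
import statistics

def mean_difference_with_ci(old, new, z=1.96):
    """Estimate the mean paired difference (new - old) with an approximate 95% interval.

    Assumption: a normal multiplier z is used for simplicity; with few pairs
    a t quantile would be more appropriate and give a wider interval.
    """
    diffs = [b - a for a, b in zip(old, new)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference from the sample SD of the differences.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, (mean_diff - z * std_err, mean_diff + z * std_err)

# Hypothetical paired measurements by an "old" and a "new" method.
OLD = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3]
NEW = [10.6, 10.1, 10.9, 10.4, 10.5, 10.7]

MEAN_DIFF, (CI_LOW, CI_HIGH) = mean_difference_with_ci(OLD, NEW)
print(f"estimated difference: {MEAN_DIFF:.2f} (approx. 95% CI {CI_LOW:.2f} to {CI_HIGH:.2f})")
```

The output reports a size and a precision rather than a yes-no verdict, which is the shift in emphasis the quote argues for.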

  • Somehow there has developed a widespread belief that statistical analysis is legitimate only if it includes significance testing. This belief leads to, and is fostered by, numerous introductory statistics texts that are little more than catalogues of techniques for performing significance tests.

    D. G. Altman, Discussion of Dr Chatfield’s paper. 1985. Journal of the Royal Statistical Society A 148: 242. (#nhst,significance)

  • Testing the equality of 2 true treatment means is ridiculous. They will always be different, at least beyond the hundredth decimal place.

    V. Chew, Statistical hypothesis testing: an academic exercise in futility. 1977. Proceedings of the Florida State Horticultural Society 90: 214-215. (#nhst)

  • It is surely apparent that anyone who wants to obtain a significant difference badly enough can obtain one … choose a sample size large enough.

    A. Binder, Further considerations on testing the null hypothesis and the strategy and tactics of investigating theoretical models. 1963. Psychological Review 70: 107-115. (#nhst,significance,sampling)
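
The point made in the quote above, that a large enough sample makes any nonzero difference "significant", can be illustrated with a short Python sketch. The assumptions are loud ones: a two-sample z-test with known, equal standard deviations; an observed difference taken to be exactly equal to a hypothetical true difference of 0.01 standard deviations; and a helper name, `two_sided_p_value`, chosen only for this example.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value of a two-sample z-test with known, equal SDs."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

TINY_DIFF = 0.01  # hypothetical difference, in units of the standard deviation
SAMPLE_SIZES = [100, 10_000, 1_000_000]
P_VALUES = [two_sided_p_value(TINY_DIFF, 1.0, n) for n in SAMPLE_SIZES]

for n, p in zip(SAMPLE_SIZES, P_VALUES):
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The difference itself never changes; only the sample size does, and the p-value falls from far above .05 to far below it.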

  • As Confucius might have said, if the difference isn’t different enough to make a difference, what’s the difference?

    V. Chew, Testing differences among means: correct interpretation and some alternatives. 1980. HortScience 15(4): 467-470. (#nhst,significance)

  • Some hesitation about the unthinking use of significance tests is a sign of statistical maturity.

    D. S. Moore and G. P. McCabe, Introduction to the Practice of Statistics. 1989. W. H. Freeman and Company (New York). (#nhst)

  • It is usually wise to give a confidence interval for the parameter in which you are interested.

    D. S. Moore and G. P. McCabe, Introduction to the Practice of Statistics. 1989. W. H. Freeman and Company (New York). (#significance)

  • Unfortunately, when applied in a cook-book fashion, such significance tests do not extract the maximum amount of information available from the data. Worse still, misleading conclusions can be drawn. There are at least three problems: (1) a conclusion that there is a significant difference can often be reached merely by collecting enough samples; (2) a statistically significant result is not necessarily practically significant; and (3) reports of the presence or absence of significant differences for multiple tests are not comparable unless identical sample sizes are used.

    G. B. McBride, J. C. Loftis, & N. C. Adkins, What do significance tests really tell us about the environment? 1993. Environmental Management 17: 423-432. (#nhst,significance,sampling)

  • In many experiments it seems obvious that the different treatments must have produced some difference, however small, in effect. Thus the hypothesis that there is no difference is unrealistic: the real problem is to obtain estimates of the sizes of the differences.

    William G. Cochran and George M. Cox, Experimental Designs. 2nd ed. 1957. John Wiley & Sons, Inc. (#nhst,significance)

  • I suggest to you that Sir Ronald has befuddled us, mesmerized us, and led us down the primrose path. I believe that the almost universal reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories in the soft areas is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology.

    P. E. Meehl, Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. 1978. Journal of Consulting and Clinical Psychology 46: 806-834. (#nhst)

  • Probably all theories are false in the eyes of God.

    P. E. Meehl, Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. 1978. Journal of Consulting and Clinical Psychology 46: 806-834. (#science,significance)

  • The grotesque emphasis on significance tests in statistics courses of all kinds … is taught to people, who if they come away with no other notion, will remember that statistics is about tests for significant differences. … The apparatus on which their statistics course has been constructed is often worse than irrelevant, it is misleading about what is important in examining data and making inferences.

    John A. Nelder, Discussion of Dr Chatfield’s paper. 1985. Journal of the Royal Statistical Society A 148: 238. (#nhst)

  • Statistics is intimately connected with science and technology, and few mathematicians have experience or understanding of the methods of either.

    John A. Nelder, Discussion of Dr Chatfield’s paper. 1985. Journal of the Royal Statistical Society A 148: 238. (#nhst)

  • if experimenters realized how little is the chance of their experiments discovering what they are intended to discover, then a very substantial proportion of the experiments that are now in progress would have been abandoned in favour of an increase in size of the remaining experiments, judged more important.

    Jerzy Neyman, The use of the concept of power in agricultural experimentation. 1958. Journal of the Indian Society of Agricultural Statistics 9: 9-17. (#power)

  • What was the probability (power) of detecting interactions … in the experiment performed? … The probability in question is frequently relatively low … in cases of this kind the fact that the test failed to detect the existence of interactions does not mean very much. In fact, they may exist and have gone undetected.

    Jerzy Neyman, The use of the concept of power in agricultural experimentation. 1958. Journal of the Indian Society of Agricultural Statistics 9: 9-17. (#power)

  • In addition to important technical errors, fundamental errors in the philosophy of science are frequently involved in this indiscriminate use of the tests [of significance].

    Denton E. Morrison & Ramon E. Henkel, Significance tests reconsidered. 1969. The American Sociologist 4: 131-140. (#nhst,science,significance)

  • Researchers have long recognized the unfortunate connotations and consequences of the term ‘significance,’ and we propose it is time for a change.

    Denton E. Morrison & Ramon E. Henkel, Significance tests reconsidered. 1969. The American Sociologist 4: 131-140. (#nhst,significance)

  • there is evidence that significance tests have been a genuine block to achieving … knowledge.

    Denton E. Morrison & Ramon E. Henkel, Significance tests reconsidered. 1969. The American Sociologist 4: 131-140. (#nhst,significance,knowledge)

  • The twin assumptions of normality of distribution and homogeneity of variance are not ever exactly fulfilled in practice, and often they do not even hold to a good approximation.

    John W. Tukey, The problem of multiple comparisons. 1973. Unpublished manuscript, Dept. of Statistics, Princeton University. (#normality)

  • [A]sking ‘Are the effects different?’ is foolish.

    John W. Tukey, The philosophy of multiple comparisons. 1991. Statistical Science 6: 100-116. (#nhst,statistics,statistician)

  • Empirical knowledge is always fuzzy! And theoretical knowledge, like all the laws of physics, as of today’s date, is always wrong - in detail, though possibly providing some very good approximations indeed.

    John W. Tukey, The philosophy of multiple comparisons. 1991. Statistical Science 6: 100-116.

  • scientists care about whether a result is statistically significant, but they should care much more about whether it is meaningful.

    Deirdre N. McCloskey, The insignificance of statistical significance. 1995. Scientific American 272(4): 104-105. (#nhst,science,significance)

  • The statistician should not always remain in his or her own office: not only is relevant information more likely to be on hand in the experimenter’s department, but in the longer term the statistician stands to gain immeasurably in understanding of agricultural problems by often visiting other departments and their laboratories and fields.

    David J. Finney, Was this in your statistics textbook? I. Agricultural Scientist and Statistician. 1988. Experimental Agriculture 24: 153-161. (#statistician)

  • Rigid dependence upon significance tests in single experiments is to be deplored.

    David J. Finney, Was this in your statistics textbook? III. Design and analysis. 1988. Experimental Agriculture 24: 421-432. (#nhst,significance)

  • A null hypothesis that yields under two different treatments have identical expectations is scarcely very plausible, and its rejection by a significance test is more dependent upon the size of an experiment than upon its untruth.

    David J. Finney, Was this in your statistics textbook? III. Design and analysis. 1988. Experimental Agriculture 24: 421-432. (#nhst,significance)

  • I have failed to find a single instance in which the Duncan test was helpful, and I doubt whether any of the alternative tests [multiple range significance tests] would please me better.

    David J. Finney, Was this in your statistics textbook? III. Design and analysis. 1988. Experimental Agriculture 24: 421-432. (#nhst,significance)

  • Is it ever worth basing analysis and interpretation of an experiment on the inherently implausible null hypothesis that two (or more) recognizably distinct cultivars have identical yield capacities?

    David J. Finney, Was this in your statistics textbook? III. Design and analysis. 1988. Experimental Agriculture 24: 421-432. (#nhst)

  • Prediction is very difficult, especially of the future.

    Niels Henrik David Bohr (#time,science,history)

  • Standard errors of variance components are dumb because the distribution of a variance component is not symmetric, but Chi-squared and highly skewed.

    Doug Bates, Presentation at useR 2007 (#skewness,models)

  • All data are wrong, but some are useful.

    Jim Kloet (after George Box), RStudio::Conf 2022 (#data)

  • A lot of data science and analytics is just counting things and labeling them.

    Hamdan Azhar, 2022 New York R Conference (#science,counts,data analysis)

  • An observation is judged significant, if it would rarely have been produced, in the absence of a real cause of the kind we are seeking. It is a common practice to judge a result significant, if it is of such a magnitude that it would have been produced by chance not more frequently than once in twenty trials. This is an arbitrary, but convenient, level of significance for the practical investigator, but it does not mean that he allows himself to be deceived once in every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained. He should only claim that a phenomenon is experimentally demonstrable when he knows how to design an experiment so that it will rarely fail to give a significant result. Consequently, isolated significant results which he does not know how to reproduce are left in suspense pending further investigation.

    Ronald Fisher, The Statistical Method in Psychical Research, Proceedings of the Society for Psychical Research, 39: 189-192 (1929). (#significance)

  • The statistician has no magic touch by which he may come in at the stage of tabulation and make something of nothing. Neither will his advice, however wise in the early stages of a study, ensure successful execution and conclusion. Many a study, launched on the ways of elegant statistical design, later boggled in execution, ends up with results to which the theory of probability can contribute little.

    W. Edwards Deming, Principles of Professional Statistical Practice. Annals of Mathematical Statistics, 36(6), 1883. (1965) (#statistician)

  • Evaluation of the statistical reliability of a set of results is not mere calculation of standard errors and confidence limits. The statistician must go far beyond the statistical methods in textbooks. He must evaluate uncertainty in terms of possible uses of the data. Some of this writing is not statistical but draws on assistance from the expert in the subject-matter.

    W. Edwards Deming, Principles of Professional Statistical Practice. Annals of Mathematical Statistics, 36(6), 1883. (1965) (#data,statistician)

  • An inference, if it is to have scientific value, must constitute a prediction concerning future data. If the inference is to be made purely with the help of the distribution theory of statistics, the experiments that constitute evidence for the inference must arise from a state of statistical control; until that state is reached, there is no universe, normal or otherwise, and the statistician’s calculations by themselves are an illusion if not a delusion.

    W. Edwards Deming, Statistical Method from the Viewpoint of Quality Control, 1939. (#statistics,science)

  • Data visualization is part art and part science. The challenge is to get the art right without getting the science wrong and vice versa.

    Claus O. Wilke, Fundamentals of Data Visualization (#data visualization,science)

  • In other words, the model is terrific in all ways other than the fact that it is totally useless. So why did we create it? In short, because we could: we have a data set, and a statistical package, and add the former to the latter, hit a few buttons and voila, we have another paper.

    Andrew J. Vickers & Angel M. Cronin, Everything you always wanted to know about evaluating prediction models (but were too afraid to ask). Urology. 2010;76(6):1298-1301. (#models)

  • The definition of a medical statistician is one who will not accept that Columbus discovered America because he said he was looking for India in the trial plan.

    Stephen J. Senn, Power is indeed irrelevant in interpreting completed studies. BMJ. 2002;325(7375):1304. (#statistician)

  • Inept graphics also flourish because many graphic artists believe that statistics are boring and tedious. It then follows that decorated graphics must pep up, animate, and all too often exaggerate what evidence there is in the data. … If the statistics are boring, then you’ve got the wrong numbers.

    Edward R Tufte, The Visual Display of Quantitative Information, 1983. (#data visualization,statistics)

  • Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should show the data; induce the viewer to think about the substance rather than about the methodology, graphic design, the technology of graphic production, or something else; avoid distorting what the data have to say; present many numbers in a small space; make large data sets coherent; encourage the eye to compare different pieces of data; reveal the data at several levels of detail, from a broad overview to the fine structure; serve a reasonably clear purpose: description, exploration, tabulation, or decoration; [should] be closely integrated with the statistical and verbal descriptions of a data set.

    Edward R Tufte, The Visual Display of Quantitative Information, 1983. (#data visualization)

  • If you can’t have an experiment, do the best you can with whatever data you can gather, but do be very skeptical of historical data and subject them to all the logical tests you can think of.

    Robert Hooke, Statistics, Sports, and Some Other Things. In: Statistics: A Guide to the Unknown, Judith M. Tanur

  • The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it.

    Darrell Huff, How to Lie with Statistics, 1954. (#sampling)

  • Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means.

    Bertrand Russell, 1929 Lecture (cited in Bell 1945, The Development of Mathematics, p.587) (#probability)

  • It is now proved beyond doubt that smoking is one of the leading causes of statistics.

    Fletcher Knebel, 1961 (#statistics)

  • Statistics show that of those who contract the habit of eating, very few survive.

    William W Irwin

  • We are hardwired to make sense of the world around us - to notice patterns and invent theories to explain these patterns. We underestimate how easily patterns can be created by inexplicable random events - by good luck and bad luck.

    Gary Smith, Standard Deviations, 2014

  • A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. … These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa.

    Gary Smith, Standard Deviations, 2014 (#probability)

  • Remember that even random coin flips can yield striking, even stunning, patterns that mean nothing at all. When someone shows you a pattern, no matter how impressive the person’s credentials, consider the possibility that the pattern is just a coincidence. Ask why, not what. No matter what the pattern, the question is: Why should we expect to find this pattern?

    Gary Smith, Standard Deviations, 2014 (#probability)

  • We are seduced by patterns and we want explanations for these patterns. When we see a string of successes, we think that a hot hand has made success more likely. If we see a string of failures, we think a cold hand has made failure more likely. It is easy to dismiss such theories when they involve coin flips, but it is not so easy with humans. We surely have emotions and ailments that can cause our abilities to go up and down. The question is whether these fluctuations are important or trivial.

    Gary Smith, Standard Deviations, 2014

  • [In statistics] you have the fact that the concepts are not very clean. The idea of probability, of randomness, is not a clean mathematical idea. You cannot produce random numbers mathematically. They can only be produced by things like tossing dice or spinning a roulette wheel. With a formula, any formula, the number you get would be predictable and therefore not random. So as a statistician you have to rely on some conception of a world where things happen in some way at random, a conception which mathematicians don’t have.

    Lucien Le Cam, Interview, 1988 (#probability)

  • Flip a coin 100 times. Assume that 99 heads are obtained. If you ask a statistician, the response is likely to be: ‘It is a biased coin’. But if you ask a probabilist, he may say: ‘Wooow, what a rare event’.

    Chamont Wang, Sense and Nonsense of Statistical Inference, 1993 (#probability)

  • It is seen that continued shuffling may reasonably be expected to produce perfect ‘randomness’ and to eliminate all traces of the original order. It should be noted, however, that the number of operations required for this purpose is extremely large.

    William Feller, An Introduction To Probability Theory And Its Applications, 1950

  • Figures may not lie, but statistics compiled unscientifically and analyzed incompetently are almost sure to be misleading, and when this condition is unnecessarily chronic the so-called statisticians may be called liars.

    Edwin B Wilson, Bulletin of the American Mathematical Society, Vol 18, 1912 (#statisticians)

  • The statistician’s job is to draw general conclusions from fragmentary data. Too often the data supplied to him for analysis are not only fragmentary but positively incoherent, so that he can do next to nothing with them. Even the most kindly statistician swears heartily under his breath whenever this happens.

    M J Moroney, Facts from Figures, 1951 (#statisticians)

  • Just as by ‘literacy’, in this context, we mean much more than its dictionary sense of the ability to read and write, so by ‘numeracy’ we mean more than mere ability to manipulate the rule of three. When we say that a scientist is ‘illiterate’, we mean that he is not well enough read to be able to communicate effectively with those who have had a literary education. When we say that a historian or a linguist is ‘innumerate’ we mean that he cannot even begin to understand what scientists and mathematicians are talking about.

    Sir Geoffrey Crowther, A Report of the Central Advisory Committee for Education, 1959, p.270. (#numeracy)

  • Numeracy has come to be an indispensable tool to the understanding and mastery of all phenomena, and not only of those in the relatively close field of the traditional natural sciences.

    Sir Geoffrey Crowther, A Report of the Central Advisory Committee for Education, 1959, p.271. (#numeracy)

  • Numeracy has two facets–reading and writing, or extracting numerical information and presenting it. The skills of data presentation may at first seem ad hoc and judgmental, a matter of style rather than of technology, but certain aspects can be formalized into explicit rules, the equivalent of elementary syntax.

    Andrew Ehrenberg, Rudiments of Numeracy, Journal of Royal Statistical Society, 140, 277-297, 1977. (#numeracy)

  • People often feel inept when faced with numerical data. Many of us think that we lack numeracy, the ability to cope with numbers. … The fault is not in ourselves, but in our data. Most data are badly presented and so the cure lies with the producers of the data. To draw an analogy with literacy, we do not need to learn to read better, but writers need to be taught to write better.

    Andrew Ehrenberg, The problem of numeracy, American Statistician 35, 67-71, 1981. (#numeracy)

  • To be numerate means to be competent, confident, and comfortable with one’s judgements on whether to use mathematics in a particular situation and if so, what mathematics to use, how to do it, what degree of accuracy is appropriate, and what the answer means in relation to the context.

    Diana Coben, Numeracy, mathematics and adult learning, 2000 (#numeracy)

  • Numeracy is the ability to process, interpret and communicate numerical, quantitative, spatial, statistical, even mathematical information, in ways that are appropriate for a variety of contexts, and that will enable a typical member of the culture or subculture to participate effectively in activities that they value.

    Jeff Evans, Adults’ Mathematical Thinking and Emotion, 2000 (#numeracy)

  • Statistics is the art of stating in precise terms that which one does not know.

    William Kruskal, Statistics, Moliere, and Henry Adams, American Scientist, 55, 416-428, 1967. (#statistics)

  • If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye.

    Frederick Mosteller & John W Tukey, The Uses and Usefulness of Binomial Probability Paper, Journal of the American Statistical Association 44, 1949. (#significance,datavisualization)

  • Sequences of random numbers also inevitably display certain regularities. … The trouble is, just as no real die, coin, or roulette wheel is ever likely to be perfectly fair, no numerical recipe produces truly random numbers. The mere existence of a formula suggests some sort of predictability or pattern.

    Ivars Peterson, The Jungles of Randomness: A Mathematical Safari, 1998. (#random numbers)

  • It is very easy to devise different tests which, on the average, have similar properties, … they behave satisfactorily when the null hypothesis is true and have approximately the same power of detecting departures from that hypothesis. Two such tests may, however, give very different results when applied to a given set of data. The situation leads to a good deal of contention amongst statisticians and much discredit of the science of statistics. The appalling position can easily arise in which one can get any answer one wants if only one goes around to a large enough number of statisticians.

    Frank Yates, Discussion on the Paper by Dr. Box and Dr. Andersen, Journal of the Royal Statistical Society B Vol. 17, 1955

  • Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion.

    Stephen M Stigler, Neutral Models in Biology, 1987, p.148. (#significance)

  • Statistics may be regarded as (i) the study of populations, (ii) as the study of variation, and (iii) as the study of methods of the reduction of data.

    Sir Ronald A Fisher, Statistical Methods for Research Workers, 1925 (#statistics)

  • The primes have tantalized mathematicians since the Greeks, because they appear to be somewhat randomly distributed but not completely so. … Although the prime numbers are rigidly determined, they somehow feel like experimental data.

    Timothy Gowers, Mathematics: A Very Short Introduction, 2002 (#random numbers)

  • Frequentist statistics assumes that there is a ‘true’ state of the world (e.g. the difference between species in predation probability) which gives rise to a distribution of possible experimental outcomes. The Bayesian framework says instead that the experimental outcome - what we actually saw happen - is the truth, while the parameter values or hypotheses have probability distributions. The Bayesian framework solves many of the conceptual problems of frequentist statistics: answers depend on what we actually saw and not on a range of hypothetical outcomes, and we can legitimately make statements about the probability of different hypotheses or parameter values.

    Ben Bolker, Ecological Models and Data in R, 2007 (#bayesian)

  • A statistical estimate may be good or bad, accurate or the reverse; but in almost all cases it is likely to be more accurate than a casual observer’s impression, and the nature of things can only be disproved by statistical methods.

    Sir Arthur L Bowley, Elements of Statistics, 1901 (#statistics)

  • An extremely odd demand is often set forth but never met, even by those who make it; i.e., that empirical data should be presented without any theoretical context, leaving the reader, the student, to his own devices in judging it. This demand seems odd because it is useless simply to look at something. Every act of looking turns into observation, every act of observation into reflection, every act of reflection into the making of associations; thus it is evident that we theorize every time we look carefully at the world.

    Johann Wolfgang von Goethe (#data)

  • From the moment we first roll a die in a children’s board game, or pick a card (any card), we start to learn what probability is. But even as adults, it is not easy to tell what it is, in the general way.

    David Stirzaker, Probability and Random Variables: A Beginner’s Guide, 1999 (#probability)

  • We cannot really have a perfectly shuffled pack of perfect cards; this ‘collection of equally likely hands’ is actually a fiction. We create the idea, and then use the rules of arithmetic to calculate the required chances. This is characteristic of all mathematics, which concerns itself only with rules defining the behaviour of entities which are themselves undefined (such as ‘numbers’ or ‘points’).

    David Stirzaker, Probability and Random Variables: A Beginner’s Guide, 1999

  • The whole point of probability is to discuss uncertain eventualities before they occur. After this event, things are completely different.

    David Stirzaker, Probability and Random Variables: A Beginner’s Guide, 1999 (#probability)

  • There is no such thing as randomness. No one who could detect every force operating on a pair of dice would ever play dice games, because there would never be any doubt about the outcome. The randomness, such as it is, applies to our ignorance of the possible outcomes. It doesn’t apply to the outcomes themselves. They are 100% determined and are not random in the slightest. Scientists have become so confused by this that they now imagine that things really do happen randomly, i.e. for no reason at all.

    Thomas Stark, God Is Mathematics: The Proofs of the Eternal Existence of Mathematics, 2018

  • Why is the human need to be in control relevant to a discussion of random patterns? Because if events are random, we are not in control, and if we are in control of events, they are not random. There is therefore a fundamental clash between our need to feel we are in control and our ability to recognize randomness. That clash is one of the principal reasons we misinterpret random events.

    Leonard Mlodinow, The Drunkard’s Walk: How Randomness Rules Our Lives, 2008

  • An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don’t prove anything one way or the other.

    Robert M Pirsig, Zen and the Art of Motorcycle Maintenance, 1974

  • A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science.

    Francisco J Ayala, Biological Evolution: Natural Selection or Random Walk, American Scientist, 1974

  • …no one believes an hypothesis except its originator but everyone believes an experiment except the experimenter.

    William I B Beveridge, The Art of Scientific Investigation, 1950

  • The hypothesis is the principal intellectual instrument in research. Its function is to indicate new experiments and observations and it therefore sometimes leads to discoveries even when not correct itself. We must resist the temptation to become too attached to our hypothesis, and strive to judge it objectively and modify it or discard it as soon as contrary evidence is brought to light. Vigilance is needed to prevent our observations and interpretations being biased in favor of the hypothesis. Suppositions can be used without being believed.

    William I B Beveridge, The Art of Scientific Investigation, 1950 (#science)

  • Experiments are like cross-questioning a witness who will tell the truth but not the whole truth.

    Alan Gregg, The Furtherance of Medical Research, 1941

  • A random sequence is a vague notion embodying the idea of a sequence in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests traditional with statisticians and depending somewhat on the uses to which the sequence is to be put.

    Derrick H Lehmer, 1951 (#randomnumbers)

  • The moment you forecast you know you’re going to be wrong, you just don’t know when and in which direction.

    Edgar R Fiedler, “Across the Board”, 1977 (#timeseries)

  • Statistics at its best provides methodology for dealing empirically with complicated and uncertain information, in a way that is both useful and scientifically valid.

    John M Chambers, 1993 (#statistics)

  • …it does not seem helpful just to say that all models are wrong. The very word model implies simplification and idealization. The idea that complex physical, biological or sociological systems can be exactly described by a few formulae is patently absurd. The construction of idealized representations that capture important stable aspects of such systems is, however, a vital part of general scientific analysis and statistical models, especially substantive ones, do not seem essentially different from other kinds of model.

    Sir David Cox, Comment on ‘Model uncertainty, data mining and statistical inference’, Journal of the Royal Statistical Society, Series A 158, 1995. (#models,uncertainty)

  • The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions.

    Fergus Daly et al, Elements of Statistics, 1995 (#statistics,knowledge)

  • If a man stands with his left foot on a hot stove and his right foot in a refrigerator, the statistician would say that, on the average, he’s comfortable.

    Walter Heller (#statistician)

  • Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized.

    Larry A Wasserman, All of Statistics: A concise course in statistical inference, 2004 (#statistician)

  • One feature […] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally.

    Cedric A B Smith, Book review of Norman T. J. Bailey: Statistical Methods in Biology, Applied Statistics 9, 1960 (#statistician)

  • In general, it is necessary to have some data on which to calculate probabilities. […] Statisticians do not evolve probabilities out of their inner consciousness, they merely calculate them.

    Leonard C Tippett (#data,probability)

  • Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’.

    Alex Reinhart, Statistics Done Wrong: The Woefully Complete Guide, 2015

  • Using data from the population as it stands is a dangerous substitute for testing.

    Frederick Mosteller & Gale Mosteller, “New Statistical Methods in Public Policy. Part I: Experimentation”, Journal of Contemporary Business 8, 1979

  • The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way.

    Richard E Nisbett, “Mindware: Tools for Smart Thinking”, 2015

  • A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples.

    Peter C Bruce & Andrew G Bruce, “Statistics for Data Scientists: 50 Essential Concepts”, 2016 (#sampling,data)

  • All predictions are statistical, but some predictions have such a high probability that one tends to regard them as certain.

    Marshall J Walker, The Nature of Scientific Thought, 1963 (#uncertainty)

  • Statistics is a scientific discipline concerned with collection, analysis, and interpretation of data obtained from observation or experiment. The subject has a coherent structure based on the theory of Probability and includes many different procedures which contribute to research and development throughout the whole of Science and Technology.

    Egon Pearson, 1936 (#statistics,probability,science)

  • [Statistics] is both a science and an art. It is a science in that its methods are basically systematic and have general application; and an art in that their successful application depends to a considerable degree on the skill and special experience of the statistician, and on his knowledge of the field of application, e.g. economics.

    Leonard H C Tippett, Statistics, 1943 (#statistics)

  • The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say ‘When you see the data, doubt it!’ ‘When you see the measurement instrument, doubt it!’ […] For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data.

    Kaoru Ishikawa, Annual Quality Congress Transactions, 1981 (#data,sampling)

  • There is a tendency to mistake data for wisdom, just as there has always been a tendency to confuse logic with values, intelligence with insight. Unobstructed access to facts can produce unlimited good only if it is matched by the desire and ability to find out what they mean and where they lead.

    Norman Cousins, “Human Options : An Autobiographical Notebook”, 1981 (#data,knowledge)

  • Data in isolation are meaningless, a collection of numbers. Only in context of a theory do they assume significance…

    George Greenstein, “Frozen Star”, 1983 (#data)

  • Intuition becomes increasingly valuable in the new information society precisely because there is so much data.

    John Naisbitt, “Re-Inventing the Corporation”, 1985 (#data)

  • No matter what the laws of chance might tell us, we search for patterns among random events wherever they might occur–not only in the stock market but even in interpreting sporting phenomena.

    Burton G. Malkiel, “A Random Walk Down Wall Street: The Time-Tested Strategy For Successful Investing”, 2011, p.149. (#random numbers)

  • One can be highly functionally numerate without being a mathematician or a quantitative analyst. It is not the mathematical manipulation of numbers (or symbols representing numbers) that is central to the notion of numeracy. Rather, it is the ability to draw correct meaning from a logical argument couched in numbers. When such a logical argument relates to events in our uncertain real world, the element of uncertainty makes it, in fact, a statistical argument.

    Eric R Sowey, The Getting of Wisdom: Educating Statisticians to Enhance Their Clients’ Numeracy, The American Statistician 57(2), 2003 (#numeracy,uncertainty)

  • We would wish ‘numerate’ to imply the possession of two attributes. The first of these is an ‘at-homeness’ with numbers and an ability to make use of mathematical skills which enable an individual to cope with the practical mathematical demands of his everyday life. The second is ability to have some appreciation and understanding of information which is presented in mathematical terms, for instance in graphs, charts or tables or by reference to percentage increase or decrease.

    Cockcroft Committee, Mathematics Counts: A Report into the Teaching of Mathematics in Schools, 1982 (#numeracy,datavisualization)

  • In all scientific fields, theory is frequently more important than experimental data. Scientists are generally reluctant to accept the existence of a phenomenon when they do not know how to explain it. On the other hand, they will often accept a theory that is especially plausible before there exists any data to support it.

    Richard Morris, 1983 (#science,data)

  • To find out what happens to a system when you interfere with it you have to interfere with it (not just passively observe it).

    George E P Box, “Use and Abuse of Regression”, 1966 (#Box quotes)

  • Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration. On the contrary following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

    George E P Box, Science and Statistics, Journal of the American Statistical Association 71, 1976 (#Boxquotes,models)

  • Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.

    George E P Box, Science and Statistics, Journal of the American Statistical Association 71, 1976 (#models,science,Boxquotes)

  • The fact that [the model] is an approximation does not necessarily detract from its usefulness because models are approximations. All models are wrong, but some are useful.

    George E P Box, 1987 (#models,science,Boxquotes)

  • The central limit theorem says that, under conditions almost always satisfied in the real world of experimentation, the distribution of such a linear function of errors will tend to normality as the number of its components becomes large. The tendency to normality occurs almost regardless of the individual distributions of the component errors. An important proviso is that several sources of error must make important contributions to the overall error and that no particular source of error dominate the rest.

    George E P Box et al, “Statistics for Experimenters: Design, discovery, and innovation” 2nd Ed., 2005 (#normality)

  • The postulate of randomness thus resolves itself into the question, ‘of what population is this a random sample?’ which must frequently be asked by every practical statistician.

    Ronald Fisher, “On the Mathematical Foundation of Theoretical Statistics”, Philosophical Transactions of the Royal Society of London Vol. A222, 1922 (#randomnumbers,sampling,statistics)

  • Statistics has been likened to a telescope. The latter enables one to see further and to make clear objects which were diminished or obscured by distance. The former enables one to discern structure and relationships which were distorted by other factors or obscured by random variation.

    David J Hand, “The Role of Statistics in Psychiatry”, Psychological Medicine Vol. 15, 1985 (#statistics)

  • When looking at the end result of any statistical analysis, one must be very cautious not to over interpret the data. Care must be taken to know the size of the sample, and to be certain the method for gathering information is consistent with other samples gathered. […] No one should ever base conclusions without knowing the size of the sample and how random a sample it was. But all too often such data is not mentioned when the statistics are given - perhaps it is overlooked or even intentionally omitted.

    Theoni Pappas, “More Joy of Mathematics: Exploring mathematical insights & concepts”, 1994 (#sampling)

  • BREAKING: The Supreme Court just ruled 6-3 that according to the US constitution logistic regression IS machine learning.

    Kareem Carr, @Kareem_Carr, Twitter 7/1/21 (#models)

  • In questions of science the authority of a thousand is not worth the humble reasoning of a single individual

    Galileo Galilei, 1632, Dialog concerning the Two Chief World Systems (#science)

  • Science is organized knowledge. Wisdom is organized life.

    Immanuel Kant (#science,knowledge)

  • When you steal from one author, it’s plagiarism; if you steal from many, it’s research.

    Wilson Mizner (American playwright and entrepreneur) (#research,ethics)

  • You can’t always get what you want, but if you try, sometimes, well you just might find, you get what you need.

    Mick Jagger (#data,science)

  • He who gives up code safety for code speed deserves neither.

    Hadley Wickham (#computing)

  • Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

    Martin Fowler, Refactoring: Improving the Design of Existing Code (#computing)

  • Thank you for sending me a copy of your book. I’ll waste no time reading it.

    Moses Hadas (#reviews)

  • I will let the data speak for itself when it cleans itself.

    Allison Reichel (#data)

  • To the untrained eye, randomness appears as regularity or tendency to cluster.

    W. Feller, An Introduction to Probability Theory and its Applications (1950) (#probability,datavisualization)

  • Your assumptions are your windows on the world. Scrub them off every once in a while, or the light won’t come in.

    Alan Alda, 1936 (#assumptions)

  • Little experience is sufficient to show that the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their merits does it seem possible to apply accurate tests to practical data.

    Ronald Fisher, Statistical Methods for Research Workers (1925) (#sample size)

  • All generative models are wrong, but some are useful.

    Jared Lander, Copilot for R, 56:25 (#models)

  • Everyone who has carried out experiments in the field or farmyard must be well aware that the result of a single experiment is very often entirely misleading. Yet it is still common practice to publish single results and to base practical advice upon them.

    T. B. Wood and F. J. M. Stratton, The interpretation of experimental results. The Journal of Agricultural Science, 3, 417-440. (#science)

  • The preparation of clear and simple plans, and a convenient system of numbering the [treatments] that are to be applied, will lighten the work of the man in the field, who is usually operating under adverse conditions, is frequently in a hurry, and is sometimes not very certain of the points at issue.

    F. Yates, The Design and Analysis of Factorial Experiments (1937). Harpenden Imperial Bureau of Soil Science. (#biometry, expt design)

  • I do not always agree with Sir Ronald Fisher, but it is due to him that the standard of presentation of results in agriculture is better than in any of the so-called exact sciences; and this is a state of affairs that physicists should cease to tolerate.

    Sir Harold Jeffreys, Half a Century in Geophysics: An original article from the Report of the British Association for the Advancement of Science, 1953. (#science)

  • Data are not just numbers, they are numbers with a context. … In data analysis, context provides meaning.

    George W. Cobb & David S. Moore, Mathematics, Statistics, and Teaching. American Mathematical Monthly, 801-23. (#numeracy)

  • When [the profession of] statistics becomes clearly embedded in people’s minds as being concerned with investigation rather than simply calculation (or worse still, mere cataloguing of data), there can be no room for doubts about the relevance of the subject.

    C. J. Wild, Embracing the ‘Wider view’ of Statistics, The American Statistician, 48, 163-171. p.165.

  • We have to teach non-statisticians to recognize where statistical expertise is required. No one else will. We teach students how to solve simple statistical problems, but how often do we make any serious effort to teach them to recognize situations that call for statistical expertise that is beyond the technical content of the course?

    C. J. Wild, Embracing the ‘Wider view’ of Statistics, The American Statistician, 48, 163-171. p.166. (#teaching)

  • A careful and sophisticated analysis of the data is often quite useless if the statistician cannot communicate the essential features of the data to a client for whom statistics is an entirely foreign language.

    C. J. Wild, Embracing the ‘Wider view’ of Statistics, The American Statistician, 48, 163-171. p.170.

  • The dominant feature of our 129 barley trials was the large differences…over the five years of the study, even for the same field. To give sound advice on choice of plot size and shape, statisticians will require results from many trials conducted over many years, not to mention crops.

    Dorothy Robinson, Discussion of the paper by Dr Brewer and Professor Mead, Journal of the Royal Statistical Society. Series A, 1986, 149, pp.314-348. Quote on p.341.

  • If we need a short suggestion of what exploratory data analysis is, I would suggest that: 1. it is an attitude, AND 2. a flexibility, AND 3. some graph paper (or transparencies, or both).

    John W. Tukey, Jones, L. V. (Ed.). (1986). The collected works of John W. Tukey: Philosophy and principles of data analysis 1949-1964 (Vols. III & IV). London: Chapman & Hall. (#eda,data visualization)

  • Three of the main strategies of data analysis are: 1. graphical presentation. 2. provision of flexibility in viewpoint and in facilities, 3. intensive search for parsimony and simplicity.

    John W. Tukey, Jones, L. V. (Ed.). (1986). The collected works of John W. Tukey: Philosophy and principles of data analysis 1949-1964 (Vols. III & IV). London: Chapman & Hall. (#datavisualization,data analysis)

  • If you torture the data enough, nature will always confess.

    Ronald Coase, quoted from Coase, R. H. (1982). How should economists choose? American Enterprise Institute, Washington, D. C. (#data analysis)

  • A big computer, a complex algorithm and a long time does not equal science.

    Robert Gentleman (#science,computing)

  • Absence of evidence is not evidence of absence.

    Martin Rees, Project Cyclops: A Design Study of a System for Detecting Extraterrestrial Intelligent Life (1971) by Bernard M. Oliver, and John Billingham (#science)

  • Statistics - A subject which most statisticians find difficult but which many physicians are experts on.

    Stephen Senn, Senn, S. (2007). Statistical Issues in Drug Development (2nd Edition). Chichester: John Wiley & Sons (#statistics)

  • The statistician cannot evade the responsibility for understanding the process he applies or recommends.

    Ronald A Fisher, Fisher, R. A. (1971 [1935]). The Design of Experiments (9th ed.). Macmillan. (#statistics,science)

  • Taking a model too seriously is really just another way of not taking it seriously at all.

    Andrew Gelman (#models)

  • What the use of a p-value implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred.

    Harold Jeffreys, Jeffreys, H. (1939). Theory of Probability. Oxford, England: Clarendon Press. (#p-values)

  • Statistics are no substitution for judgment.

    Henry Clay, Evening Sentinel (Staffordshire Sentinel), 1930 October 13, Production Prices and Depression: Professor Clay on the Trade Outlook, Quote Page 5, Column 5, Staffordshire, England. (British Newspaper Archive)

  • The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

    Brian Kernighan, Unix for Beginners (1979) (#computing)

  • We live on an island of knowledge surrounded by a sea of ignorance. As our island of knowledge grows, so does the shore of our ignorance.

    John A. Wheeler, Scientific American, 1992 (#science,knowledge)

  • It’s not just about visualizing information, but making information visible.

    Jose Duarte (#datavisualization)

  • I don’t know what the definition of “philosopher” is. But I know for a fact that a “statistician” is someone who has written an R package.

    Richard McElreath, Twitter, 4-29-2023 (#computing,statistics)

  • In every project you have at least one other collaborator; future-you. You don’t want future-you to curse past-you.

    Solomon Kurz (#time)

  • The whole point of science is that most of it is uncertain. That’s why science is exciting–because we don’t know. Science is all about things we don’t understand. The public, of course, imagines science is just a set of facts. But it’s not. Science is a process of exploring, which is always partial. We explore, and we find out things that we understand. We find out things we thought we understood were wrong. That’s how it makes progress.

    Freeman Dyson, mentioned in a 2014 interview (#science,knowledge,uncertainty)
