
From: n j cox <n.j.cox@durham.ac.uk>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Boxplot + line

Date: Wed, 26 Sep 2007 11:16:23 +0100

Stata's version of the _categorical imperative_, although

not as catchy as that formulated by Immanuel Kant, applies

here. Otherwise known as Wiggins' Third Law, it states

"If the going gets tough with a categorical graph,

start [all] over [again] with -twoway-."

A categorical graph here means -graph box-, -graph hbar-, -graph bar-

or -graph dot-. I will not mention -graph pie-.

Otherwise put, Allan is right. -graph box- won't let you use -addplot()-

(-plot()- in Stata 8). -addplot()- superimposes one or more -twoway- graphs on top of another. As -graph box- is not a -twoway- type,

-addplot()- is out of the question.

Moreover, although -dotplot- allows a crude representation of boxes,

it too does not allow -addplot()-. I guess that falls under the heading

of "Will anyone want that? Probably not."

No matter. There are several alternatives.

Allan can get arbitrarily close to what (he thinks) he wants.

In fact, he can get graphs that may well look better than

what he is asking for.

A first possibility is to install -stripplot- from SSC.

As mentioned a while back on this list, -stripplot-

has a -box- option. In fact, it also has a -box()-

option with arguments for tuning the box.

After

sysuse auto, clear

you can get this

stripplot displacement , over(rep78) vertical ///
    box(bfcolor(gs14) barw(0.2)) addplot(lfit displacement rep78, lcolor(black))

That is a scatter plot with boxes showing median

and quartiles with a regression line superimposed.

The result is no one's conventional box plot. It is a dot-box

plot (references in the help file, and more welcome). The idea

goes back at least as far as a suggestion of Jerry Dallal to Leland

Wilkinson.

The programmer of -stripplot- told me, in confidence, that

he does not like whiskers, so -stripplot- does not support whiskers.

That's also explicit in the help. In fact, he doesn't much like

box plots, which he asserts have become too widely used. Box plots

are often used for comparisons of only a few categories, when

usually a lot more detail could be shown helpfully, and without

confusing the reader.

But you can subvert that prejudice. You just need a little more

work.

First off, the quartiles are easy to get:

egen upq = pctile(displacement), by(rep78) p(75)

egen loq = pctile(displacement), by(rep78) p(25)

Next, what Tukey called the _adjacent values_, the ends of

the whiskers, are in fact quite easy to get too, but you

must install -egenmore- from SSC first:

egen uadj = adju(displacement), by(rep78)

egen ladj = adjl(displacement), by(rep78)
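
If installing -egenmore- is not an option, the adjacent values can be sketched with official -egen- functions alone. This assumes Tukey's usual rule (the most extreme data values within 1.5 times the interquartile range of the quartiles); it is not necessarily how -adju()- and -adjl()- are coded. The names q1, q3, iqrange, uadj2 and ladj2 are made up for illustration, so that they do not clash with the variables above.

```stata
* Sketch only: adjacent values by hand, assuming Tukey's 1.5 * IQR rule.
* All variable names here are illustrative, not from the original post.
sysuse auto, clear
egen q3 = pctile(displacement), by(rep78) p(75)
egen q1 = pctile(displacement), by(rep78) p(25)
gen iqrange = q3 - q1

* Dividing by a false (0) condition yields missing, which max() and min() ignore,
* so only values inside the fences are candidates.
egen uadj2 = max(displacement / (displacement <= q3 + 1.5 * iqrange)), by(rep78)
egen ladj2 = min(displacement / (displacement >= q1 - 1.5 * iqrange)), by(rep78)
```

Values of displacement below ladj2 or above uadj2 are then the points that a conventional box plot would show individually.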

Now we can do this:

stripplot displacement , over(rep78) vertical ///
    box(bfcolor(gs14) barw(0.2)) ms(none) ///
    addplot( ///
    scatter disp rep78 if disp < ladj | disp > uadj, mcolor(black) || ///
    rspike upq uadj rep78, lcolor(black) || ///
    rspike loq ladj rep78, lcolor(black) || ///
    lfit displacement rep78 , lcolor(red) ///
    )

In English, not Stata:

1. We draw a -stripplot- with a box and suppress all the marker

symbols.

2. On that we superimpose a scatter plot of the points beyond

the adjacent values.

3. On that we superimpose spikes connecting the quartiles and

the adjacent values.

4. On that we superimpose a regression line.

If you really want, you can cap the spikes too:

stripplot displacement , over(rep78) vertical ///
    box(bfcolor(gs14) barw(0.2)) mcolor(black) ms(none) ///
    addplot( ///
    scatter disp rep78 if disp < ladj | disp > uadj, mcolor(black) || ///
    rspike upq uadj rep78, lcolor(black) || ///
    rspike loq ladj rep78, lcolor(black) || ///
    rcap uadj uadj rep78 if uadj != upq, lcolor(black) || ///
    rcap ladj ladj rep78 if ladj != loq, lcolor(black) || ///
    lfit displacement rep78 , lcolor(red) ///
    )

Note that you don't really need -stripplot-. You have the

quartiles and could draw those directly with -twoway rbar-.

But I started out with -stripplot- and kept playing.
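
For what the -twoway rbar- route might look like, here is a minimal sketch using only official commands. It draws just the boxes and the regression line; whiskers and outside points could be added with -rspike- and -scatter- as above. The names q1, q2 and q3 are illustrative, and the option names -barwidth()-, -fcolor()- and -lcolor()- are those documented for recent versions of Stata, so an older release may want different spellings.

```stata
* Sketch only: boxes drawn directly with -twoway rbar-, no -stripplot- needed.
* The names q1, q2, q3 are illustrative, not from the original post.
sysuse auto, clear
egen q1 = pctile(displacement), by(rep78) p(25)
egen q2 = median(displacement), by(rep78)
egen q3 = pctile(displacement), by(rep78) p(75)

* Two bars per category, meeting at the median, so the median shows as a line.
twoway rbar q1 q2 rep78, barwidth(0.2) fcolor(gs14) lcolor(black) ///
    || rbar q2 q3 rep78, barwidth(0.2) fcolor(gs14) lcolor(black) ///
    || lfit displacement rep78, lcolor(red) ///
    legend(off) ytitle("Displacement (cu. in.)") xtitle("Repair record 1978")
```

Because everything here is a -twoway- plot type, further layers can simply be chained on with || and -addplot()- is not needed at all.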

Of course, Allan should get someone to pay for Stata 10. The graph

editor alone will give hours of endless amusement. As it happens,

all this is possible in Stata 8.2.

Nick

n.j.cox@durham.ac.uk

Allan Reese

--------------------------------------------------------------------------------

The manual notes "box charts are implicitly categorical" but your over variable may equally have ordered or true quantitative values. Given a boxplot with categories 1,2,3,... , is there a way to add a regression line over the boxes? addline offers only vertical and horizontal lines; addplot isn't available; and "|| scatteri" isn't allowed.

Any ideas please (still on v9, but this may be the spur to move it on up).

