Wednesday, November 28, 2007

Coming Up Next: 5 Advanced Mac/Stata Tips

In the next posts, I'll cover some more advanced techniques:


  1. Using Quicksilver to quickly insert commands in a text file
  2. Integrating the TextWrangler services menu
  3. Setting up suffix mapping in TextWrangler
  4. Using DocumentPalette to open Stata template files
  5. Modifying Stata graph output




(Quicksilver, TextWrangler and DocumentPalette are freeware programs that work on any version of OS X.)

I'm also planning to add an RSS feed button, improved formatting and hopefully some advice from more senior Mac gurus.

Step 5: Logging your sessions

This step is an easy one, but essential. Stata has a log function that gives a running record of every command you enter, whether it was typed or done in the menus, along with the output it produces. You will often want to look back at descriptive statistics, regression diagnostics or other results to check your work.

You can also add comments in your log, which is helpful when you read your log two months later and wonder why exactly


arima D.crime, ar(1) ma(1) vce(robust)

is different from

arima crime, arima(1,1,0) sarima(0,1,1),

and which one is better.

So you can add comments to explain this to your future self by typing an asterisk (*) before the explanation, like this:


arima D.crime, ar(1) ma(1) vce(robust)

* This command models a differenced time series with first-order autoregressive and moving-average terms and robust standard errors

arima crime, arima(1,1,0) sarima(0,1,1)

* This command models a first-differenced and seasonally differenced series with first-order autoregression and a seasonal moving average.



The best way to read old logs is in the Stata viewer, because it highlights and formats everything quite naturally.

You can log in the command window by just typing:

log using FILENAME


However, because you need to type out the full file name (and the full path, if the log belongs somewhere other than the working directory), it's usually easier to click the log button:

Stata menubar


Log button



Picture 2.jpg

Notice that you can append logs onto already existing files. This is handy if you want a complete running record of everything you did with a specific dataset.
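For those who prefer typing, here is a minimal sketch of the same append-and-review workflow from the command window. The log name "gambling_session" and the use of Stata's bundled auto dataset are assumptions chosen only to make the example self-contained.

```stata
* Close any log that is already open; capture suppresses the error if none is
capture log close

* Append to an existing log (hypothetical name) in the working directory
log using "gambling_session", append

* Comments typed here are recorded in the log along with the results
sysuse auto, clear
summarize price

* Stop logging, then read the log in the Stata viewer
log close
view "gambling_session.smcl"
```

Because no extension is given, the log is written in Stata's default .smcl format, which is what the viewer formats most naturally.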

I'd like to figure out how to make date-specific log files, but I don't have it working quite yet.

Incorporating the current date and time is explained more fully on this page from the Stata site.
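One possible approach, offered as an untested sketch rather than a finished solution: Stata stores today's date in c(current_date), so it can be pasted into the filename. The "session_" prefix and the macro name today are my own assumptions.

```stata
* c(current_date) looks like "28 Nov 2007"; trim it and swap spaces for underscores
local today = subinstr(trim("`c(current_date)'"), " ", "_", .)

* Close any open log, then open (or append to) a log named for today
capture log close
log using "session_`today'.smcl", append
```

Placed in a do-file, this should give one log per day, with later sessions on the same day appended to the same file.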

Those are the basic five steps. Next we'll deal with 5 advanced Mac/Stata tips.

Step 4: Importing unformatted data

If you're lucky, you'll never have to deal with unformatted data. You'll just cruise to ICPSR or the local library and get your data in pre-formatted, machine-readable files with no errors, complete dictionary and label files, and lots of documentation.

That's exactly what I did for four years. Then I ran across an old, useful piece of data that had never before been put into a statistics package.

That's an exciting moment. While the General Social Survey has been combed over (and over and over) by social scientists, including me, it's nice to find a new dataset that hasn't been mined very heavily. Either that means it's a dud, or just that no one bothered with it and you may find something new.

Here's what the data looked like when I opened it:

0001124352540000000000000000000000000000000000000

000222223225202022222322120200001222232322520202223232212020

371141530500000000000000140010021803



Hmm.

No commas, no spaces, and no file to read all of this into Stata (or SPSS or anything else).

However, there was a paper codebook that explained what all of this meant. The first four digits are the ID; the next is the card number; the next is the opinions of Fraternal Order of Police members on the severity of illegal bookmaking on a 1-to-4 Likert scale. And so on.

infix dictionary {
* imports Fraternal Order of Police Study from 1975 Gambling Commission Study
2 lines
1:
ID 1-4
card 5
serbook 7
serlarc 8
sernumb 9
serpot 10
sercard 11
serfence 12
serhook 13
}


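And we're off! By the way, "2 lines" is a dictionary directive telling Stata that each data record runs across two lines of the raw file, and "1:" means the variables that follow are read from the first of those lines.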

From here there are two ways to go:

First way:
  • Save the file as a .dct file, meaning it's a data dictionary
  • Use the infix menu in Stata (under File)
  • Specify the location of the raw data file and the dictionary file:

    Stata File Menu



    Picture 9.jpg
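    The menu route can also be typed as a single command. This is only a sketch: both file names below are hypothetical stand-ins for your own dictionary and raw data files.

```stata
* Read the raw data (hypothetical fop1975.txt) through the dictionary
* (hypothetical fop1975.dct); clear drops any data already in memory
infix using "fop1975.dct", using("fop1975.txt") clear
```

    The using() option is only needed when the dictionary itself does not name the raw data file.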



    Second way (preferred for me):

  • Paste the raw data into the file below the dictionary
  • Save the file as a .do file
  • Run it in Stata



    At this point, everything should be fine. Of course, you'll make mistakes the first few times, and need to re-run things.
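    A quick way to spot those mistakes is to inspect the result right after each import. The sketch below assumes the import has just been run and uses the variable names from the dictionary above.

```stata
* Confirm that every variable arrived
describe

* Eyeball the first few records against the raw file
list ID card serbook in 1/5

* Codes outside the range given in the paper codebook usually mean a column is off by one
tabulate serbook, missing
```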

    Next time I'll address Step 5: Logging your sessions.

Step 3: Choosing an External Text Editor

    Stata provides several ways to interact with the program. You can use the drop-down menus, of course. You can write your commands in the Command window. Or you can use Stata's built-in .do file editor, like so:




    Picture 2.jpg




    Picture 1.jpg The notepad button opens a new do file


    Picture 3.jpg





    However, I'll suggest that you use an external text editor instead of the built-in text editor.

    I'm not a snob about this. I still use the menus for many commands, especially new ones I'm not familiar with. However, eventually you'll have to write long files to:

  • Label (label var command)
  • Import (infix or infile command)
  • Define value labels (label define and label values commands)
  • Summarize (summ/desc/codebook commands)
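    To give a flavor of what such a file looks like, here is a short sketch. The three data values and the label text are invented for illustration; the real wording would come from the paper codebook.

```stata
* Tiny made-up dataset so the example runs on its own
clear
input serbook
4
3
2
end

* Label the variable itself
label variable serbook "Seriousness of illegal bookmaking"

* Define a set of value labels (hypothetical wording), then attach it
label define severity_lbl 1 "Least serious" 4 "Most serious"
label values serbook severity_lbl

* Summarize and document the result
summarize serbook
describe serbook
codebook serbook
```

    With dozens of variables, a file like this grows long quickly, which is where a good editor starts to pay off.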

    Using a text editor is easier for the following reasons:

    1. Text editors make search-and-replace easier if you make a systematic mistake
    2. Text editors allow you to have Stata closed while writing
    3. Text editors (at least the ones outlined here) highlight syntax, which is invaluable for checking your code
    4. Text editors just have more options than the built-in editor (keyboard shortcuts, multiple views, ready-made templates)
    5. Text editors, as the name might suggest, make editing existing .do and .dct files much easier


    There are many editors available for the Mac. Look around VersionTracker and you'll find all sorts of free or cheap options, like SubEthaEdit, TextWrangler, Smultron, Vim, and lots of others.

    If you're willing to pay a bit more, there are very full-featured programs like BBEdit ($49 US educational, $125 otherwise) and TextMate (39 Euros). Although these are impressive, they are geared much more towards professional web developers and are, in my opinion, a bit of overkill for Stata.

    I can't say I've tried all of these, of course. But I've experimented with a number of editors, including the built-in TextEdit, Taco Edit, Aquamacs, Smultron and TextWrangler.



    Unlike some built-in OS X software, TextEdit isn't too impressive. There are far better choices for free.


    Taco Edit is really designed for HTML coding, not other languages.


    Aquamacs is pretty good, and some people really swear by it, including many of the skilled and serious programmers I know. My main objection was that it was much more difficult to integrate with Stata. You need to install a series of .ado files into Stata to make Aquamacs work as an external editor. Read this if you're interested.


    I haven't tried Vim.


    So we're down to two contenders:

    Smultron (open-source, freeware)

    Smultron icon



    Smultron is nice. It has a pleasant Cocoa interface. It handles multiple open files easily. It doesn't take much memory. Here's a screenshot:

    Picture 2.jpg


    TextWrangler (non-open-source, freeware)

    TextWrangler icon



    However, I'm going to recommend TextWrangler, the free version of BBEdit developed by Bare Bones Software. Here's what it looks like:

    TextWrangler

    Most importantly, TextWrangler integrates extremely easily with Stata:


    1. Save a file with a .do extension, and TextWrangler will immediately recognize it as a file that should be formatted according to Stata syntax and run in Stata.app
    2. The defaults can be set so new files are always formatted as .do files
    3. Once you've saved a .do file, you can quickly run it in Stata like this:


    Picture 5.jpg



    Click the button on the right and the file will pop up in the Finder window:


    Picture 7.jpg


    Double-click the file in Finder and it will immediately run in Stata; if Stata is closed, Stata will open first and then run the file.

    And here's a really nice feature: at the bottom of TextWrangler, you can select your language and it will immediately recognize the command words. So if you need to do some website editing in CSS or HTML (as I sometimes do), TextWrangler is very helpful for that as well.

    Picture 4.jpg

    While you can't go wrong with either Smultron or TextWrangler, I'd recommend the latter as a free, full-featured, do-it-all program for Stata coding.

    Next installment will deal with Step 4: Importing Unformatted Data




Step 2: Setting up a profile.do file

    One thing you'll find in your "Getting Started in Stata" manual is a brief explanation of setting up a profile.do file. Briefly, this is a file that's run every time Stata is started. So if you always want to perform certain actions - start a log, set the memory, change the working directory - this is the place to do it.

    There are three steps to setting a profile.do file:

    1. Write a do-file
    2. Save it in the appropriate location (more on this in a moment)
    3. Restart Stata and check the Results window


    To write a do-file in Stata's built-in editor, go to the File menu:

    Stata file menu

    (I'll discuss using an external text editor in the next installment)

    You'll see a blank file like this:

    Blank do file

    Next we'll write a few commands. Remember these are things you want Stata to do every time you open it. Unless you exclusively work with one dataset, I wouldn't suggest opening it through the do-file. This is more "background": how much memory do you want allocated, where do you want the working directory, things that won't change often.

    Here's our simple do-file:

    Simple do file text

    It does three things:

    1. Sets the memory to 20m (the default is 1m). This can be done in the regular preferences pane also.
    2. Changes the working directory from "/Documents" to "/Documents/stata work/"
    3. Checks for updates to the executable and the .ado files.
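    Since the do-file above appears only as a screenshot, here is a reconstruction from the three steps just listed. It is a sketch, not a copy of the original: the exact path is an assumption, and the set memory line reflects Stata versions of this era (newer releases manage memory automatically, so it may be unnecessary there).

```stata
* profile.do: Stata runs this file automatically at startup

* 1. Raise the memory allocation to 20 megabytes
set memory 20m

* 2. Move to the working directory (assumed path; adjust to your own folder)
cd "~/Documents/Stata work"

* 3. Check for updates to the executable and the .ado files
update query
```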


    Now, most importantly, you need to save the .do file under the name profile.do in the right location. It needs to reside in the Library/Application Support/Stata directory:

    Profile in directory

    Re-start Stata and you should see something like this:

    Picture 6.jpg

    Success, the profile.do file is running as it should.

    Next we'll deal with Step 3: Choosing an External Text Editor

    Monday, November 19, 2007

    Step 1: Installation and a Working Directory

    Let's set some preferences
    So you've secured your copy of Stata for Mac OS X. Installation is pretty simple and there isn't a lot to it. You insert the CD and follow the standard procedure with Mac OS X installation packages.

    One slight difference from other applications is that rather than installing directly in /Applications, Stata installs in /Applications/Stata/. Not a lot to say about this. I have occasionally had a problem where I update the executable (more on this in the next installment) and it disappears from the Dock temporarily. But otherwise everything works fine.

    What you should do next is create a reasonable working directory. I believe Stata starts you out in either /Documents or /Applications/Stata, neither of which is ideal. Here's mine at startup:

    Stata at startup


    In this installment, we'll:

    1. Create a specific "Stata work" folder, with subfolders for projects

    2. Change the working directory

    3. Change the defaults so the working directory is always "Stata work"


    Here's how:

  • If you're a Mac user, you probably know how to create a directory. Go to "Documents" in the Finder and select "New folder." Like this:

    Finder screenshot


  • There are three ways to change the working directory:

    1. Stata draws heavily on the pre-Windows command-line (DOS and UNIX) tradition; if you're comfortable with that, simply use the "cd" command (meaning "Change Directory") and type in the path. For example:


      cd "/Users/adamjacobs/Documents/Stata work/"







    2. However, most people will be more comfortable, and less error-prone, with the menu. Go to File ➡ Change Working Directory.

      Stata File menu

      * Note: the menubar is dark because I'm using a utility called MenuShade that dims the menubar while you're working. I recommend this for users who want to save their monitors and focus better while writing.

    3. If you prefer keyboard shortcuts, and I often do, the command is


      ⌘⇧J


      I'm not sure why it's J; something like C would make more sense.
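    Whichever of the three methods you use, it's worth confirming that the change took effect. A minimal sketch, assuming a "Stata work" folder already exists inside your own Documents folder:

```stata
* Show the current working directory
pwd

* Change to the new folder (assumed path; ~ stands for your home directory)
cd "~/Documents/Stata work"

* Confirm the change and list what's in the folder
pwd
dir
```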



    As for changing the defaults to always recognize this working directory, I'll cover that in the next step: Setting up a profile.do file.
    Friday, November 16, 2007

    Step 0: Why Stata?

    I'm going to cover the basic steps of setting up Stata on a Mac, including specifying the working directory, editing and placing the profile.do file, selecting an external text editor and logging your work. However, I thought I'd start with the larger question: why use Stata in the first place?

    Stata is hardly the only option in social science statistics software. At the very least, you could consider SAS, The R Project, Minitab, SPSS, S-Plus, MPlus, not to mention more specialized forms of modeling software like HLM (hierarchical linear modeling) and LISREL (structural equation modeling).

    However, for Mac users there are considerably fewer options:




    Here's the simple part: SAS, Minitab, HLM, LISREL, MPlus, and S-Plus aren't available for OS X. Yes, I know there are Intel Macs that can run Windows, but that's not what this website is about. I am among those who still have an old PowerPC Mac; moreover, if I upgrade, I'd rather keep doing my data analysis in OS X.

    Many people have told me R is the future. It is free, open-source and offers incredibly robust graphics features. I believe it. However, R is largely a programming environment. It is not primarily a data analysis package. I've tried it once or twice but I just never felt comfortable. I'd suggest R only for tech-oriented people who are very comfortable with programming. If you still need drop-down menus and extensive help files to do anything in statistics (and I do), I wouldn't recommend R.

    This leaves us with the big players in the statistical software market: SPSS and Stata.




    Stata offers four different versions: Small, Intercooled, SE, and MP. This page from the Stata website outlines the differences.

  • Small Stata is for teaching environments only - you're limited to 99 variables and 1,000 observations.
  • Stata MP is for multi-processor environments, either newer dual-core computers or servers. With an older PowerPC laptop this was not an option. It's also rather unnecessary because it's largely designed for parallel processing.
  • Stata SE is the larger version of Intercooled. The main difference is that it allows ~32,000 (2^15) variables instead of ~2,000 (2^11) for Intercooled. SE also allows you to include up to 11,000 variables in a regression (!).
  • Intercooled Stata is what most people will be purchasing. It allows an unlimited number of observations, 2,047 variables and 800 variables in a regression. In practice, I've never had any limitations with Intercooled Stata. Even with large social science datasets, it's rare to deal with more than 2,000 variables; even the largest ICPSR datasets like the General Social Survey don't usually contain that many.
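    The powers of two quoted above are easy to check in Stata itself. This is just a quick sketch; c(maxvar) is the system value that reports the variable limit currently in effect for your own copy.

```stata
* One less than 2^11: the 2,047-variable limit quoted for Intercooled
display 2^11 - 1

* One less than 2^15: roughly the 32,000 quoted for SE
display 2^15 - 1

* The limit your own installation is currently using
display c(maxvar)
```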

    All versions of Stata include a full range of data analysis options. There are no add-ons or extra modules to purchase. This includes basic functions like descriptive statistics, graphics, and many forms of regression: linear, logistic, multinomial, Poisson, GLM, ANOVA, ordered logistic and others. It also includes tools for time-series, event history analysis, multidimensional scaling, survey data, panel data, and robust standard error techniques like bootstrap and jackknife.

    Stata also archives many user-submitted programs; these can be accessed through a simple search in the main window. It's a quasi-open-source structure: the software is proprietary but people submit add-ons that are freely distributed.
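    As a rough sketch of what that search looks like from the Command window (an internet connection is assumed, and the keyword and the estout package are simply examples I picked):

```stata
* Search Stata's help, FAQs and user-written programs for a keyword
findit robust regression

* Read the description of a user-written package in the SSC archive
ssc describe estout

* Install it if it looks useful
ssc install estout
```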

    Intercooled Stata is available for $155 US with an educational discount. If you're working on a limited-time project and know you won't need the software long-term, you can buy a one-year license for $95.

    Stata comes with a small manual, "Getting Started With Stata for Macintosh." It covers the basics like file management and help as well as how to interact with your data. It's short but quite useful. There is a much larger set of Stata manuals that you can purchase separately, but I haven't bothered.

    Stata offers a full-featured package with excellent documentation for a very reasonable price. For comparison's sake, it's roughly the same cost as an upgrade to OS X 10.5. But what about the competition?




    SPSS is available for Mac, and it appears they now develop the versions concurrently for Mac/Windows/Linux instead of lagging the Mac versions behind. However, the real drawback of SPSS is simple: price.

    Let's take a look at the SPSS web store: thankfully I'll receive the higher education discount. That means the SPSS base system will only cost $619 US. Pretty high for a starting point. But wait, suppose I want to continue with an interrupted time-series analysis project I've been working on regarding crime rates in Atlantic City, NJ? I suppose I'd need SPSS Trends for another $519. Even worse, if I want basic maximum likelihood estimation models like logistic regression, I'll need to buy SPSS regression models for another $519. Remember this is with the educational discount and we're already talking about ~$1700 in software costs.

    As you can imagine, the pricing continues like this. Would you like generalized linear model options? You guessed it, another $519 for that module. How about correspondence analysis and multi-dimensional scaling? For that option, the bargain basement price is only $419.

    In spite of all this, I'm not completely opposed to SPSS. When I started out in graduate school and wrote my master's thesis, SPSS was a nice first option for dealing with logistic regression models and descriptive statistics. I've also used it in summer statistics workshops at ICPSR and been satisfied with its functionality. However, in both of those situations, the university had already paid to have all of the modules included; it wasn't until recently that I learned about the piece-by-piece pricing structure.

    If you are purchasing a statistics package for yourself, there's no way I can recommend SPSS. The price is just unreasonable, and I really dislike the idea of separate modules that you need to purchase in addition. The non-university prices are even more prohibitive: the base system starts at $1600!

    If you're still dubious, look at this MacStats review of Stata versus SPSS:

    There are some reasons why SPSS for the Mac is not a viable long-term option for many people, even though it’s easier to explore than most stats programs. Mac versions lag behind Windows versions, and the user interface has quirks, bugs, odd crashes and pauses, and problems working with other programs. The price is absurd, and on top of the excessive cost for the base package, most users will need extra modules, each of which costs about as much as Stata – and they charge for module upgrades, too. Finally, there might not even be another version of SPSS for the Mac, and if there is, it might not work with new or old computer. Even now, there is no version of SPSS for Intel Macs.





    Stata is the most full-featured, affordable and Mac-friendly statistical package available, and it's really not even close. Next post I'll discuss the basics of installation and setting up a working directory, with screenshots included.
    Thursday, November 15, 2007

    The Raison D'Etre

    This blog covers using Stata on the Mac. That's all there is to it.

    I'm in the process of writing my dissertation on the growth of gambling over the past 30 years. One of the data sources I'm using is the 1975 Commission on the Review of the National Policy on Gambling. Unlike the more recent National Gambling Impact Study, the 1975 study is just in pure ASCII format. The data comes directly from the US National Archives and has not been processed like data at big archives like ICPSR.

    As a result, I've delved much deeper into the coding details of Stata, including dictionary files, the labeling language and memory management. I'll relate my tips and experiences here.

    Stata is a wonderful program (see my review at Versiontracker) and the most Mac-friendly of all data analysis programs. The aim of this blog is to build a community of Mac Stata users and offer help to those just starting out.