Monday, July 13, 2015

Introduction to SAS for Stata users

You've learned lots of Stata, but now you need to learn SAS, perhaps because it handles large datasets efficiently, or perhaps because there are lots of SAS jobs (at the time of writing, a LinkedIn search returns 7339 SAS jobs). If this sounds like you, don't fret! This blog post is made just for you.

I assume that you have SAS installed and open. You should see the SAS window, which includes the Log window and, at the bottom right, the Editor window.


In this tutorial, we will cover five basic tasks:

  1. Opening a dataset
  2. Creating a "do-file"
  3. Generating summary statistics
  4. Generating correlation matrices
  5. Listing observations that meet certain criteria

Task #1: Opening a dataset

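Download the auto dataset as a CSV file, then import it through the menus.

Click File > Import Data.


In the Import Wizard, choose CSV as the data source, browse to the downloaded file, and give the new dataset a name. You can name it as you wish, but the code below assumes you call it auto.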

Assuming the import succeeded, the Log window will show new output, which should end with:

NOTE: WORK.AUTO data set was successfully created.
NOTE: The data set WORK.AUTO has 74 observations and 12 variables.
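If you would rather type than click, PROC IMPORT does the same job in code, much as import delimited does in Stata. A minimal sketch, assuming the file was saved as C:\data\auto.csv (a hypothetical path; change it to wherever you saved the download):

```sas
* import a CSV file into the temporary WORK library;
* the path below is a hypothetical example, change it to your own;
proc import datafile = "C:\data\auto.csv"
    out = work.auto    /* name of the new SAS data set */
    dbms = csv         /* type of the input file */
    replace;           /* overwrite WORK.AUTO if it already exists */
    getnames = yes;    /* read variable names from the first row */
run;
```

WORK is a temporary library, so WORK.AUTO is deleted when you close SAS, much like an unsaved dataset in Stata's memory. Re-running the import at the top of your program each session avoids that problem.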

Task #2: Creating a do-file

In Stata, you press Ctrl+9 to open a new do-file. In SAS, the equivalent window opens automatically. See the Editor window at the bottom right? That's SAS's version of the do-file, and you can start typing in it.
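A do-file usually starts by describing the data. A rough SAS counterpart of Stata's describe is PROC CONTENTS; its VARNUM option lists variables in dataset order, as describe does, instead of alphabetically. This sketch assumes WORK.AUTO from Task #1 exists:

```sas
* a first program: roughly the equivalent of describe in Stata;
proc contents data = auto varnum;
run;
```

Save the Editor's contents with File > Save to get a .sas program file, the counterpart of a .do file.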

Task #3: Generating summary statistics


Now it's time to learn some actual commands. Type the commands below into the Editor window. Some pointers:

  • Comments start with * and end with a semicolon (the Editor shows them in green)
  • Procedures (commands) are shown in blue
    • They start with proc
    • And end with run;
  • Don't forget semicolons: every statement ends with one, and the program will not run properly without them
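SAS also accepts block comments written between /* and */. They need no semicolon and can span several lines or follow code on the same line. Because a * comment runs until the next semicolon, forgetting that semicolon turns the following statement into part of the comment, and SAS silently skips it. A short sketch of both styles:

```sas
* a statement-style comment: it ends at the semicolon;

/* a block comment: it ends at the closing marker,
   can span several lines, and needs no semicolon */

proc freq data = auto;
    tables foreign; /* block comments can also follow code */
run;
```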


* Stata's equivalent of tabulate;
proc freq data = auto;
tables rep78 foreign;
run;

* two way tabulation;
proc freq data = auto;
tables rep78*foreign;
* if you want cell percentages only, uncomment the next line;
* tables rep78*foreign / norow nocol nofreq;
run;

* Stata's equivalent of summarize;
* (data = auto can be omitted, because SAS then uses the most recently created data set);
proc means data = auto;
var mpg rep78;
run;

* if you want even more detailed summary statistics (like summarize, detail);
* (data = auto can be omitted here too);
proc univariate data = auto;
var mpg rep78;
run;
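A few Stata habits carry over with small changes. In the sketch below, the missing option of the TABLES statement plays the role of tabulate, missing; the chisq option adds a Pearson chi-square test like tabulate ..., chi2; and a CLASS statement in PROC MEANS gives separate statistics per group, much like bysort foreign: summarize (no sorting needed). The choice of variables is illustrative only:

```sas
* tabulate rep78, missing: count missing values as a category;
proc freq data = auto;
    tables rep78 / missing;
run;

* tabulate rep78 foreign, chi2: add a chi-square test;
proc freq data = auto;
    tables rep78*foreign / chisq;
run;

* bysort foreign: summarize mpg price, with the statistics chosen explicitly;
proc means data = auto n mean std min max median;
    class foreign;
    var mpg price;
run;
```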


Task #4: Generating correlation matrices

* Stata's equivalent of corr;
proc corr;
run;
* just between certain variables;
proc corr;
var price weight;
run;
* suppress the p-values (significance levels);
proc corr noprob;
run;
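One difference is worth knowing. By default, PROC CORR drops missing values pair by pair, so each coefficient can rest on a different number of observations; that makes it closer to Stata's pwcorr than to corr, which uses only observations with no missing values in any listed variable. The NOMISS option gives that listwise behavior. A sketch, using rep78 because it has missing values in the auto data, plus the SPEARMAN option as a counterpart of Stata's spearman:

```sas
* closer to Stata corr: listwise deletion across all listed variables;
proc corr data = auto nomiss;
    var price mpg rep78 weight;
run;

* Spearman rank correlations, like Stata spearman;
proc corr data = auto spearman;
    var price mpg weight;
run;
```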

Task #5: Listing observations that meet certain criteria

* Stata's equivalent of list;
proc print data=auto;
run;

* to list only observations that meet certain criteria (like list if);
proc print data=auto;
where price > 10000 & rep78; * rep78 on its own is true only when it is neither missing nor 0, so this also drops missing rep78 (unlike Stata, where missing counts as true);
run;
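Stata's list varlist in 1/5 also has a direct counterpart: a VAR statement picks the columns and the OBS= data set option caps the number of rows. The second step below spells out the missing-value condition instead of relying on the rep78 shortcut; IS NOT MISSING is an operator that WHERE statements accept, and NOOBS hides the Obs column:

```sas
* list make price rep78 in 1/5;
proc print data = auto(obs = 5);
    var make price rep78;
run;

* the same filter as the rep78 shortcut, written out explicitly;
proc print data = auto noobs;
    where price > 10000 and rep78 is not missing and rep78 ne 0;
    var make price rep78;
run;
```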
