Friday, July 17, 2015

Converting data from wide to long (PROC TRANSPOSE)

Longitudinal data (also called panel data/time-series cross section data) are getting increasing popular. How can one handle them effectively? This post deals with one crucial aspect: converting data from "long" format to "wide" format. If you aren't familiar with long/wide, take a look at this page.

First, let's generate a dataset from scratch:

/* create a sample data set named test */
data test;
input china russia usa; /* specifies variable names */
datalines;
1 2 3
4 5 6
7 8 9
;
run;

You should have this table.




But we aren't done with data generation yet. Let's continue creating more variables.

proc means noprint; *generates means, does not print output;
var _numeric_; /*specifies all numeric variables in the current DATA step, which means, all variables*/
output out=new(drop=_type_ _freq_) mean= median= std= q1= q3= / autoname;
/* drop = _type_ _freq_ means you drop the two variables so named*/
/* additional variables generated are mean, median, std, q1, q3*/
run;

/* Transposes the dataset */
proc transpose data=new out=out;
run;

Now we have artificially created all data.



However, there are two issues:
(1) it is in "long" format. We want it in "wide" format.
(2) ideally, we don't want a single variable to take on values such as "china_Mean". We'd prefer to have two variables, the first taking on "China", "Russia", "USA", and the second taking "mean", "median", etc.

Let's deal with issue 2 first. We use the scan function and indicate (with '_') that _ is the separator. _name_ means we'll take any word, and 1 means the first instance, 2 means the second instance.

/* Separates the variables */

data separated;
set out;
varname=scan(_name_,1,'_');
stat=scan(_name_,2,'_');
drop _name_;
run;

proc sort data=separated;
by varname;
run;

The last three lines sort the data.

Next, we need to transpose the data.

proc transpose data=separated out=transposed(drop=_name_);
by varname;
id stat;
var col1;
run;

proc print data=transposed;
title 'Transposed Data!';
run;

And you can admire the finished product (for a little while, before going to learn more SAS).


Click here to download all the code of today's post.

No comments:

Post a Comment