Friday, January 4, 2008

SAS-based data mining - the first macro - knowing data about your dataset

An important early step in developing SAS-based data mining code is the following:

/* Separate the categorical variables (nominal or ordinal) from the continuous variables, count each group, and save the names and counts as macro variables. This information comes from proc contents, which reads the metadata SAS stores whenever a dataset is created; the output of proc contents on to_be_mined_dataset is outfile_contents.sas7bdat. In that output, type=2 marks character variables (treated here as categorical) and type=1 marks numeric variables (treated here as continuous).

cvars holds the list of discrete (categorical) variable names as a single value, separated by spaces
nvars holds the list of continuous variable names as a single value, separated by spaces
num_cvars and num_nvars hold the corresponding counts

This macro also points to

- the power of proc sql and its INTO : clause for creating macro variables out of SAS metadata
- the power of SAS metadata

*/
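
The comment above assumes that outfile_contents already exists. As a minimal sketch, using the dataset names mentioned in the comment, it could be produced like this before the proc sql step runs:

```sas
/* Write the metadata of to_be_mined_dataset to a dataset.          */
/* In the OUT= dataset, NAME is the variable name and TYPE is       */
/* 1 for numeric variables and 2 for character variables.           */
proc contents data=to_be_mined_dataset
              out=outfile_contents
              noprint;   /* suppress the printed report, keep only the dataset */
run;
```

If to_be_mined_dataset lives in a permanent library, the libref would have to be added to the data= option; none is assumed here.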


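proc sql noprint;
/* names of the character (categorical) variables, space separated */
select name into : cvars separated by ' '
from outfile_contents
where type=2;
/* number of character variables; separated by also trims the value */
select count(distinct name) into : num_cvars separated by ' '
from outfile_contents
where type=2;
/* names of the numeric (continuous) variables, space separated */
select name into : nvars separated by ' '
from outfile_contents
where type=1;
/* number of numeric variables */
select count(distinct name) into : num_nvars separated by ' '
from outfile_contents
where type=1;
quit;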
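
Once the macro variables exist, they can stand in for hand-typed variable lists. The following is only an illustrative sketch, not part of the original macro; it assumes the dataset has at least one character and one numeric variable, since a select that returns no rows does not create its macro variable:

```sas
/* Frequency tables for every character (categorical) variable */
proc freq data=to_be_mined_dataset;
    tables &cvars;
run;

/* Summary statistics for every numeric (continuous) variable */
proc means data=to_be_mined_dataset n nmiss mean std min max;
    var &nvars;
run;
```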

/* Exercise: display the contents of cvars or nvars on the screen */

What is the right put command?
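
One possible answer, offered as a sketch: the %put statement writes text to the SAS log and resolves macro variable references along the way.

```sas
/* Print each macro variable and its value to the SAS log */
%put cvars = &cvars;
%put num_cvars = &num_cvars;
%put nvars = &nvars;
%put num_nvars = &num_nvars;

/* List every user-defined macro variable currently in scope */
%put _user_;
```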
