SAS is a powerful statistical analysis package used by businesses and organizations all over the world. It is a versatile tool for tasks such as data mining, predictive modeling, and risk assessment. If you are preparing for a SAS interview, this guide provides questions with detailed answers, ranging from basic to advanced. The questions cover the basics of SAS, data manipulation, statistical procedures, and macro debugging.
SAS Interview Questions and Answers: Basic to Advanced
1. What is the difference between INPUT and INFILE in SAS?
2. Difference between Informat and Format in SAS?
3. What is the purpose of double trailing @@ in the INPUT statement?
4. How can you include or exclude specific variables in a data set?
5. How do you print observations 5 through 10 from a data set?
6. Difference between Missover and Truncover?
7. How does the Program Data Vector (PDV) work in SAS?
8. What is DATA _NULL_ and its purpose?
9. What is the difference between Missover and Truncover in SAS?
10. Explain the default statistics produced by PROC MEANS.
11. Describe functions used for data cleaning in SAS.
12. What are the default statistics that PROC MEANS produce?
13. Explain functions you have used for data cleaning.
14. What is the difference between FUNCTION and PROC?
15. Differences between WHERE and IF statements?
16. What is Program Data Vector (PDV)?
17. What is DATA _NULL_?
18. What is the difference between the + operator and the SUM function?
19. How to identify and remove unique and duplicate values?
20. Difference between NODUP and NODUPKEY Options?
21. What are _NUMERIC_ and _CHARACTER_, and what do they do?
22. How do you sort in descending order?
23. How to convert a numeric variable to a character variable?
24. How to convert a character variable to a numeric variable?
25. Difference between VAR A1 - A3 and VAR A1 -- A3?
26. Difference between PROC MEANS and PROC SUMMARY?
27. How does the SUBSTR function work?
28. Difference between CEIL and FLOOR functions?
29. How to perform a Matched Merge with output only from both files?
30. How to label values in PROC FREQ?
31. How to use arrays to recode all numeric variables?
32. How to generate cross-tabulation?
33. How to calculate the mean for a variable by group?
34. What is the RETAIN statement used for?
35. What are SYMGET and SYMPUT?
36. How does PROC SQL handle merging two datasets?
37. How to debug SAS Macros?
1. What is the difference between INPUT and INFILE in SAS?
Answer: In SAS, INPUT and INFILE are both integral to reading raw data, but they have different roles.
- INFILE Statement: INFILE specifies the location of the external data file. It acts as the link between SAS and the external file, telling SAS where to find the data it needs to process. It handles the physical aspects of the read, such as file path, record length, delimiters, and reading control options (e.g., DLM=, MISSOVER).
- INPUT Statement: INPUT describes the layout of the data within the file and defines how values are read into SAS variables. It specifies which columns or fields of the raw data are assigned to which variables, allowing SAS to interpret each data item.
Example:
data mydata;
infile 'path_to_file.csv' dlm=',' missover;
input name $ age height weight;
run;
Here, INFILE locates and connects to the external CSV file, while INPUT reads values for name, age, height, and weight from each comma-delimited record.
2. Difference between Informat and Format in SAS?
Answer: Informat and Format both handle data representation, but they serve different purposes.
- Informat: An Informat instructs SAS on how to read or interpret the data values in the raw data file. It is applied during the data reading phase, helping SAS convert raw data into internal data values. For instance, the date9. informat reads dates written as DDMMMYYYY.
- Format: A Format controls how data is displayed in output, without altering the actual values stored in the data set. Formats can be used in procedures and with the PUT statement.
Example:
data demo;
input date :date9.;
format date date9.;
datalines;
25DEC2022
;
run;
Here, the date9. informat reads the text 25DEC2022 into a SAS date value, while the date9. format specifies that this date will appear as 25DEC2022 in output.
3. What is the purpose of double trailing @@ in the INPUT statement?
Answer:
The double trailing @@ in the INPUT statement holds the current input record across iterations of the DATA step, so SAS can read multiple observations from the same line. This is helpful when dealing with files where several records are stored on one line.
Example:
data test;
input name $ score @@;
datalines;
John 85 Alice 92 Bob 78 Sarah 88
;
run;
In this example, @@ keeps SAS from moving to the next line after each observation, allowing it to continue reading the data from the current line until it runs out of input data.
4. How can you include or exclude specific variables in a data set?
Answer:
Use the DROP= or KEEP= data set options on the DATA or SET statement, or the DROP and KEEP statements inside the step. Example: set dataset(drop=var1 var2); excludes var1 and var2.
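The placement of the option matters, which a short sketch can make concrete. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set names and the ratio variable are hypothetical.

```sas
/* KEEP= on the SET statement: only these variables are read in */
data subset_keep;
    set sashelp.class(keep=name age height);
run;

/* DROP= on the DATA statement: variables are dropped only when the
   observation is written, so they can still be used in calculations */
data subset_drop(drop=weight height);
    set sashelp.class;
    ratio = weight / height;   /* hypothetical derived variable */
run;
```

Filtering on the SET statement reduces what is read into the step, while filtering on the DATA statement only affects what is written out.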
5. How do you print observations 5 through 10 from a data set?
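Answer:
Use the FIRSTOBS=5 and OBS=10 data set options in PROC PRINT. Example: proc print data=dataset(firstobs=5 obs=10); run;
Note that OBS= gives the number of the last observation to process, not a count of observations.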
6. Difference between Missover and Truncover?
Answer:
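MISSOVER prevents SAS from skipping to the next line when it doesn't find data for a variable and sets the remaining variables to missing. TRUNCOVER also stays on the current line, but when a record ends partway through a field it keeps the characters that are available instead of setting the variable to missing.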
7. How does the Program Data Vector (PDV) work in SAS?
Answer:
The PDV (Program Data Vector) is a logical area in memory where SAS builds a data set, one observation at a time. Understanding it is essential to understanding how the DATA step processes data.
- Each variable in the data set is assigned a slot in the PDV. At the start of each iteration, SAS resets variables created by INPUT or assignment statements to missing (. for numeric, blank for character). Variables read with SET or MERGE, and variables named in a RETAIN statement, keep their values until they are overwritten.
- As SAS executes each statement, it updates the PDV according to the DATA step instructions.
- After executing all statements for the current observation, SAS writes the observation from the PDV to the output data set, then returns to the top of the step for the next iteration.
Example:
data example;
set old_data;
new_var = var1 * 2;
run;
In this example, each observation of old_data is read into the PDV. SAS calculates new_var, and then the complete observation is written to example. Before the next record is read, new_var is reset to missing.
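One way to watch the PDV change is to dump it to the log with PUT _ALL_. The sketch below assumes the SASHELP.CLASS sample data set that ships with SAS and limits itself to three observations; the literal labels in the PUT statements are only for readability.

```sas
data _null_;
    set sashelp.class(obs=3);
    /* new_var is missing here on every iteration, because it is
       created by an assignment and reset at the top of the step */
    put 'Before assignment: ' _all_;
    new_var = height * 2;
    put 'After assignment:  ' _all_;
run;
```

The log lines also include the automatic variables _N_ and _ERROR_, which live in the PDV but are never written to an output data set.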
8. What is DATA _NULL_ and its purpose?
Answer: DATA _NULL_ is a special form of the DATA step that executes code without creating an output data set. It is used mainly to perform operations or calculations, generate reports, or write information to external files without storing data.
- Because no output data set is written, it avoids unnecessary disk I/O, which makes it efficient for tasks such as logging, debugging, or computations whose results are not needed as a data set.
Example:
data _null_;
set sales;
file 'output.txt';
put name $ age;
run;
In this example, DATA _NULL_ reads the sales data but does not create a new data set. Instead, it writes the name and age values to an external file, output.txt.
9. What is the difference between Missover and Truncover in SAS?
Answer: Both MISSOVER and TRUNCOVER are options in the INFILE statement that control how SAS handles records that are shorter than expected.
- MISSOVER: With MISSOVER, SAS assigns missing values to variables when the input line does not contain enough data to read all of them. If a record ends partway through a fixed-width field, that variable is also set to missing.
- TRUNCOVER: TRUNCOVER also keeps SAS from moving to the next line, but if a record ends partway through a field it assigns the characters that are available to the variable rather than a missing value. This is especially useful for fixed-width files with variable-length records.
Example:
data test;
infile 'data.txt' missover;
input name $ age salary;
run;
In this example, if data.txt has records with no values for age or salary, MISSOVER assigns missing values rather than reading them from the next line, which is what the default FLOWOVER behavior would do.
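The difference between the two options only shows up when a record ends partway through a field, so a small side-by-side test can help. This is a sketch under stated assumptions: the fileref, data set names, and test records are hypothetical, and it assumes the FILE statement writes variable-length records (the usual default on Windows and UNIX). In-stream DATALINES are padded to a fixed width, which is why the test data is written to a temporary file instead.

```sas
/* Write a small test file; the second record is too short for the field */
filename shortrec temp;
data _null_;
    file shortrec;
    put 'A 12345';
    put 'B 12';
run;

/* MISSOVER: the incomplete 5-column field is set to missing */
data with_missover;
    infile shortrec missover;
    input id $1. +1 value 5.;
run;

/* TRUNCOVER: the characters that are present are used */
data with_truncover;
    infile shortrec truncover;
    input id $1. +1 value 5.;
run;

proc print data=with_missover;  run;
proc print data=with_truncover; run;
```

Under those assumptions, the second observation should have a missing value in with_missover and the value 12 in with_truncover.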
10. Explain the default statistics produced by PROC MEANS.
Answer: PROC MEANS is a commonly used SAS procedure that provides summary statistics for numeric data. By default, it produces:
- N (Number of non-missing observations)
- Mean (Average of values)
- Std Dev (Standard Deviation)
- Min (Minimum value)
- Max (Maximum value)
Example:
proc means data=mydata;
var age height;
run;
This example produces the default statistics for age and height in the mydata data set. PROC MEANS can also produce additional statistics such as MEDIAN, SUM, and RANGE when you list the statistic keywords in the PROC MEANS statement.
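As an illustration of requesting non-default statistics, the following sketch lists the statistic keywords on the PROC MEANS statement. It assumes the SASHELP.CLASS sample data set that ships with SAS.

```sas
/* Only the statistics named here are printed; the defaults are replaced */
proc means data=sashelp.class n mean median sum range maxdec=2;
    var age height;
run;
```

MAXDEC=2 limits the printed values to two decimal places and does not change the underlying calculations.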
11. Describe functions used for data cleaning in SAS.
Answer: SAS offers various functions for data cleaning, including:
- COMPRESS: Removes specific characters from strings.
- TRANSLATE: Replaces specific characters in a string.
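- TRIM and STRIP: TRIM removes trailing blanks; STRIP removes both leading and trailing blanks.
- SUBSTR: Extracts part of a string or, on the left side of an assignment, replaces part of it.
- INTNX and INTCK: Used for date manipulation (shifting a date by an interval and counting intervals between two dates).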
- UPCASE and LOWCASE: Standardize text case.
Example:
data clean;
set raw_data;
name = strip(upcase(name));
phone = compress(phone, '()- ');
run;
This example removes leading and trailing spaces from name, converts it to uppercase, and removes parentheses, hyphens, and spaces from phone.
Intermediate Questions
12. What are the default statistics that PROC MEANS produce?
Answer:
By default, PROC MEANS outputs N, Mean, Standard Deviation, Minimum, and Maximum.
13. Explain functions you have used for data cleaning.
Answer:
Some common functions for data cleaning include:
- COMPRESS: Removes specified characters.
- SCAN: Extracts words from strings.
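- TRIM and LEFT: TRIM removes trailing blanks; LEFT left-aligns a value by moving leading blanks to the end.
- IFN: Returns one of two numeric values depending on whether a condition is true or false.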
14. What is the difference between FUNCTION and PROC?
Answer:
A FUNCTION performs an operation on one or more values and returns a result, and is typically used within a DATA step. A PROC (procedure) is a pre-built routine that runs as its own step to analyze or process an entire data set.
15. Differences between WHERE and IF statements?
Answer:
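WHERE is applied as data is read, before an observation is loaded into the PDV, so it is more efficient for subsetting large data sets, but it can reference only variables that exist in the input data set. A subsetting IF is applied within the DATA step after the observation has been read, so it can also test variables created in the step.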
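A minimal sketch of the two approaches, assuming the SASHELP.CLASS sample data set that ships with SAS; the output data set names, the height_cm variable, and the cutoffs are hypothetical.

```sas
/* WHERE: observations are filtered as they are read */
data teens_where;
    set sashelp.class;
    where age >= 13;
run;

/* Subsetting IF: the observation is read first, so a variable
   computed in the step can be used in the condition */
data tall_if;
    set sashelp.class;
    height_cm = height * 2.54;
    if height_cm > 160;
run;
```

Using height_cm in a WHERE statement in the second step would fail, because that variable does not exist in the input data set.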
16. What is Program Data Vector (PDV)?
Answer:
PDV is a memory area where SAS builds a dataset, holding one row of data at a time during processing.
17. What is DATA _NULL_?
Answer:
DATA _NULL_ is used when you don’t need an output data set but want to execute DATA step code, such as writing to the log or creating macro variables.
18. What is the difference between the + operator and the SUM function?
Answer:
The + operator returns a missing value if any operand is missing, while SUM ignores missing values and adds only the nonmissing arguments.
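The behavior is easy to check in the log. This is a small sketch with made-up values; the variable names are hypothetical.

```sas
data _null_;
    a = 5;
    b = .;
    with_plus = a + b;       /* missing, because b is missing */
    with_sum  = sum(a, b);   /* 5, because SUM skips the missing argument */
    put with_plus= with_sum=;
run;
```

SAS also writes a note to the log saying that missing values were generated by the + operation, which is a useful hint when totals unexpectedly come out missing.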
19. How to identify and remove unique and duplicate values?
Answer:
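To remove duplicates, use PROC SORT with the NODUPKEY or NODUP option; adding DUPOUT= writes the removed observations to a separate data set so you can inspect them. Use PROC FREQ to count how often each value occurs and to see which values are unique.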
20. Difference between NODUP and NODUPKEY Options?
Answer:
NODUP (an alias for NODUPRECS) removes observations that exactly match the previous observation on every variable, while NODUPKEY removes observations that repeat the BY-variable values of an earlier observation.
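The two options can give different results on the same data, as the following sketch suggests. The orders data and all data set names are hypothetical.

```sas
/* id 1 appears twice with different amounts; id 2 is an exact repeat */
data orders;
    input id amount;
    datalines;
1 100
1 250
2 300
2 300
;
run;

/* NODUPKEY keeps the first observation for each BY value;
   DUPOUT= captures the observations that were removed */
proc sort data=orders out=one_per_id dupout=removed_by_key nodupkey;
    by id;
run;

/* NODUPRECS (NODUP) removes only adjacent observations that
   match on every variable */
proc sort data=orders out=no_exact_dups noduprecs;
    by id amount;
run;
```

With this input, one_per_id should keep one row per id, while no_exact_dups should drop only the repeated id 2 row. Sorting by all variables in the NODUPRECS step helps ensure that exact duplicates end up next to each other.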
Advanced Questions
21. What are _NUMERIC_ and _CHARACTER_, and what do they do?
Answer:
_NUMERIC_ and _CHARACTER_ are special SAS name lists that reference all numeric variables and all character variables in a data set, respectively.
22. How do you sort in descending order?
Answer:
In PROC SORT, use DESCENDING before the variable name. Example: proc sort data=dataset; by descending var; run;
23. How to convert a numeric variable to a character variable?
Answer:
Use the PUT function. Example: char_var = put(num_var, 8.);
24. How to convert a character variable to a numeric variable?
Answer:
Use the INPUT function. Example: num_var = input(char_var, 8.);
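Both conversions can be tried together in one step. The sketch below uses made-up values and hypothetical variable names; STRIP is added because PUT with a numeric format right-aligns its result.

```sas
data _null_;
    /* numeric to character */
    num_var  = 1234;
    char_var = strip(put(num_var, 8.));

    /* character to numeric */
    text          = '56.7';
    num_from_text = input(text, 8.);

    put char_var= num_from_text=;
run;
```

Assigning a converted value back to the same variable name does not change its type, so a new variable is normally created and the old one dropped or renamed.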
25. Difference between VAR A1 - A3 and VAR A1 -- A3?
Answer:
VAR A1 - A3 is a numbered range list that refers to A1, A2, and A3, while VAR A1 -- A3 is a name range list that includes every variable positioned between A1 and A3 in the data set, whatever its name.
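The difference shows up when another variable sits between the numbered ones. In this sketch the demo data set and its variables are hypothetical.

```sas
/* Variable order in the data set: a1, score, a2, a3 */
data demo;
    a1 = 1; score = 99; a2 = 2; a3 = 3;
run;

proc print data=demo;
    var a1 - a3;     /* numbered range: a1, a2, a3 */
run;

proc print data=demo;
    var a1 -- a3;    /* name range: a1, score, a2, a3 */
run;
```

PROC CONTENTS with the VARNUM option is a convenient way to check the stored variable order before relying on a double-dash list.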
26. Difference between PROC MEANS and PROC SUMMARY?
Answer:
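PROC MEANS prints summary statistics by default; PROC SUMMARY computes the same statistics but prints nothing unless you add the PRINT option, so it is typically used with an OUTPUT statement to write the results to a data set.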
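A typical PROC SUMMARY pattern, sketched here with the SASHELP.CLASS sample data set that ships with SAS. The output data set and statistic variable names are hypothetical.

```sas
/* Nothing is printed; results go to the OUT= data set */
proc summary data=sashelp.class nway;
    class sex;
    var height;
    output out=height_by_sex mean=avg_height n=n_height;
run;

proc print data=height_by_sex;
run;
```

NWAY keeps only the rows for the full combination of CLASS variables, leaving out the overall total row that would otherwise be included.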
27. How does the SUBSTR function work?
Answer:
SUBSTR extracts part of a string, starting at a specified position. Example: substr(string, start, length);
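A short sketch of the common SUBSTR forms, using a made-up code value; the variable names are hypothetical.

```sas
data _null_;
    code = 'ABC-12345';
    prefix = substr(code, 1, 3);   /* first three characters */
    digits = substr(code, 5);      /* length omitted: rest of the string */
    substr(code, 1, 3) = 'XYZ';    /* left side: replaces characters in place */
    put prefix= digits= code=;
run;
```

When the length argument is omitted, SUBSTR returns everything from the start position to the end of the string.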
28. Difference between CEIL and FLOOR functions?
Answer:
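CEIL returns the smallest integer greater than or equal to its argument (it rounds up), while FLOOR returns the largest integer less than or equal to it (it rounds down).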
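Negative arguments are where the two functions are most often confused, so this sketch covers both signs. The variable names are hypothetical.

```sas
data _null_;
    up_pos   = ceil(2.1);     /* 3 */
    down_pos = floor(2.1);    /* 2 */
    up_neg   = ceil(-2.1);    /* -2 */
    down_neg = floor(-2.1);   /* -3 */
    put up_pos= down_pos= up_neg= down_neg=;
run;
```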
29. How to perform a Matched Merge with output only from both files?
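Answer:
Use the IN= data set option on each input data set, then keep only the observations where both flags are true with if infile1 and infile2; after the MERGE and BY statements. Both data sets must be sorted by the BY variable before merging.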
Example:
data both;
merge file1(in=infile1) file2(in=infile2);
by id;
if infile1 and infile2;
run;
30. How to label values in PROC FREQ?
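Answer:
The LABEL statement attaches a descriptive label to a variable, and PROC FREQ shows that label in its output. To label the individual values of a variable, define a format with PROC FORMAT and apply it with a FORMAT statement.
Example:
proc freq data=dataset;
tables var;
label var = 'Label';
run;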
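To label the individual values rather than the variable itself, one common approach is a user-defined format. The sketch below assumes the SASHELP.CLASS sample data set that ships with SAS; the format name and label text are hypothetical.

```sas
/* Map the stored codes to readable value labels */
proc format;
    value $sexfmt 'F' = 'Female'
                  'M' = 'Male';
run;

proc freq data=sashelp.class;
    tables sex;
    format sex $sexfmt.;            /* labels the values */
    label sex = 'Sex of student';   /* labels the variable */
run;
```

The format changes only how the values are displayed; the stored values in the data set stay as F and M.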
31. How to use arrays to recode all numeric variables?
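Answer:
Use the _NUMERIC_ keyword to define an array over all numeric variables, then loop over it. Note that missing values also satisfy nums[i] < 0 in SAS, so this recodes them to 0 as well.
Example:
data recoded;
set dataset;
array nums{*} _numeric_;
do i = 1 to dim(nums);
if nums[i] < 0 then nums[i] = 0;
end;
drop i;
run;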
32. How to generate cross-tabulation?
Answer:
Use PROC FREQ with a TABLES statement that crosses the two variables with an asterisk.
Example:
proc freq data=dataset;
tables var1*var2;
run;
33. How to calculate the mean for a variable by group?
Answer:
Use PROC MEANS with a BY or CLASS statement. BY requires the data to be sorted by the grouping variable; CLASS does not.
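A minimal sketch of the CLASS approach, assuming the SASHELP.CLASS sample data set that ships with SAS.

```sas
/* Mean height for each value of sex; no prior sort is needed */
proc means data=sashelp.class mean;
    class sex;
    var height;
run;
```

Replacing CLASS with BY would give one table per group instead, and would need a PROC SORT by sex first.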
34. What is the RETAIN statement used for?
Answer:
RETAIN holds the values of variables across data step iterations, useful for cumulative totals or sequential processing.
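A running total is the usual illustration. This sketch assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set and the total_weight variable are hypothetical.

```sas
data running_total;
    set sashelp.class;
    retain total_weight 0;                       /* keeps its value across iterations */
    total_weight = sum(total_weight, weight);    /* SUM guards against missing weights */
run;
```

Without the RETAIN statement, total_weight would be reset to missing at the top of every iteration and the total would never accumulate.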
35. What are SYMGET and SYMPUT?
Answer:
SYMGET is a function that retrieves the value of a macro variable during DATA step execution, while CALL SYMPUT is a routine that assigns a DATA step value to a macro variable.
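The two can be seen working together in a short round trip. This sketch assumes the SASHELP.CLASS sample data set that ships with SAS; the macro variable name n_students is hypothetical.

```sas
/* CALL SYMPUT: store a DATA step value in a macro variable */
data _null_;
    set sashelp.class end=last;
    if last then call symput('n_students', strip(put(_n_, 8.)));
run;

%put Number of students: &n_students;

/* SYMGET: read the macro variable back during DATA step execution */
data _null_;
    n = input(symget('n_students'), 8.);
    put n=;
run;
```

A macro variable created by CALL SYMPUT is not available until the DATA step that creates it has finished, which is why the %PUT statement comes after the RUN statement.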
36. How does PROC SQL handle merging two datasets?
Answer:
Use a JOIN clause with an ON condition in PROC SQL. Unlike a DATA step MERGE, a PROC SQL join does not require the input data sets to be sorted.
Example:
proc sql;
select * from dataset1 as d1 inner join dataset2 as d2 on d1.id = d2.id;
quit;
37. How to debug SAS Macros?
Answer:
Use the MPRINT, MLOGIC, and SYMBOLGEN options in the OPTIONS statement to trace the macro execution path.
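A sketch of how these options might be switched on around a macro call. The macro print_first and its parameters are hypothetical, and the call assumes the SASHELP.CLASS sample data set that ships with SAS.

```sas
options mprint mlogic symbolgen;

/* Hypothetical macro used only to show what the options write to the log */
%macro print_first(ds=, n=5);
    proc print data=&ds(obs=&n);
    run;
%mend print_first;

%print_first(ds=sashelp.class, n=3)

/* Turn the tracing off again once the problem is found */
options nomprint nomlogic nosymbolgen;
```

MPRINT shows the SAS code the macro generates, MLOGIC traces macro execution and parameter values, and SYMBOLGEN shows how each macro variable reference resolves.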
These questions cover key topics and provide a comprehensive understanding of SAS, from basic data manipulation to advanced procedures and debugging.