Top 35+ SAS Interview Questions and Answers: Basic to Advanced

SAS is a powerful statistical analysis software suite used by businesses and organizations all over the world. It is a versatile tool for tasks such as data management, data mining, predictive modeling, and risk assessment. If you are preparing for a SAS interview, this guide provides questions with detailed answers, ranging from basic to advanced. The questions cover SAS basics, reading and cleaning data, data manipulation, summary statistics, PROC SQL, and macros.


1. What is the difference between INPUT and INFILE in SAS?
2. Difference between Informat and Format in SAS?
3. What is the purpose of double trailing @@ in the INPUT statement?
4. How can you include or exclude specific variables in a data set?
5. How do you print observations 5 through 10 from a data set?
6. Difference between Missover and Truncover?
7. How does the Program Data Vector (PDV) work in SAS?
8. What is DATA _NULL_ and its purpose?
9. What is the difference between Missover and Truncover in SAS?
10. Explain the default statistics produced by PROC MEANS.
11. Describe functions used for data cleaning in SAS.
12. What are the default statistics that PROC MEANS produce?
13. Explain functions you have used for data cleaning.
14. What is the difference between FUNCTION and PROC?
15. Differences between WHERE and IF statements?
16. What is Program Data Vector (PDV)?
17. What is DATA _NULL_?
18. What is the difference between the + operator and the SUM function?
19. How to identify and remove unique and duplicate values?
20. Difference between NODUP and NODUPKEY Options?
21. What are _NUMERIC_ and _CHARACTER_, and what do they do?
22. How do you sort in descending order?
23. How to convert a numeric variable to a character variable?
24. How to convert a character variable to a numeric variable?
25. Difference between VAR A1-A3 and VAR A1--A3?
26. Difference between PROC MEANS and PROC SUMMARY?
27. How does the SUBSTR function work?
28. Difference between CEIL and FLOOR functions?
29. How to perform a Matched Merge with output only from both files?
30. How to label values in PROC FREQ?
31. How to use arrays to recode all numeric variables?
32. How to generate cross-tabulation?
33. How to calculate the mean for a variable by group?
34. What is the RETAIN statement used for?
35. What are SYMGET and SYMPUT?
36. How does PROC SQL handle merging two datasets?
37. How to debug SAS Macros?

1. What is the difference between INPUT and INFILE in SAS?

Answer: In SAS, INPUT and INFILE are both integral to reading raw data, but they have different roles.

  • INFILE Statement: INFILE is used to specify the location of the external data file. It acts as a link between SAS and the external file, telling SAS where to locate the data it needs to process. It handles physical aspects, such as file path, line lengths, delimiters, and reading control options (e.g., DLM=, MISSOVER).
  • INPUT Statement: INPUT is used to describe the layout of the data within the file and define how data is read from the file into SAS variables. It specifies which columns or fields from the raw data are assigned to which variables, allowing SAS to interpret each data item.

Example:
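A minimal sketch of such a program, assuming a comma-separated file with a header row. So that it runs on its own, the sketch first writes sample rows to a temporary file; the fileref rawcsv, the data set name people, and the values are illustrative:

```sas
/* Write a small CSV to a temporary file (illustrative data only). */
filename rawcsv temp;

data _null_;
  file rawcsv;
  put 'name,age,height,weight';
  put 'Alice,34,165,60';
  put 'Bob,41,180,82';
run;

/* INFILE: which file to read and how (comma-delimited, skip the header). */
/* INPUT:  which fields map to which variables, and their types.          */
data people;
  infile rawcsv dsd firstobs=2 truncover;
  input name :$20. age height weight;
run;
```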

Here, INFILE locates and connects to the external CSV file, while INPUT reads values for name, age, height, and weight from the columns.

2. Difference between Informat and Format in SAS?

Answer: Informat and Format are used for handling data representation, but they serve different purposes.

  • Informat: An Informat instructs SAS on how to read or interpret the data values in the raw data file. It is applied during the data reading phase, helping SAS convert raw data into internal data values. For instance, the date9. informat reads dates written as DDMMMYYYY.
  • Format: A Format controls how data is displayed in output, without altering the actual values stored in the dataset. Formats can be used in procedures, with the PUT statement, and with the PUT function.

Example:
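A minimal sketch using in-stream data (the data set name holidays and the variable names are illustrative):

```sas
data holidays;
  input event $ event_date :date9.;  /* informat: read text such as 25DEC2022 */
  format event_date date9.;          /* format: display the stored number as 25DEC2022 */
  datalines;
Xmas 25DEC2022
;
run;

proc print data=holidays;
run;
```

Printing holidays shows 25DEC2022; without the FORMAT statement the same variable would print as 23004, the number of days between 1 January 1960 and 25 December 2022.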

Here, the date9. informat reads the raw text 25DEC2022 and stores it as a SAS date value (the number of days since 1 January 1960), while the date9. format displays that stored number as 25DEC2022 in output.

3. What is the purpose of double trailing @@ in the INPUT statement?

Answer:

The double trailing @@ in the INPUT statement holds the current data line across iterations of the DATA step, so SAS can read several observations from the same line instead of moving to a new line at each iteration. This is helpful when a raw file stores multiple records on one line.

Example:
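A minimal sketch with in-stream data; the data set and variable names are illustrative:

```sas
data scores;
  input id score @@;   /* @@ holds the data line across DATA step iterations */
  datalines;
1 85 2 92 3 78
4 88
;
run;
```

With @@ the step creates four observations. Without it, SAS would move to a new line at each iteration and keep only the pairs 1 85 and 4 88.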

In this example, @@ keeps SAS from moving to the next line after each observation, allowing it to continue reading the data from the current line until it runs out of input data.

4. How can you include or exclude specific variables in a data set?

Answer:

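Use the DROP= or KEEP= data set options on the DATA or SET statement, or the DROP and KEEP statements inside the DATA step. Example: set dataset(drop=var1 var2); excludes var1 and var2, while set dataset(keep=var1 var2); reads only those two variables.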
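A short sketch of both styles, using the SASHELP.CLASS sample data set that ships with SAS (the output names subset1 and subset2 are illustrative):

```sas
/* KEEP= data set option: read only NAME, AGE, and HEIGHT */
data subset1;
  set sashelp.class(keep=name age height);
run;

/* DROP statement: read every variable, but leave WEIGHT out of the output */
data subset2;
  set sashelp.class;
  drop weight;
run;
```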

5. How do you print observations 5 through 10 from a data set?

Answer:

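Use the FIRSTOBS= and OBS= data set options on the DATA= data set in PROC PRINT. Example: proc print data=dataset(firstobs=5 obs=10); run; OBS= gives the number of the last observation to process, not a count, so this prints observations 5 through 10.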

6. Difference between Missover and Truncover?

Answer:

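Both options stop SAS from moving to the next line when a record runs out of data. MISSOVER sets any variable whose field is not fully present on the line to missing, while TRUNCOVER reads whatever characters remain for a partial field. The difference matters with column or formatted input when records are shorter than expected.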
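A minimal sketch of the difference, assuming a raw file whose second record is shorter than the 5-column field being read. The temporary fileref codes and the data set names are illustrative:

```sas
filename codes temp;

data _null_;
  file codes;
  put '12345';
  put '123';        /* shorter than the 5-column field read below */
run;

data codes_missover;
  infile codes missover;
  input code 5.;    /* second record: CODE is set to missing */
run;

data codes_truncover;
  infile codes truncover;
  input code 5.;    /* second record: CODE is 123 */
run;
```

Both steps produce two observations; only the value of code on the short record differs.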

7. How does the Program Data Vector (PDV) work in SAS?

Answer:

The PDV (Program Data Vector) is a logical area in memory where SAS builds data sets, one observation at a time. It is crucial for understanding how data is processed within the data step.

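  • Each variable in the data set is assigned a slot in the PDV. At the start of each DATA step iteration, SAS sets variables created by INPUT or assignment statements to missing (. for numeric, blank for character); variables read with SET or MERGE, or named in a RETAIN statement, keep their values until they are overwritten.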
  • As SAS processes each line of code, it updates the PDV based on the data step instructions.
  • After processing all instructions for the current observation, SAS writes the observation from the PDV to the data set. The PDV is then reinitialized for the next observation.

Example:
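A minimal sketch, assuming a small illustrative old_data set with numeric variables x and y. PUT _ALL_ writes the PDV contents (including the automatic variables _N_ and _ERROR_) to the log at each iteration:

```sas
data old_data;
  input x y;
  datalines;
1 2
3 4
;
run;

data example;
  set old_data;      /* copy the next observation into the PDV */
  new_var = x + y;   /* compute a new variable in the PDV */
  put _all_;         /* show the PDV contents in the log */
run;                 /* implicit OUTPUT writes the PDV to EXAMPLE */
```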

In this example, each observation of old_data is read into the PDV, SAS calculates new_var, and the complete observation is written to example before the PDV is reinitialized for the next record.

8. What is DATA _NULL_ and its purpose?

Answer: DATA _NULL_ is a special SAS data step that executes SAS code without creating an output data set. This step is used mainly when we want to perform operations or calculations, generate reports, or write information to external files without storing data.

  • It saves resources because no output data set is written, which makes it efficient for tasks like writing to the log, debugging, creating macro variables, or performing computations whose results are not needed as a dataset.

Example:
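A minimal sketch, assuming an illustrative sales data set with name and age variables; output.txt is written to the current working directory:

```sas
data sales;
  input name $ age;
  datalines;
Alice 34
Bob 41
;
run;

data _null_;            /* no output data set is created */
  set sales;
  file 'output.txt';    /* route PUT output to an external text file */
  put name age;         /* one line per observation */
run;
```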

In this example, DATA _NULL_ reads sales data but does not create a new dataset. Instead, it writes name and age values to an external file, output.txt.

9. What is the difference between Missover and Truncover in SAS?

Answer: Both MISSOVER and TRUNCOVER are options in the INFILE statement that control how SAS handles records that are shorter than expected.

  • MISSOVER: If MISSOVER is used, SAS assigns missing values to variables when the input line does not contain enough data to read all variables.
  • TRUNCOVER: If a data line is short, TRUNCOVER reads whatever characters remain for the last variable instead of assigning a missing value. Like MISSOVER, it prevents SAS from reading past the end of the current line, which is especially useful in fixed-width files.

Example:
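A minimal sketch, assuming data.txt holds space-separated name, age, and salary values; the first step writes an illustrative copy of that file so the example runs on its own:

```sas
/* Write an illustrative data.txt; the last two records are short. */
data _null_;
  file 'data.txt';
  put 'Alice 34 52000';
  put 'Bob 41';
  put 'Carol';
run;

/* MISSOVER: variables with no data left on the line are set to missing. */
data emp;
  infile 'data.txt' missover;
  input name $ age salary;
run;
```

Without MISSOVER (that is, with the default FLOWOVER), SAS would go to the next line to look for Bob's salary, find the text Carol there, and the Carol record would not become an observation of its own.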

In this example, if a record in data.txt ends before age or salary, MISSOVER assigns missing values to those variables instead of reading them from the next line.

10. Explain the default statistics produced by PROC MEANS.

Answer: PROC MEANS is a commonly used SAS procedure that provides summary statistics for numeric data. By default, it produces:

  • N (Number of non-missing observations)
  • Mean (Average of values)
  • Std Dev (Standard Deviation)
  • Min (Minimum value)
  • Max (Maximum value)

Example:
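A minimal sketch, assuming mydata is simply a copy of the SASHELP.CLASS sample data set (19 students with numeric age, height, and weight):

```sas
data mydata;
  set sashelp.class;
run;

/* Default output: N, Mean, Std Dev, Minimum, Maximum */
proc means data=mydata;
  var age height;
run;

/* Extra statistics are requested with keywords on the PROC statement */
proc means data=mydata n mean median sum range;
  var age height;
run;
```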

This example produces the default statistics for age and height in the mydata dataset. PROC MEANS can also report additional statistics, such as the median, sum, and range, when you list their keywords (MEDIAN, SUM, RANGE) in the PROC MEANS statement.

11. Describe functions used for data cleaning in SAS.

Answer: SAS offers various functions for data cleaning, including:

  • COMPRESS: Removes specific characters from strings.
  • TRANSLATE: Replaces specific characters in a string.
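  • TRIM and STRIP: TRIM removes trailing blanks; STRIP removes both leading and trailing blanks.
  • SUBSTR: Extracts part of a string or, on the left side of an assignment, replaces part of it.
  • INTNX and INTCK: Used for date manipulation (INTNX shifts a date by a number of intervals; INTCK counts the intervals between two dates).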
  • UPCASE and LOWCASE: Standardize text case.

Example:
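A minimal sketch, assuming a small illustrative raw data set with messy name and phone values:

```sas
data raw;
  length name $20 phone $20;
  name = '  john smith '; phone = '(555) 123-4567'; output;
  name = 'mary jones';    phone = '555.987.6543';   output;
run;

data cleaned;
  set raw;
  name  = upcase(strip(name));       /* drop leading/trailing blanks, then uppercase */
  phone = compress(phone, '()-. ');  /* remove parentheses, hyphens, periods, spaces */
run;
```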

This example removes extra spaces from name, converts it to uppercase, and removes specific characters from phone.

Intermediate Questions

12. What are the default statistics that PROC MEANS produce?

Answer:

By default, PROC MEANS outputs the N, Mean, Minimum, Maximum, and Standard Deviation.

13. Explain functions you have used for data cleaning.

Answer:

Some common functions for data cleaning include:

  • COMPRESS: Removes specified characters.
  • SCAN: Extracts words from strings.
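  • TRIM and LEFT: TRIM removes trailing blanks; LEFT left-aligns a value by moving leading blanks to the end. TRIM(LEFT(x)) removes both.
  • IFN: Returns one numeric value when a condition is true and another when it is false (IFC is the character version).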

14. What is the difference between FUNCTION and PROC?

Answer:

A FUNCTION performs an operation on one or more values and returns a result; it is used inside a statement, typically in a DATA step (and also in WHERE clauses and PROC SQL). A PROC (procedure) is a separate step that processes an entire data set, for example to sort, summarize, or analyze it.
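A small sketch of the contrast, using illustrative quiz scores: the MEAN function averages across variables within each row, while PROC MEANS averages each variable down all rows:

```sas
data rowwise;
  input id q1 q2 q3;
  avg_q = mean(of q1-q3);   /* function: across variables in one row; ignores missing */
  datalines;
1 80 90 70
2 60 . 90
;
run;

proc means data=rowwise mean;   /* procedure: down the rows of each variable */
  var q1 q2 q3;
run;
```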

15. Differences between WHERE and IF statements?

Answer:

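WHERE selects observations before they are read into the PDV, so it is usually more efficient for subsetting large data sets, and it can also be used in PROC steps. IF is applied inside the DATA step after an observation is in the PDV, so it can test variables created in the same step and can be used with data read by INPUT, which WHERE cannot.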
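A minimal sketch using SASHELP.CLASS (height in inches, weight in pounds); the output data set names are illustrative:

```sas
/* WHERE filters rows as they are read; AGE must already exist in the input */
data teens;
  set sashelp.class;
  where age >= 13;
run;

/* Subsetting IF runs on the PDV, so it can test BMI, created in this step */
data high_bmi;
  set sashelp.class;
  bmi = weight / height**2 * 703;
  if bmi > 20;
run;
```

Replacing the subsetting IF with where bmi > 20; would fail, because BMI is not a variable in SASHELP.CLASS.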

16. What is Program Data Vector (PDV)?

Answer:

PDV is a memory area where SAS builds a dataset, holding one row of data at a time during processing.

17. What is DATA _NULL_?

Answer:

DATA _NULL_ is used when you don't need an output data set but want to execute code, such as writing to the log or an external file, or creating macro variables.

18. What is the difference between the + operator and the SUM function?

Answer:

The + operator returns a missing value if any of the operands are missing, while SUM ignores missing values and adds only the valid numbers. SUM returns missing only when all of its arguments are missing.
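A minimal sketch with illustrative values:

```sas
data compare;
  a = 10; b = .; c = 5;
  total_plus = a + b + c;      /* missing: one operand is missing */
  total_sum  = sum(a, b, c);   /* 15: the missing argument is ignored */
run;
```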

19. How to identify and remove unique and duplicate values?

Answer:

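To remove duplicates, use PROC SORT with the NODUPKEY or NODUP option; the DUPOUT= option writes the removed observations to a separate data set so you can inspect them. To separate keys that occur once from keys that repeat, sort the data and use FIRST. and LAST. variables in a DATA step. PROC FREQ can also count how often each value occurs.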
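A minimal sketch, assuming an illustrative orders data set keyed by cust_id:

```sas
data orders;
  input cust_id item $;
  datalines;
1 pen
2 ink
2 pad
3 pen
3 pen
4 cup
;
run;

proc sort data=orders;
  by cust_id;
run;

/* FIRST./LAST. flags: a key that is both first and last occurs only once */
data unique_ids dup_ids;
  set orders;
  by cust_id;
  if first.cust_id and last.cust_id then output unique_ids;
  else output dup_ids;
run;

/* NODUPKEY keeps the first row per key; DUPOUT= captures the rows removed */
proc sort data=orders out=one_per_id nodupkey dupout=removed;
  by cust_id;
run;
```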

20. Difference between NODUP and NODUPKEY Options?

Answer:

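NODUP removes observations that are identical in every variable, while NODUPKEY removes observations with duplicate values of the BY variables only, keeping the first one. NODUP compares each observation only with the previous one, so sort BY _ALL_ (or by enough variables) to make sure identical rows end up next to each other.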
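A minimal sketch with an illustrative visits data set that contains one fully duplicated row:

```sas
data visits;
  input id visit $;
  datalines;
1 A
1 A
1 B
2 C
;
run;

/* NODUP: drop rows identical in every variable (sorted by all variables) */
proc sort data=visits out=no_dup_rows nodup;
  by _all_;
run;

/* NODUPKEY: keep only the first row for each ID */
proc sort data=visits out=no_dup_keys nodupkey;
  by id;
run;
```

no_dup_rows keeps three rows (only the repeated 1 A row is dropped), while no_dup_keys keeps two, one per id.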

Advanced Questions

21. What are _NUMERIC_ and _CHARACTER_, and what do they do?

Answer:

_NUMERIC_ and _CHARACTER_ are special SAS variable lists that refer to all numeric and all character variables in a dataset (or defined so far in a DATA step), respectively.
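A short sketch using SASHELP.CLASS, where Name and Sex are character and Age, Height, and Weight are numeric (the output name char_only is illustrative):

```sas
/* _NUMERIC_ in a VAR statement: analyze every numeric variable */
proc means data=sashelp.class;
  var _numeric_;
run;

/* _CHARACTER_ in a KEEP= option: keep only the character variables */
data char_only;
  set sashelp.class(keep=_character_);
run;
```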

22. How do you sort in descending order?

Answer:

In PROC SORT, put DESCENDING before the variable name in the BY statement. Example: proc sort data=dataset; by descending var; run; DESCENDING applies only to the variable that immediately follows it.

23. How to convert a numeric variable to a character variable?

Answer:

Use the PUT function. Example: char_var = put(num_var, 8.);

24. How to convert a character variable to a numeric variable?

Answer:

Use the INPUT function. Example: num_var = input(char_var, 8.);
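A minimal sketch of both conversions; the variable names and values are illustrative:

```sas
data convert;
  num_var   = 1234.5;
  char_var  = put(num_var, 8.1);          /* numeric -> character, right-aligned: '  1234.5' */
  char_left = strip(put(num_var, 8.1));   /* same value without the leading blanks */
  text      = '5678';
  num_again = input(text, 8.);            /* character -> numeric: 5678 */
run;
```

PUT with a numeric format right-aligns the result, so wrapping it in STRIP is a common way to drop the leading blanks.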

25. Difference between VAR A1 - A3 and VAR A1 -- A3?

Answer:

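VAR A1-A3 (single dash) is a numbered range list: it means exactly A1, A2, and A3, wherever they sit in the dataset. VAR A1--A3 (double dash) is a name range list: it includes every variable stored between A1 and A3 in the dataset's variable order, inclusive, whatever their names.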
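A minimal sketch, assuming an illustrative data set where a variable x happens to be stored between a1 and a2:

```sas
data wide;
  input a1 x a2 a3;
  datalines;
1 99 2 3
;
run;

data numbered;               /* A1-A3: exactly A1, A2, A3 */
  set wide(keep=a1-a3);
run;

data positional;             /* A1--A3: A1 through A3 in stored order, so X too */
  set wide(keep=a1--a3);
run;
```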

26. Difference between PROC MEANS and PROC SUMMARY?

Answer:

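PROC MEANS and PROC SUMMARY compute the same statistics with the same syntax; they differ in their defaults. PROC MEANS prints its results, while PROC SUMMARY prints nothing unless you add the PRINT option and is usually used with an OUTPUT statement. When no VAR statement is given, PROC MEANS analyzes all numeric variables, while PROC SUMMARY only counts observations.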
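A minimal sketch of the typical PROC SUMMARY pattern, using SASHELP.CLASS; the output name height_by_sex is illustrative. NWAY keeps only the rows for the full CLASS combination, dropping the overall-total row:

```sas
proc summary data=sashelp.class nway;   /* no printed output without PRINT */
  class sex;
  var height;
  output out=height_by_sex mean=avg_height;
run;
```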

27. How does the SUBSTR function work?

Answer:

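SUBSTR extracts part of a string starting at a given position. Syntax: substr(string, position, length), where length is optional; without it, SUBSTR returns everything to the end of the string. On the left side of an assignment, SUBSTR replaces characters instead.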
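A minimal sketch with an illustrative code value:

```sas
data substr_demo;
  length code $12;
  code = 'ABC-2024-XYZ';
  year = substr(code, 5, 4);    /* '2024': 4 characters starting at position 5 */
  tail = substr(code, 10);      /* 'XYZ': no length means "to the end" */
  substr(code, 1, 3) = 'DEF';   /* left side: replace positions 1-3 */
run;
```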

28. Difference between CEIL and FLOOR functions?

Answer:

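CEIL returns the smallest integer greater than or equal to its argument (rounds up), while FLOOR returns the largest integer less than or equal to it (rounds down).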
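A minimal sketch; note how the two functions behave on a negative number:

```sas
data rounding;
  input x;
  up   = ceil(x);    /* 2.3 -> 3, -2.3 -> -2 */
  down = floor(x);   /* 2.3 -> 2, -2.3 -> -3 */
  datalines;
2.3
-2.3
;
run;
```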

29. How to perform a Matched Merge with output only from both files?

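Answer:

Use IN= data set options on the MERGE statement to flag which data set contributed each observation, then keep only the matches with a subsetting IF such as if in1 and in2; (where in1 and in2 are the IN= variable names). Both data sets must be sorted by the BY variable.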

Example:
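A minimal sketch, assuming two illustrative data sets, cust and ord, that are already sorted by id; in_cust and in_ord are illustrative IN= variable names:

```sas
data cust;
  input id name $;
  datalines;
1 Ann
2 Raj
3 Li
;
run;

data ord;
  input id amount;
  datalines;
2 50
3 75
4 20
;
run;

data matched;
  merge cust(in=in_cust) ord(in=in_ord);
  by id;
  if in_cust and in_ord;   /* keep only IDs present in both data sets */
run;
```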

30. How to label values in PROC FREQ?

Answer:

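A LABEL statement in the DATA step labels the variable itself, and PROC FREQ shows that label. To label the values in the table (for example, F as Female), create a format with PROC FORMAT and apply it with a FORMAT statement.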

Example:
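A minimal sketch using SASHELP.CLASS; the format name $sexfmt and the data set name class are illustrative:

```sas
proc format;
  value $sexfmt 'F' = 'Female'
                'M' = 'Male';
run;

data class;
  set sashelp.class;
  label sex = 'Student sex';   /* variable label */
run;

proc freq data=class;
  tables sex;
  format sex $sexfmt.;         /* value labels come from the format */
run;
```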

31. How to use arrays to recode all numeric variables?

Answer:

Use the _NUMERIC_ variable list in an ARRAY statement, then loop over the array with a DO loop and DIM().

Example:
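A minimal sketch, assuming an illustrative survey data set in which -99 marks a missing answer:

```sas
data survey;
  input id q1 q2 q3;
  datalines;
1 5 -99 3
2 -99 4 4
;
run;

data survey_clean;
  set survey;
  array nums{*} _numeric_;              /* every numeric variable defined so far */
  do i = 1 to dim(nums);
    if nums{i} = -99 then nums{i} = .;  /* recode the -99 sentinel to missing */
  end;
  drop i;
run;
```

_NUMERIC_ includes every numeric variable defined at that point, including id, so check that the recode is safe for all of them.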

32. How to generate cross-tabulation?

Answer:

Use PROC FREQ with a TABLES statement that crosses two variables with an asterisk, for example tables row_var*col_var;

Example:
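A minimal sketch using SASHELP.CLASS; OUT= (with the illustrative name sex_by_age) also saves the cell counts as a data set:

```sas
proc freq data=sashelp.class;
  tables sex*age / out=sex_by_age;   /* rows = SEX, columns = AGE */
run;
```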

33. How to calculate the mean for a variable by group?

Answer:

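Use PROC MEANS with a CLASS statement, which needs no sorting, or with a BY statement, which requires the data to be sorted by the BY variable first.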
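A minimal sketch of both approaches using SASHELP.CLASS; class_sorted is an illustrative name:

```sas
/* CLASS: no sorting needed */
proc means data=sashelp.class mean;
  class sex;
  var height;
run;

/* BY: sort by the BY variable first */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

proc means data=class_sorted mean;
  by sex;
  var height;
run;
```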

34. What is the RETAIN statement used for?

Answer:

RETAIN holds the values of variables across data step iterations, useful for cumulative totals or sequential processing.
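A minimal sketch of a running total with illustrative daily sales:

```sas
data running;
  input day sales;
  retain cum_sales 0;             /* keep the value between iterations, starting at 0 */
  cum_sales = cum_sales + sales;  /* running total */
  datalines;
1 100
2 150
3 50
;
run;
```

The sum statement cum_sales + sales; would give the same result, because it retains its variable implicitly and initializes it to 0.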

35. What are SYMGET and SYMPUT?

Answer:

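SYMGET is a DATA step function that retrieves the value of a macro variable while the step executes. CALL SYMPUT (and the newer CALL SYMPUTX, which trims blanks) is a CALL routine that assigns a DATA step value to a macro variable; the macro variable can be referenced with & only after the step that creates it has finished.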
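A minimal sketch using SASHELP.CLASS; the macro variable name n_students is illustrative. It uses CALL SYMPUTX, the variant that trims blanks:

```sas
/* CALL SYMPUTX: copy a DATA step value into a macro variable */
data _null_;
  set sashelp.class end=last;
  count + 1;                                   /* sum statement: implicitly retained */
  if last then call symputx('n_students', count);
run;

/* The macro variable exists once the step above has finished */
%put NOTE: n_students=&n_students;

/* SYMGET: read the macro variable's value while this step executes */
data check;
  n = input(symget('n_students'), 32.);
run;
```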

36. How does PROC SQL handle merging two datasets?

Answer:

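With an explicit join (INNER JOIN, LEFT JOIN, and so on), put the matching condition in an ON clause; with an implicit join, list the tables separated by commas and put the condition in a WHERE clause. Unlike a DATA step MERGE, PROC SQL does not require the inputs to be sorted first.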

Example:
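A minimal sketch with two illustrative data sets, showing the explicit join with ON and the equivalent comma join with WHERE:

```sas
data cust;
  input id name $;
  datalines;
1 Ann
2 Raj
3 Li
;
run;

data ord;
  input id amount;
  datalines;
2 50
3 75
4 20
;
run;

proc sql;
  /* Explicit inner join: the join condition goes in ON */
  create table cust_orders as
  select c.id, c.name, o.amount
  from cust as c
       inner join ord as o
       on c.id = o.id;

  /* Implicit join: tables separated by commas, condition in WHERE */
  create table cust_orders2 as
  select c.id, c.name, o.amount
  from cust as c, ord as o
  where c.id = o.id;
quit;
```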

37. How to debug SAS Macros?

Answer:

Use MPRINT, MLOGIC, and SYMBOLGEN options in the OPTIONS statement to trace the macro execution path.
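A minimal sketch; the macro print_subset is a hypothetical example used only to generate some log output:

```sas
options mprint mlogic symbolgen;   /* generated code, macro logic, variable resolution */

%macro print_subset(ds=, minage=);
  %if &minage > 12 %then %do;
    proc print data=&ds;
      where age >= &minage;
    run;
  %end;
%mend print_subset;

%print_subset(ds=sashelp.class, minage=13)

options nomprint nomlogic nosymbolgen;   /* turn tracing off again */
```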

These questions cover key topics and provide a comprehensive understanding of SAS, from basic data manipulation to advanced procedures and debugging.
