QUESTION 1
If your intention is to show trends over time, which chart type is the most appropriate way to depict the data?

A.    Line chart
B.    Bar chart
C.    Stacked bar chart
D.    Histogram

QUESTION 2
You are analyzing a time series and want to determine its stationarity. You also want to determine the order of autoregressive models.
How are the autocorrelation functions used?

A.    ACF as an indication of stationarity, and PACF for the correlation between X_t and X_{t-k} not explained by their mutual correlation with X_1 through X_{k-1}.
B.    PACF as an indication of stationarity, and ACF for the correlation between X_t and X_{t-k} not explained by their mutual correlation with X_1 through X_{k-1}.
C.    ACF as an indication of stationarity, and PACF to determine the correlation of X_1 through X_{k-1}.
D.    PACF as an indication of stationarity, and ACF to determine the correlation of X_1 through X_{k-1}.
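
For reference, the two functions named in this question can be computed by hand. The sketch below is a minimal pure-Python illustration, not a library implementation: `acf` and `pacf_from_acf` are hypothetical helper names, the sample autocorrelation uses the common biased estimator, and the partial autocorrelations come from the Durbin-Levinson recursion. The demo input is the theoretical autocorrelation of an AR(1) process with coefficient 0.5, chosen only for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r)-1 via Durbin-Levinson.

    r is a list of autocorrelations with r[0] == 1.
    """
    pacf = []
    prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, len(r)):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5 (illustrative only).
ar1_acf = [0.5 ** k for k in range(4)]
ar1_pacf = pacf_from_acf(ar1_acf)
```

In this demo the autocorrelations decay gradually with the lag, while the partial autocorrelations are non-zero only at lag 1.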

QUESTION 3
Which word or phrase completes the statement? A spreadsheet is to a data island as a centralized database for reporting is to a ________?

A.    Data Warehouse
B.    Data Repository
C.    Analytic Sandbox
D.    Data Mart

QUESTION 4
What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

A.    Linear regression
B.    Expected value
C.    Variance
D.    Quantiles

QUESTION 5
In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

A.    Discovery
B.    Data Preparation
C.    Model Building
D.    Communicate Results

QUESTION 6
You are testing two new weight-gain formulas for puppies. The test gives the results:
Control group: 1% weight gain
Formula A: 3% weight gain
Formula B: 4% weight gain
A one-way ANOVA returns a p-value of 0.027.
What can you conclude?

A.    Either Formula A or Formula B is effective at promoting weight gain.
B.    Formula B is more effective at promoting weight gain than Formula A.
C.    Formula A and Formula B are both effective at promoting weight gain.
D.    Formula A and Formula B are about equally effective at promoting weight gain.
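
As background, a one-way ANOVA produces a single F statistic that compares between-group variability to within-group variability across all groups at once. The sketch below shows that computation in plain Python. The function name `one_way_anova_f` and the per-group values are hypothetical, since the question reports only group averages; the p-value step (which needs the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variability of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up measurements for three groups, for illustration only.
demo_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
demo_f = one_way_anova_f(demo_groups)
```

The result is one number for the whole set of groups. The statistic by itself does not identify which pair of groups differs.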

QUESTION 7
Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?

A.    Data exploration
B.    Descriptive statistics
C.    ETLT
D.    Model selection

QUESTION 8
Which functionality do regular expressions provide?

A.    Text pattern matching
B.    Underflow prevention
C.    Increased numerical precision
D.    Decreased processing complexity
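
A short illustration of a regular expression in use, written with Python's standard `re` module. The pattern (a simple YYYY-MM-DD date) and the helper name `find_dates` are assumptions made for this sketch; the pattern checks only the digit layout, not whether the date is valid.

```python
import re

# Hypothetical pattern: four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_dates(text):
    """Return every substring of text that matches DATE_PATTERN."""
    return [m.group(0) for m in DATE_PATTERN.finditer(text)]


found = find_dates("Shipped 2021-03-04, returned 2021-04-01.")
```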

QUESTION 9
When creating a project sponsor presentation, what is the main objective?

A.    Show that you met the project goals
B.    Show how you met the project goals
C.    Show how well the model will meet the SLA (service level agreement)
D.    Clearly describe the methods and techniques used

QUESTION 10
Which word or phrase completes the statement? Business Intelligence is to monitoring trends as Data Science is to ________ trends.

A.    Predicting
C.    Driving
D.    Optimizing

QUESTION 11
Consider a scale that has five (5) values that range from “not important” to “very important”. Which data classification best describes this data?

A.    Ordinal
B.    Nominal
C.    Real
D.    Ratio

QUESTION 12
Which key role for a successful analytic project can provide business domain expertise with a deep understanding of the data and key performance indicators?

B.    Project Manager

QUESTION 13
You have used k-means clustering to classify the behavior of 100,000 customers for a retail store. You decide to use household income, age, gender and yearly purchase amount as measures. You have chosen to use 8 clusters and notice that 2 clusters have only 3 customers assigned. What should you do?

A.    Decrease the number of clusters
B.    Increase the number of clusters
C.    Decrease the number of measures used

QUESTION 14
What does the R code nv <- v[v < 1000] do?

A.    Selects the values in vector v that are less than 1000 and assigns them to the vector nv
B.    Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
C.    Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv
D.    Selects values of vector v less than 1000, modifies v, and makes a copy to nv
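
For readers who do not use R: the comparison `v < 1000` is evaluated element by element and yields a logical vector, and indexing `v` with a logical vector returns the elements at the TRUE positions. The sketch below is a rough pure-Python analogue of those two steps, not R itself; the sample values are hypothetical.

```python
v = [250, 1500, 999, 1000, 42]   # hypothetical sample values

# Step 1: element-wise comparison, like the R expression v < 1000.
mask = [x < 1000 for x in v]

# Step 2: keep the elements whose mask entry is True, like v[mask] in R.
nv = [x for x, keep in zip(v, mask) if keep]
```

After this runs, `v` still holds its original five values.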

QUESTION 15
For which class of problem is MapReduce most suitable?

A.    Embarrassingly parallel
B.    Minimal result data
D.    Non-overlapping queries

QUESTION 16
In data visualization, which type of chart is recommended to represent frequency data?

A.    Line chart
B.    Histogram
C.    Q-Q chart
D.    Scatterplot