web analytics

How should the data be represented?

A company’s social media manager requests more staff on the weekends to handle an increase in customer contacts from a particular region. The company needs a report to visualize the

Which strategy allows the appropriate level of access control and requires the LEAST amount of management work?

A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers. The architect is building a platform where suppliers can view the status of one

Which Amazon Kinesis configuration meets these requirements?

A media advertising company handles a large number of real-time messages sourced from over 200 websites. The company’s data engineer needs to collect and process records in real time for

Which application on the cluster should the data engineer use?

There are thousands of text files on Amazon S3. The total size of the files is 1 PB. The files contain retail order information for the past 2 years. A

Which set of data processing steps improves recommendations for each region?

A company hosts a portfolio of e-commerce websites across the Oregon, N. Virginia, Ireland, and Sydney AWS regions. Each site keeps log files that capture user behavior. The company has

Which technology provides the most appropriate support for these requirements?

A company that provides economics data dashboards needs to be able to develop software to display rich, interactive, data-driven graphics that run in web browsers and leverage the full stack

Which solution should the architect build?

A solutions architect works for a company that has a data lake based on a central Amazon S3 bucket. The data contains sensitive information. The architect must be able to

Which AWS Lambda action is most appropriate?

A company with a support organization needs support engineers to be able to search historic cases to provide fast responses on new issues raised. The company has forwarded all support

On which basis should this binary classification model be built?

A company needs a churn prevention model to predict which customers will NOT renew their yearly subscription to the company’s service. The company plans to provide these customers with a

What is a low-cost way to create a unique log for each import job?

A company generates a large number of files each month and needs to use AWS Import/Export to move these files into Amazon S3 storage. To satisfy the auditors, the company

Which solution meets these requirements?

An organization is designing an application architecture. The application will have over 100 TB of data and will support transactions that arrive at rates from hundreds per second to tens

How should the administrator align instance types with the cluster’s purpose?

An administrator is deploying Spark on Amazon EMR for two distinct use cases: machine learning algorithms and ad-hoc querying. All data will be stored in Amazon S3. Two separate clusters

What is the most efficient technique to meet these requirements?

A company uses Amazon Redshift for its enterprise data warehouse. A new on-premises PostgreSQL OLTP DB must be integrated into the data warehouse. Each table in the PostgreSQL DB has

What is the most cost-effective solution for creating this visualization each day?

A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who participates in the trial requires visual reports each morning. The reports are built from

What is the most cost- and time-efficient collection methodology in this situation?

A medical record filing system for a government medical fund is using an Amazon S3 bucket to archive documents related to patients. Every patient visit to a physician creates a

Which Amazon Machine Learning model is the most appropriate for the task?

An administrator tries to use the Amazon Machine Learning service to classify social media posts that mention the administrator’s company into posts that require a response and posts that do

How should this task be performed?

A city has been collecting data on its public bicycle share program for the past three years. The 5 PB dataset currently resides on Amazon S3. The data contains the following

Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.)

An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the company’s DynamoDB table. The company is NOT consuming close to the provisioned

Which method meets the requirements?

A company is centralizing a large number of unencrypted small files from multiple Amazon S3 buckets. The company needs to verify that the files contain the same data after centralization.
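As a generic illustration of verifying that copied files hold the same bytes, the sketch below compares content digests for each file before and after a copy. It is a minimal sketch under stated assumptions, not necessarily the method the question is looking for: the in-memory dictionaries stand in for the source and centralized buckets, and the names `sha256_digest` and `find_mismatches` are hypothetical helpers introduced here.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(source: dict[str, bytes], centralized: dict[str, bytes]) -> list[str]:
    """Return the sorted keys whose centralized copy is missing or differs.

    Both arguments map an object key to its raw bytes. They are
    hypothetical stand-ins for the original and centralized buckets.
    """
    mismatched = []
    for key, data in source.items():
        copy = centralized.get(key)
        # A missing copy or a different digest both count as a mismatch.
        if copy is None or sha256_digest(copy) != sha256_digest(data):
            mismatched.append(key)
    return sorted(mismatched)
```

An empty result means every source file has a byte-identical counterpart in the centralized set.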

Which approach should this customer use?

An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customer’s query

Which loading approach should the administrator use to meet this objective?

An administrator receives about 100 files per hour into Amazon S3 and will be loading the files into Amazon Redshift. Customers who analyze the data within Redshift gain significant value

What is the most cost-efficient option to meet these requirements?

A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 M files per hour.
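The figures in this scenario translate into a small data volume but a high file count, which a quick calculation makes concrete. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and reads "1 M" as one million; the function names are hypothetical.

```python
FILE_SIZE_KB = 2            # size of each spool file, from the scenario
FILES_PER_HOUR = 1_000_000  # "1 M files per hour", read as one million


def hourly_volume_gb(file_size_kb: float, files_per_hour: int) -> float:
    """Total data per hour in GB, assuming 1 GB = 1,000,000 KB."""
    return file_size_kb * files_per_hour / 1_000_000


def files_per_second(files_per_hour: int) -> float:
    """Average number of files arriving each second."""
    return files_per_hour / 3600


if __name__ == "__main__":
    print(hourly_volume_gb(FILE_SIZE_KB, FILES_PER_HOUR))  # GB per hour
    print(round(files_per_second(FILES_PER_HOUR), 2))      # files per second
```

Under these assumptions the workload is about 2 GB per hour, arriving as roughly 278 separate files every second.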

What is the optimal approach to meet these requirements?

A telecommunications company needs to predict customer churn (i.e., customers who decide to switch to a competitor). The company has historic records of each customer, including monthly consumption patterns, calls

Which solution should the data engineer choose?

The department of transportation for a major metropolitan area has placed sensors on roads at key locations around the city. The goal is to analyze the flow of traffic and

Which of the following techniques will meet this requirement most efficiently?

An organization uses Amazon Elastic MapReduce (EMR) to process a series of extract-transform-load (ETL) steps that run in sequence. The output of each step must be fully processed in subsequent steps

What is a possible solution for this problem?

An administrator is processing events in near real-time using Kinesis streams and Lambda. Lambda intermittently fails to process batches from one of the shards due to a 5-minute time limit.

What action should the organization take?

An organization uses a custom MapReduce application to build monthly reports based on many small data files in an Amazon S3 bucket. The data is submitted from various business

What is the simplest architecture that will allow the architect to analyze the logs?

A company is building a new application in AWS. The architect needs to design a system to collect application log events. The design should be a repeatable pattern that minimizes

Which technique should be used to address this requirement with Amazon Redshift?

Managers in a company need access to the human resources database that runs on Amazon Redshift, to run reports about their employees. Managers must only see information about their direct

Which three steps should the data engineer take to accomplish this task? (Choose three.)

An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region.

How should the customer accomplish this?

A customer has a machine learning workflow that consists of multiple quick cycles of reads-writes-reads on Amazon S3. The customer needs to run the workflow on EMR but is concerned

What is the most cost-effective technique to meet these requirements?

An organization needs to design and deploy a large-scale data storage solution that will be highly durable and highly flexible with respect to the type and structure of data being

How should this be accomplished?

A systems engineer for a company proposes digitalization and backup of large archives for customers. The systems engineer needs to provide users with secure storage that makes sure that

What is the most cost-effective solution?

A travel website needs to present a graphical quantitative summary of its daily bookings to website visitors for marketing purposes. The website has millions of visitors per day, but wants

Which data store should the organization choose?

An organization needs a data store to handle the following data types and access patterns: Faceting; Search; Flexible schema (JSON) and fixed schema; Noise word elimination. Which data store should

Which ingestion solution should the company use?

A company that manufactures and sells smart air conditioning units also offers add-on services so that customers can see real-time dashboards in a mobile application or a web browser. Each

Which strategy will reduce the cost associated with the client’s read queries while not degrading quality?

An online retailer is using Amazon DynamoDB to store data related to customer transactions. The items in the table contain several string attributes describing the transaction as well as a

What is the most efficient method to query the data with Hive?

A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP address into 5-minute chunks stored in Amazon S3. Many analysts in the company use

How should the company determine the most appropriate distribution key for the ORDERS table?

A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in

Which strategy should be used to meet these requirements?

An online photo album app has a key design feature to support multiple screens (e.g., desktop, mobile phone, and tablet) with high-quality displays. Multiple versions of the image must be

Which approach should be used to accomplish this task?

An Amazon Kinesis stream needs to be encrypted. Which approach should be used to accomplish this task? A. Perform a client-side encryption of the data before it enters the Amazon

How should the data engineer make sure that the larger customer workloads do NOT interfere with the smaller customer workloads?

A data engineer is running a DWH on a 25-node Redshift cluster of a SaaS service. The data engineer needs to build a dashboard that will be used by customers.

What is the appropriate model choice and target attribute combination for this problem?

A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety

What is a cost-effective way to provide near real-time alerts on the pipeline metrics?

A large oil and gas company needs to provide near real-time alerts when peak thresholds are exceeded in its pipeline system. The company has developed a system to capture pipeline

Which action should be taken prior to performing this upgrade task?

A data engineer is about to perform a major upgrade to the DDL contained within an Amazon Redshift cluster to support a new data warehouse application. The upgrade scripts will

How should the administrator accomplish this task?

A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the

Which technology is most appropriate to enable this capability?

An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some

What is the most cost-effective solution to meet these requirements?

A social media customer has data from different data sources including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to

Which action should the data engineer take to meet this requirement?

A data engineer wants to use Amazon Elastic MapReduce for an application. The data engineer needs to make sure it complies with regulatory requirements. The auditor must be

How should the administrator recommend storing the log data?

An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting