Power BI Lab

This lab demonstrates how you can connect Microsoft Power BI tools (like Power BI Desktop) with Amazon Redshift and create visualizations on your data.

Contents

Prerequisites

If you do not have a Remote Desktop client installed, you can use the instructions at this link to do so.

Architecture

Below is the overview of the architecture and the steps involved in this lab.

Before You Begin

This lab assumes you have launched a Redshift cluster, configured your client tool. If you have not configured your client tool, see Lab 1 - Creating Redshift Clusters : Configure Client Tool. For this lab, you will need to gather the following information about your cluster from LAB 1 - Creating Redshift Clusters.

  • [Your-Redshift_Role_Arn]

Gather the following key pieces of information which are unique to your environment by navigating to the deployed cloudFormation stack.

  • Navigate to Amazon Redshift service and then to Clusters. You should be able to see the target Redshift cluster for this migration. Click the properties tab and then copy below attributes of this cluster.
    • [Redshift-Endpoint] - Endpoint
    • [Redshift-VPC-Id] - Virtual private cloud (VPC)
    • [Redshift-VPC-SecurityGroup] - VPC security group

Environment Setup

Let’s create a Windows EC2 instance to install Power BI Desktop software for this lab, if you are running this workshop in an AWS-sponsored or AWS-staffed event, it is likely that you are given a temporary AWS account environment, in which the required lab resources have been pre-provisioned for you. In that case, please directly proceed to the next steps. Otherwise, use the following Cloud Formation template link.

Optional Open AWS console and then navigate to CloudFormation service. You will then be able to view the deployed stack.

Please deploy the stack in same AWS region where the Redshift cluster is running, provide Redshift-VPC-Id, Redshift-VPC-SecurityGroup values collected in above step as input to the stack.

The template will use the default CIDR block of 0.0.0.0/0 which provides access from any IP Address. It is a best practice to replace this a range of IP addresses which should have access or allow only your own IP Address x.x.x.x/32.

To avoid paying for unused resources, please delete this cloudFormation stack after finishing this lab.

Prepare the Data

If you already completed Modernize w/ Spectrum lab, skip this section and proceed to next section Login to Power BI Desktop.

Build your DDL

CREATE SCHEMA workshop_das;

CREATE TABLE workshop_das.green_201601_csv
(
  vendorid                VARCHAR(4),
  pickup_datetime         TIMESTAMP,
  dropoff_datetime        TIMESTAMP,
  store_and_fwd_flag      VARCHAR(1),
  ratecode                INT,
  pickup_longitude        FLOAT4,
  pickup_latitude         FLOAT4,
  dropoff_longitude       FLOAT4,
  dropoff_latitude        FLOAT4,
  passenger_count         INT,
  trip_distance           FLOAT4,
  fare_amount             FLOAT4,
  extra                   FLOAT4,
  mta_tax                 FLOAT4,
  tip_amount              FLOAT4,
  tolls_amount            FLOAT4,
  ehail_fee               FLOAT4,
  improvement_surcharge   FLOAT4,
  total_amount            FLOAT4,
  payment_type            VARCHAR(4),
  trip_type               VARCHAR(4)
)
DISTSTYLE EVEN
SORTKEY (passenger_count,pickup_datetime);

Build your Copy Command

  • Build your copy command to copy the data from Amazon S3. This dataset has the number of taxi rides in the month of January 2016.

Make sure to replace [Your-Redshift_Role_Arn] value in the script below.

COPY workshop_das.green_201601_csv
FROM 's3://us-west-2.serverless-analytics/NYC-Pub/green/green_tripdata_2016-01.csv'
IAM_ROLE '[Your-Redshift_Role_Arn]'
DATEFORMAT 'auto'
IGNOREHEADER 1
DELIMITER ','
IGNOREBLANKLINES
REGION 'us-west-2'
;
  • Determine how many rows you just loaded.
select count(1) from workshop_das.green_201601_csv;
--1445285

Login to Power BI Desktop

In this section we will login to the windows EC2 instance to launch Power BI Desktop software. Launch Windows Remote Desktop client (RDP) tool and enter below credentials to login to the instance.

[EC2-Client-Endpoint] - Navigate to EC2 service and you could see an EC2 instance in running state. This is a Windows instance, pre-loaded with the Power BI Desktop software. Please copy the public DNS or IP address to login to that instance.

Remote Desktop Host: [EC2-Client-Endpoint]
Username: developer
Password: awesomeRs-2021

Find Microsoft Power BI Desktop icon on the desktop and double click to launch the software. Alternatively, you can search for “Power BI Desktop” in the Search box to launch it.

Connecting to Amazon Redshift

Once launched, select “Get data” link to start the integration process.

In the Get Data popup, search for Redshift and select Amazon Redshift in the results and click on Connect button.

In the Redshift connection details screen, enter Redshift cluster name, port number and database name as shown below and click on OK

Server: `[Redshift-Endpoint]`
Database: dev

Enter the Redshift cluster credentials in below screen and click on Connect button to proceed.

User name: awsuser
Password: `[Supplied by your lab instructor]`

Once the connection is established successfully, Navigator page will list all available Redshift schemas to connect. Select green_201601_csv table under workshop_das schema and click on Load button to create visualizations on this data.

Power BI Desktop will take few minutes to connect to the Redshift cluster and cache the green company taxi data locally.

Create Visualizations

Now it’s time to get creative and start creating visualizations on the green taxi data. We will create a map visualization with pickup latitude and longitude. Select Map from the Visualizations palette.

Select pickup_latitude, pickup_longitude fields from the green_201601_csv tables.

In Visualizations Fields section, change the Latitude dropdown value from Average of pickup_latitude to Don’t Summarize, repeat the same for Longitude field.

Now, you will see the map populated with pickup points

Above steps are summarized below

This is just one example on how you can integrate Redshift data with Power BI Desktop, feel free to try other available visualizations.

Before You Leave

If you are done using your cluster, please think about decommissioning it to avoid having to pay for unused resources.

Feedback/Issue

We would love to hear from you

Please let us know if there's a way we can make this lab better, if there is a lab you would like added, or if you want help getting started with your Redshift deployment.

Submit