This lab demonstrates how you can connect Microsoft Power BI tools (like Power BI Desktop) with Amazon Redshift and create visualizations on your data.
If you do not have a Remote Desktop client installed, you can use the instructions at this link to do so.
Below is the overview of the architecture and the steps involved in this lab.
This lab assumes you have launched a Redshift cluster, configured your client tool. If you have not configured your client tool, see Lab 1 - Creating Redshift Clusters : Configure Client Tool. For this lab, you will need to gather the following information about your cluster from LAB 1 - Creating Redshift Clusters.
Gather the following key pieces of information which are unique to your environment by navigating to the deployed cloudFormation stack.
Amazon Redshiftservice and then to
Clusters. You should be able to see the target Redshift cluster for this migration. Click the
propertiestab and then copy below attributes of this cluster.
Virtual private cloud (VPC)
VPC security group
Let’s create a Windows EC2 instance to install Power BI Desktop software for this lab, if you are running this workshop in an AWS-sponsored or AWS-staffed event, it is likely that you are given a temporary AWS account environment, in which the required lab resources have been pre-provisioned for you. In that case, please directly proceed to the next steps. Otherwise, use the following Cloud Formation template link.
Optional Open AWS console and then navigate to CloudFormation service. You will then be able to view the deployed stack.
Please deploy the stack in same AWS region where the Redshift cluster is running, provide
Redshift-VPC-SecurityGroup values collected in above step as input to the stack.
The template will use the default CIDR block of 0.0.0.0/0 which provides access from any IP Address. It is a best practice to replace this a range of IP addresses which should have access or allow only your own IP Address x.x.x.x/32.
To avoid paying for unused resources, please delete this cloudFormation stack after finishing this lab.
If you already completed Modernize w/ Spectrum lab, skip this section and proceed to next section Login to Power BI Desktop.
CREATE SCHEMA workshop_das; CREATE TABLE workshop_das.green_201601_csv ( vendorid VARCHAR(4), pickup_datetime TIMESTAMP, dropoff_datetime TIMESTAMP, store_and_fwd_flag VARCHAR(1), ratecode INT, pickup_longitude FLOAT4, pickup_latitude FLOAT4, dropoff_longitude FLOAT4, dropoff_latitude FLOAT4, passenger_count INT, trip_distance FLOAT4, fare_amount FLOAT4, extra FLOAT4, mta_tax FLOAT4, tip_amount FLOAT4, tolls_amount FLOAT4, ehail_fee FLOAT4, improvement_surcharge FLOAT4, total_amount FLOAT4, payment_type VARCHAR(4), trip_type VARCHAR(4) ) DISTSTYLE EVEN SORTKEY (passenger_count,pickup_datetime);
Make sure to replace
[Your-Redshift_Role_Arn] value in the script below.
COPY workshop_das.green_201601_csv FROM 's3://us-west-2.serverless-analytics/NYC-Pub/green/green_tripdata_2016-01.csv' IAM_ROLE '[Your-Redshift_Role_Arn]' DATEFORMAT 'auto' IGNOREHEADER 1 DELIMITER ',' IGNOREBLANKLINES REGION 'us-west-2' ;
select count(1) from workshop_das.green_201601_csv; --1445285
In this section we will login to the windows EC2 instance to launch Power BI Desktop software. Launch Windows Remote Desktop client (RDP) tool and enter below credentials to login to the instance.
[EC2-Client-Endpoint] - Navigate to
EC2 service and you could see an EC2 instance in running state. This is a Windows instance, pre-loaded with the Power BI Desktop software. Please copy the public DNS or IP address to login to that instance.
Remote Desktop Host: [EC2-Client-Endpoint] Username: developer Password: awesomeRs-2021
Find Microsoft Power BI Desktop icon on the desktop and double click to launch the software. Alternatively, you can search for “Power BI Desktop” in the Search box to launch it.
Once launched, select “Get data” link to start the integration process.
Get Data popup, search for
Redshift and select
Amazon Redshift in the results and click on
In the Redshift connection details screen, enter
Redshift cluster name,
port number and
database name as shown below and click on
Server: `[Redshift-Endpoint]` Database: dev
Enter the Redshift cluster credentials in below screen and click on
Connect button to proceed.
User name: awsuser Password: `[Supplied by your lab instructor]`
Once the connection is established successfully, Navigator page will list all available Redshift schemas to connect. Select
green_201601_csv table under
workshop_das schema and click on
Load button to create visualizations on this data.
Power BI Desktop will take few minutes to connect to the Redshift cluster and cache the green company taxi data locally.
Now it’s time to get creative and start creating visualizations on the green taxi data. We will create a map visualization with pickup latitude and longitude. Select
Map from the Visualizations palette.
Select pickup_latitude, pickup_longitude fields from the green_201601_csv tables.
In Visualizations Fields section, change the Latitude dropdown value from
Average of pickup_latitude to
Don’t Summarize, repeat the same for Longitude field.
Now, you will see the map populated with pickup points
Above steps are summarized below
This is just one example on how you can integrate Redshift data with Power BI Desktop, feel free to try other available visualizations.
If you are done using your cluster, please think about decommissioning it to avoid having to pay for unused resources.