Usage analytics
- The plugin "Stream Firestore to BigQuery" streams every change in the Firestore database into a BigQuery table (including the user e-mails, which are personal data; the feedback itself is encrypted).
- A Cloud Function runs a query every day that aggregates the data into another dataset called feedzback_usage, removing personal data. It uses the service account analytics-editor, which has access to all data in BigQuery.
- A Looker Studio report queries the tables from feedzback_usage using the service account analytics-viewer, which has access only to feedzback_usage and does not have access to personal data.
note
This document describes the installation instructions.
Prerequisites
Be an owner of the project.
Installation
- In Cloud Shell, find the zone of your Firestore database.
gcloud firestore databases list
- Activate the plugin "Stream Firestore to BigQuery" in your project
  - This will enable the APIs
    - BigQuery API
    - Cloud Tasks API
    - Eventarc API
  - In the Configuration step, configure the extension as follows
    - Cloud functions location : Same as your Firestore location (or the closest one if not available)
    - BigQuery dataset location : Same as your Firestore location (or the closest one if not available). Remember this as your $ANALYTICS_GCP_ZONE
    - Collection path : feedback
    - Dataset ID : firestore_export
    - Table ID : feedback
    - Import existing Firestore documents into BigQuery ? : Yes (If you forgot to check it use this)
    - Existing documents collection : feedback
    - Leave other parameters as default, do not check Enable events
- In CircleCI, go to the /Organization settings/Context page of your project and add the environment variable $ANALYTICS_GCP_ZONE.
- In Cloud Shell, set the variable that holds your analytics zone
# The zone of the existing firestore db. Due to a misconfiguration it is in Montreal for the dev environment.
export ANALYTICS_GCP_ZONE="<zone found in previous step>"
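The commands that follow expand both ${GOOGLE_CLOUD_PROJECT} and ${ANALYTICS_GCP_ZONE}, and an empty value would silently produce wrong resource names. A minimal sketch of a guard, assuming a POSIX-style shell such as the one in Cloud Shell; require_vars is a hypothetical helper name, not a gcloud or bq feature:

```shell
# Sketch: report every required variable that is unset or empty.
# require_vars is a hypothetical helper name used only for illustration.
require_vars() {
  _missing=0
  for _name in "$@"; do
    # Read the value of the variable whose name is stored in _name.
    eval "_value=\${${_name}:-}"
    if [ -z "${_value}" ]; then
      echo "Missing environment variable: ${_name}" >&2
      _missing=1
    fi
  done
  # 0 when every variable is set, 1 otherwise.
  return "${_missing}"
}

# Example call (commented out so that pasting the snippet has no side effects):
# require_vars GOOGLE_CLOUD_PROJECT ANALYTICS_GCP_ZONE || echo "Set the variables first"
```

The function only checks and reports; it leaves the decision to stop to the caller.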
- In Cloud Shell, allow CircleCI to deploy Cloud Functions. Every change in the function will then be deployed the same way as the rest of the codebase.
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} --member="serviceAccount:circleci@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" --role="roles/cloudfunctions.developer"
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} \
--member="serviceAccount:circleci@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" \
--role="roles/iam.serviceAccountUser"
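The two bindings above differ only in the role, so they can be expressed as one loop. The following is a sketch, not the project's script: grant_circleci_roles and DRY_RUN are hypothetical names, and the dry-run mode (the default here) only prints the commands so they can be reviewed before anything is changed.

```shell
# Sketch: grant both roles to the CircleCI service account in one loop.
# grant_circleci_roles and DRY_RUN are hypothetical names used for illustration.
grant_circleci_roles() {
  project="$1"
  member="serviceAccount:circleci@${project}.iam.gserviceaccount.com"
  for role in roles/cloudfunctions.developer roles/iam.serviceAccountUser; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "gcloud projects add-iam-policy-binding ${project} --member=${member} --role=${role}"
    else
      gcloud projects add-iam-policy-binding "${project}" \
        --member="${member}" --role="${role}"
    fi
  done
}

# Example (prints two commands, changes nothing):
# grant_circleci_roles "${GOOGLE_CLOUD_PROJECT}"
```

Setting DRY_RUN=0 before the call would run the real gcloud commands.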
- Wait for the firestore_export dataset to be created by the extension
- In Cloud Shell, create the service accounts and the BigQuery dataset
# Create feedzback_usage that will only store non-personal data
bq --location=$ANALYTICS_GCP_ZONE mk --dataset ${GOOGLE_CLOUD_PROJECT}:feedzback_usage
gcloud iam service-accounts create analytics-editor --display-name="Service account to read or write analytics based on the firestore export"
gcloud iam service-accounts create analytics-viewer --display-name="Service account dedicated to looker studio to allow it to read only feedzback_usage"
# Allow analytics-editor to read and write on the firestore_export. It can be done in the web console or using the following lines
bq show --format=prettyjson ${GOOGLE_CLOUD_PROJECT}:firestore_export > /tmp/firestore_export.json
jq '.access += [{"role" : "READER", "userByEmail" : "analytics-editor@'${GOOGLE_CLOUD_PROJECT}'.iam.gserviceaccount.com"},{"role" : "WRITER", "userByEmail" : "analytics-editor@'${GOOGLE_CLOUD_PROJECT}'.iam.gserviceaccount.com"} ]' /tmp/firestore_export.json > /tmp/firestore_export_updated.json
bq update --source /tmp/firestore_export_updated.json firestore_export
# Allow analytics-editor to use BQ
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} --member="serviceAccount:analytics-editor@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" --role="roles/bigquery.user"
# Allow analytics-viewer to create queries in BQ
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} --member="serviceAccount:analytics-viewer@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" --role="roles/bigquery.user"
# Modify feedzback_usage so it is owned by analytics-editor and readable by analytics-viewer
bq show --format=prettyjson ${GOOGLE_CLOUD_PROJECT}:feedzback_usage > /tmp/feedzback_usage.json
jq '.access += [{"role" : "READER", "userByEmail" : "analytics-viewer@'${GOOGLE_CLOUD_PROJECT}'.iam.gserviceaccount.com"},{"role" : "OWNER", "userByEmail" : "analytics-editor@'${GOOGLE_CLOUD_PROJECT}'.iam.gserviceaccount.com"}]' /tmp/feedzback_usage.json > /tmp/feedzback_usage_updated.json
bq update --source /tmp/feedzback_usage_updated.json feedzback_usage
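The jq filters above splice ${GOOGLE_CLOUD_PROJECT} into the filter by closing and reopening the single quotes, which is easy to get wrong when editing. As a sketch of an alternative, the helper below builds one entry of the dataset's access list as a standalone JSON string. access_entry is a hypothetical name; the helper only prints text and calls neither bq nor jq.

```shell
# Sketch: build one entry of a dataset "access" list, in the same shape
# as the objects appended by the jq filters above.
# access_entry is a hypothetical helper name.
access_entry() {
  role="$1"      # READER, WRITER or OWNER
  account="$2"   # service account name, e.g. analytics-viewer
  project="$3"   # project ID, e.g. the value of GOOGLE_CLOUD_PROJECT
  printf '{"role": "%s", "userByEmail": "%s@%s.iam.gserviceaccount.com"}\n' \
    "${role}" "${account}" "${project}"
}

# Possible use with jq's --argjson option, which passes a JSON value to the filter:
# jq --argjson entry "$(access_entry READER analytics-viewer "${GOOGLE_CLOUD_PROJECT}")" \
#   '.access += [$entry]' /tmp/feedzback_usage.json > /tmp/feedzback_usage_updated.json
```

Passing the entry as a jq variable keeps the filter itself in plain single quotes.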
- On your computer, create the tag for your revision and push it. The CI should then deploy the Cloud Function
git tag <your tag e.g. dev-1.2.3>
git push --tags
- In Cloud Shell, configure Cloud Scheduler for a daily export. If it does not work, make sure CircleCI has deployed the Cloud Function
# If asked to enable cloudscheduler API, say yes
gcloud scheduler jobs create http daily_usage_export \
--location=${ANALYTICS_GCP_ZONE} \
--schedule='0 0 * * *' \
--uri "https://${ANALYTICS_GCP_ZONE}-${GOOGLE_CLOUD_PROJECT}.cloudfunctions.net/create-analytics" \
--http-method=POST \
--oidc-service-account-email="analytics-editor@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" \
--oidc-token-audience="https://${ANALYTICS_GCP_ZONE}-${GOOGLE_CLOUD_PROJECT}.cloudfunctions.net/create-analytics"
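The scheduler command writes the function URL twice, once for --uri and once for --oidc-token-audience, and the two values are identical in the command above. A small sketch that computes the URL once, following the same URL pattern; function_url is a hypothetical helper name:

```shell
# Sketch: build the HTTPS URL of the function from its region, project and name,
# using the same pattern as the scheduler command above.
# function_url is a hypothetical helper name.
function_url() {
  region="$1"    # e.g. the value of ANALYTICS_GCP_ZONE
  project="$2"   # e.g. the value of GOOGLE_CLOUD_PROJECT
  name="$3"      # e.g. create-analytics
  printf 'https://%s-%s.cloudfunctions.net/%s\n' "${region}" "${project}" "${name}"
}

# Possible use: store the URL once and pass it to both flags.
# FUNCTION_URL="$(function_url "${ANALYTICS_GCP_ZONE}" "${GOOGLE_CLOUD_PROJECT}" create-analytics)"
# ... --uri "${FUNCTION_URL}" --oidc-token-audience="${FUNCTION_URL}"
```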
- Wait for the CI to have deployed the cloud function
- In Cloud Shell, give the analytics-editor service account the right to invoke the Cloud Function. Then run it once to initialize the database
gcloud functions add-invoker-policy-binding create-analytics --member="serviceAccount:analytics-editor@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com" --region="${ANALYTICS_GCP_ZONE}"
gcloud functions call create-analytics --gen2 --region=${ANALYTICS_GCP_ZONE}
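Two steps in this procedure ask you to wait: once for the firestore_export dataset and once for the CI deployment of the function. Instead of checking by hand, a command can be polled until it succeeds. This is a minimal sketch: wait_for is a hypothetical helper name, the attempt count and delay are illustrative, and it assumes the polled command exits with a non-zero status while the resource does not exist yet.

```shell
# Sketch: re-run a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper name.
wait_for() {
  attempts="$1"   # maximum number of tries
  delay="$2"      # seconds to sleep between tries
  shift 2
  try=1
  while [ "${try}" -le "${attempts}" ]; do
    # Run the remaining arguments as a command, discarding its output.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    try=$((try + 1))
    # Do not sleep after the last failed attempt.
    if [ "${try}" -le "${attempts}" ]; then
      sleep "${delay}"
    fi
  done
  return 1
}

# Possible calls, assuming the names used in this document:
# wait_for 30 10 bq show "${GOOGLE_CLOUD_PROJECT}:firestore_export"
# wait_for 30 10 gcloud functions describe create-analytics --gen2 --region="${ANALYTICS_GCP_ZONE}"
```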
- In Cloud Shell, grant Looker Studio the right to use service accounts to retrieve data
# NB : the value of member can be found here : https://lookerstudio.google.com/serviceAgentHelp
gcloud projects add-iam-policy-binding ${GOOGLE_CLOUD_PROJECT} --member="serviceAccount:service-org-506755999458@gcp-sa-datastudio.iam.gserviceaccount.com" --role="roles/iam.serviceAccountTokenCreator"
- Modify the Looker Studio report to include your analysis. Make sure each data source uses the service account analytics-viewer@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com. By default it uses your Google account, which has owner rights.