
Cloud Resume Challenge Part 3: Exploring Cosmos DB and Database Security


I invite you to check out my CV at resume.allmark.me!
I appreciate every opinion, good or bad 😄

Progress so far:

  1. Certification ✔️
  2. HTML ✔️
  3. CSS ✔️
  4. Static website ✔️
  5. HTTPS Protocol ✔️
  6. DNS server ✔️
  7. JavaScript 🚧
  8. Database 🚧
  9. API 🚧
  10. Python 🚧
  11. Python Testing ❌
  12. Infrastructure as Code 🚧
  13. Source Control ❌
  14. CI/CD Backend ❌
  15. Frontend CI/CD ❌
  16. Blog post 🚧

In short
I want to track visitors to my site. I started with a simple visitor counter using localStorage, but that didn't meet the requirements, so I chose Cosmos DB as my database, wrote a Bicep template to deploy it, and created a Function App to interact with it.

However, I ran into difficulties securing the Function App and its API keys. After researching various solutions, I decided to migrate my site from a storage account to a proper static web app. Azure Static Web Apps has a preview feature that connects directly to an Azure database with built-in security, which simplifies the project considerably. I also set up a pipeline between GitHub and my static web app, so that a commit to my repository triggers a GitHub Actions workflow that deploys the code to my site.

I spent most of my time reading documentation and troubleshooting, and I plan to detail the process in a future blog post. I also plan to find another way to incorporate Python into the project.

Since the complexity of this project is growing rapidly, I decided to draw a map before continuing. While mapping, I also decided that I want to receive an email every time someone visits my website, telling me their IP address and how many times they have visited.
[Image: project map]
To get the ball rolling, I started with a hit counter that was as simple as possible, with the intention of adding complexity later. My first draft looked like this:
[Image: first draft of the visitor counter]
It stores the count in a localStorage object. This means the counter is browser-specific and is wiped if the user clears their browser data. While that doesn't meet the criteria, visualising it gave me an idea of what to configure next (a sketch of this draft follows the list):

  1. A database for storing the counter results.
  2. An API enabling secure interaction with the database.
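
For reference, here is a minimal sketch of what that first localStorage draft boiled down to. The element id and storage key are illustrative, not my exact code:

```javascript
// Minimal localStorage hit counter: browser-specific, and gone the
// moment the user clears their site data.
const visits = Number(localStorage.getItem('visitCount') || 0) + 1;
localStorage.setItem('visitCount', String(visits));
document.getElementById('counter').innerText = `Visits: ${visits}`;
```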

I chose Cosmos DB as my database because I had never worked with it before and wanted to better understand how it works.

For this project, I plan to deploy as many resources as possible using Bicep. My favorite way to do this is to first spin something up through the Azure Portal and then pick apart the resulting ARM template. The ARM template for Cosmos DB is refreshingly simple.

Using the ARM template as a reference and combining it with VS Code’s IntelliSense makes writing Bicep feel intuitive and satisfying. I really enjoy it.
My first attempt resulted in the following error. I caused it by placing the location property in the wrong place…
[Image: deployment error]
This is what my Bicep looked like in the end:

```bicep
param dbname string
param location string = resourceGroup().location
param primaryRegion string

resource cosmodb 'Microsoft.DocumentDB/databaseAccounts@2024-02-15-preview' = {
  name: dbname
  location: location
  properties: {
    databaseAccountOfferType: 'Standard'
    locations: [
      {
        failoverPriority: 0
        locationName: primaryRegion
        isZoneRedundant: false
      }
    ]
    backupPolicy: {
      type: 'Continuous'
      continuousModeProperties: {
        tier: 'Continuous7Days'
      }
    }
    isVirtualNetworkFilterEnabled: false
    minimalTlsVersion: 'Tls12'
    enableMultipleWriteLocations: false
    enableFreeTier: true
    capacity: {
      totalThroughputLimit: 1000
    }
  }
}
```


The most important part of this code is `totalThroughputLimit`, because it lets me stay on the free tier.

In my new database, I created a container called “VisitorCounter” to hold my visitor records. This is where I learned what a Partition Key is: it determines how Cosmos DB groups items into logical partitions, and together with the item id it identifies an entry. Since I wanted the visit count tracked per IP address, I made the IP address my Partition Key.
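
As an illustration, an item in that container might look something like this. The shape and field names here are hypothetical, not my actual schema:

```javascript
// Hypothetical shape of a VisitorCounter item, partitioned by IP
// address (partition key path: /ip). The id plus the partition key
// value together identify the item.
const visitorItem = {
  id: '203.0.113.7',
  ip: '203.0.113.7',                // partition key value
  count: 3,                         // visits recorded from this IP
  lastVisit: '2024-05-01T12:00:00Z'
};
```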

Since the plan was to log public IP addresses, I did some research on the legal and ethical issues involved and came to the following conclusions:

  1. One guy on StackOverflow said it’s fine as long as I don’t do anything malicious with them.
  2. I will ignore Copilot’s warnings regarding privacy and GDPR.
  3. I can collect them using Azure functions.

Writing the Function App Bicep was not refreshingly simple; it was much more difficult than the Cosmos DB Bicep. When you deploy resources through the Portal, you never see all the supporting child resources that are created alongside the parent resource. All of these supporting resources need to be configured individually and then wired up to the parent resource, and each has its own unique properties and requirements, so a simple Portal deployment can turn into a complex Bicep deployment.

As I mentioned earlier, I usually create a resource in the Portal and then pick through the ARM template, but in this case the resulting template was a bit of a maze, so I decided to build it from scratch instead of reverse-engineering it from the top down.
[Image: initial sketch] This was my first sketch, and it actually worked... except it was deploying an App Service Plan, not a Function App. Fast forward a few hours of trial and error with Microsoft Learn, and I hadn't got any further. I finally gave up and googled it with the intention of avoiding Microsoft Learn.
This led me to this blog and about an hour later we had success!

This left me mildly frustrated, mostly because I couldn't work out how this person had figured it all out. I tried to reconcile what they were doing with Microsoft's documentation and was left scratching my head. While Copilot did a lot of the hard work of helping me understand their code, this would be the point where I would turn to my colleagues for validation or Microsoft support for reassurance.

Something I learned about is “Action” operations in Bicep. In the example below, I use listKeys to get the keys of a storage account I had just deployed.

```bicep
param location string = resourceGroup().location
param name string = 'beeresumequery'

resource storageaccount 'Microsoft.Storage/storageAccounts@2023-04-01' = {
  name: '${name}storage'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
}
var StorageAccountPrimaryAccessKey = listKeys(storageaccount.id, storageaccount.apiVersion).keys[0].value

resource appinsights 'Microsoft.Insights/components@2020-02-02' ={
  name: '${name}appinsights'
  location: location
  kind: 'web'
  properties:{
    Application_Type: 'web'
    publicNetworkAccessForIngestion:'Enabled'
    publicNetworkAccessForQuery:'Enabled'
  }
}
var AppInsightsPrimaryAccessKey = appinsights.properties.InstrumentationKey

resource hostingplan 'Microsoft.Web/serverfarms@2023-12-01' = {
  name: '${name}hp'
  location: location
  kind: 'linux'
  properties: {
    reserved:true
  }
  sku:{
    name: 'Y1' //Consumption plan
  }
}

resource ResumeFunctionApp 'Microsoft.Web/sites@2023-12-01' = {
  name: '${name}functionapp'
  location: location
  kind: 'functionapp'
  identity:{
    type:'SystemAssigned'
  }
  properties:{
    httpsOnly:true
    serverFarmId:hostingplan.id
    siteConfig:{
//      use32BitWorkerProcess:true //this allows me to use the FREEEEE tier
      alwaysOn:false
      linuxFxVersion: 'python|3.11'
      cors:{
      allowedOrigins: [
          'https://portal.azure.com'
        ]
      }
      appSettings: [
        {
          name: 'APPINSIGHTS_INSTRUMENTATIONKEY'
          value: AppInsightsPrimaryAccessKey
        }
        {
          name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
          value: 'InstrumentationKey=${AppInsightsPrimaryAccessKey}'
        }
        {
          name: 'AzureWebJobsStorage'
          value: 'DefaultEndpointsProtocol=https;AccountName=${storageaccount.name};EndpointSuffix=${environment().suffixes.storage};AccountKey=${StorageAccountPrimaryAccessKey}'
        }
        {
          name: 'FUNCTIONS_EXTENSION_VERSION'
          value: '~4'
        }
        {
          name: 'FUNCTIONS_WORKER_RUNTIME'
          value: 'python'
        }
        {
          name: 'WEBSITE_CONTENTSHARE'
          value: toLower(storageaccount.name)
        }
        {
          name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING'
          value: 'DefaultEndpointsProtocol=https;AccountName=${storageaccount.name};EndpointSuffix=${environment().suffixes.storage};AccountKey=${StorageAccountPrimaryAccessKey}'
        }
      ]
    }
  }
}
```

Once I had the function app deployed, I still needed to figure out how to create an interaction between the app and my website.

I created an HTTP trigger using the built-in templates and, given my limited experience with Python, figured it would be best to spend some time understanding what was going on before moving further. Time spent now will pay off later when I need to fix things or troubleshoot issues.

```python
import azure.functions as azfunc
#This imports the Azure Functions SDK. I've always visualised SDKs as a sort of Ikea flatpack box, except for programmers. I didn't think using an SDK would be this simple though.
#The original template imports this as 'func' but I've changed it to 'azfunc' just to make it clearer that it's the SDK and not a Python shorthand.
import logging
#Straightforward.
app = azfunc.FunctionApp(http_auth_level=azfunc.AuthLevel.FUNCTION)
#This creates an instance of the 'FunctionApp' class within the code. FunctionApp is basically a blueprint from the SDK for creating a "function app object". 
#The section in brackets () defines what level of authentication is needed. ANONYMOUS is no auth, FUNCTION requires the function key and ADMIN requires the master key. 
#What is a class? A class is the blueprint, it defines how an object is created. Providing structure and methods for performing a specific task.  
#What is an object? It is something that is built based on a blueprint. The objects below are HttpRequest and HttpResponse.
#By creating this instance, I don't need to define what those two objects actually are. Which is good because I wouldn't know how. 
@app.route(route="http_trigger1")
#This uses app.route as a decorator to define a route for the function app. So if a HTTP request is made to my function app followed by the trigger /http_trigger1, the below function will activate.
#What is a route? A route is a pathway that can be taken within an application. The route is functionappurl.com/http_trigger1
#What is a decorator? Decorators are sort of layered functions: do this, but also do that with it. E.g. you can have 'Hello World!' and create a decorator for it that converts all letters to uppercase to produce 'HELLO WORLD!'.
def http_trigger1(req: azfunc.HttpRequest) -> azfunc.HttpResponse: 
#this defines the http_trigger1 function. It notes that it requires the HttpRequest object to function. 
#"-> azfunc.HttpResponse:" is something that is referred to as 'type hinting'. It advises that the expected response here is a HttpResponse
#What is Type Hinting? Type Hinting is something you add to your code to improve readability and to know what the intention of the code is. 
#The difference between commenting and Type Hinting is that Type Hinting can be used by some tools for error checking and debugging. They're kind of like comments but for your tools. 
#Imagine an interesting future where the Natural Language from comments could be used for Type Hinting.
#I expressed the above idea to Bing and then it showed me an example of a Natural Language comment being interpreted as a type hint. 
#Bing is just showing off now. 
    logging.info('Python HTTP trigger function processed a request.')
#Straightforward: performs a logging action. I assume the .info refers to the fact that this is just information, not an error message or anything.
    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')
#This script is trying to get the ‘name’ value from the request parameters. If ‘name’ is not provided in the parameters, it then tries to get ‘name’ from the JSON body of the request.
#If ‘name’ is not in the JSON body or if the body is not valid JSON, name will be None.
#If ‘name’ is found in either the parameters or the JSON body, it will be assigned to the name variable. If ‘name’ is not found in either place, name will be None.
#So basically, when an HTTP request is made to the function app URL, it needs to include a parameter that defines a name, e.g. "name=Brandon". If there is no name parameter, it checks whether there is one in the JSON body. If none is found, nothing happens.
    if name:
        return azfunc.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return azfunc.HttpResponse(
             "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
             status_code=200
        )
#The above is straight forward. The previous block was looking for a name because it wants to pass that name into this block. So it takes that parameter and places it into the {name} field.
#If there is no name then it tells you to include a name in the query string. 


#Running this code:
#HTTP Method: GET / POST (if using a JSON body)
#Key = the URL of my function app
#Query parameters: 'name=brandon' or no name
#Headers: none / Content-Type: application/json (if using a JSON body)
```

Copilot then created some simple JavaScript code to test it.

```html
<!DOCTYPE html>
<html>
<head>
    <title>Fetch Example</title>
</head>
<body>
    <button id="fetchButton">Fetch Data</button>
    <div id="data">Press the button to fetch data...</div>
    <script>
        document.getElementById('fetchButton').addEventListener('click', fetchData);
        async function fetchData() {
            document.getElementById('data').innerText = "Loading...";
            try {
                const response = await fetch('https://functionapp.azurewebsites.net/api/http_trigger1?code=1234');
                if (!response.ok) {
                    throw new Error(`HTTP error! status: ${response.status}`);
                }
                const data = await response.text();
                document.getElementById('data').innerText = data;
            } catch (error) {
                console.error('Error:', error);
                document.getElementById('data').innerText = error.message;
            }
        }
    </script>
</body>
</html>
```

This test code simply pings my function app and displays the message it gets back. In this case, I got an error that turned out to be related to CORS. CORS, or Cross-Origin Resource Sharing, controls which domains can query your function app. It also allows you to use a wildcard to allow all origins.
A successful query looked like this
[Image: successful query response]
Unfortunately, the keys to my function app were stored in the HTML itself, and keeping them there was unthinkable. Although CORS ensures that only my domain can make requests to my API, my intuition told me it would be trivial to work around.
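
That intuition is easy to confirm: CORS is enforced by the browser, not the server, so any non-browser client that knows the key can call the endpoint directly. A quick sketch, reusing the placeholder URL and key from the test snippet above:

```javascript
// Run from Node.js (18+) or any non-browser client: CORS never applies
// here, because it is a browser-side protection, not server-side auth.
const res = await fetch('https://functionapp.azurewebsites.net/api/http_trigger1?code=1234');
console.log(await res.text()); // succeeds regardless of origin
```
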
After a bit of reading, I concluded I needed a few things to do this properly:

  1. CORS – ✔️
  2. HTTPS – ✔️
  3. Logging and monitoring – ✔️
  4. Network isolation – requires investigation.
  5. Azure API Management – requires investigation.
  6. Store keys in Key Vault – ❌

Azure API Management seemed interesting to me, even though it doesn't support network isolation, so I started there.

The Cloud Resume Challenge briefing notes that many people struggle to get beyond the initial stages of deploying a site. I began to understand why when writing Bicep for the APIM resource. There are so many new concepts to learn, and deploying resources with Bicep adds another layer of difficulty.

Writing the Bicep took a lot longer than I expected, but I learned a lot along the way: cool Bicep tricks like parameter objects, parameter arrays, create-if-not-found, modules, and outputs, to name a few.

After deploying APIM and spending some time clicking around the resource, I thought it was worth considering what my secure workflow would actually look like:

  1. My website queries my Key Vault for the keys to my API
  2. This works because my website has a managed identity with read access to those keys
  3. My website then sends a request to my API Management service
  4. My API Management service sends the request to my function app
  5. My function app queries my Cosmos database
  6. The result then flows back to my website

This makes no sense. My goal is simply to have my static Azure Storage website interact securely with my database. Referencing Key Vault directly from the front end is apparently bad practice, as is including function and API keys in code. Copilot suggested I stand up a Function App just to talk to my vault, so that I could talk to my APIM resource, which talks to my actual function app.

The next few hours spent figuring out the best way to approach this problem can be summarized as follows:

  1. “You can safely access secrets by doing this…”
  2. “It’s not actually safe because people can still do it…”
  3. “Just use Azure Static Web App”

I knew that securing this in some form was possible, because others who completed this challenge managed it. But I made it a rule not to copy others, and to find my own solutions instead. Eventually I came to the conclusion that spending hours trying to make something work, when it isn't the best way to do it, is madness.

I decided to change direction and begin migrating my site from a storage account to a true static web app. Setting up the resource was a breeze. I did it through the Portal because I just wanted to get going, but I made a note to write the Bicep for it once I wasn't so annoyed anymore.

I also noticed that Azure Static Web Apps has a preview feature that lets you connect directly to an Azure database with built-in security, using the same role-based security that secures its API endpoints. This means I won't need a function app or API Management at all, which greatly reduces the complexity of my project. I'll find another reason to use Python in this project, as I still want to learn more about the language.
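
From what I can tell from the preview docs, the front end talks to a built-in /data-api endpoint instead of my own function. The sketch below is my untested guess at what that looks like; the 'VisitorCounter' entity name is hypothetical and would come from the database connection's configuration file:

```javascript
// Hedged sketch of querying the Static Web Apps database connections
// preview from the front end. 'VisitorCounter' is a hypothetical entity
// name; the /data-api/rest prefix is the documented default.
async function getVisitorCount(ip) {
  const response = await fetch(`/data-api/rest/VisitorCounter/id/${encodeURIComponent(ip)}`);
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  const { value } = await response.json(); // REST responses wrap items in 'value'
  return value;
}
```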

When building my Static Web App, I took the opportunity to set up a pipeline between my GitHub repository and the Static Web App. Now a commit to my repository triggers a GitHub Actions workflow that deploys the code to my website. Very satisfying!

I will describe how I did it in my next blog post.

Reflecting on the above, it was an absolute slog. It’s hard to express here, but I spent about 80% of my time reading documentation and troubleshooting.