Serving Data using FastAPI with Google BigQuery
Implementing an API using Python's FastAPI library to Serve Data from Google BigQuery
One of the key stages in a data pipeline is the final stage, which can be termed "Serving Data". This can take multiple forms, such as feeding data analysis or machine learning models, or simply exposing the data through an API. The latter is what we will focus on in this tutorial.
We will look into how we can build an API using FastAPI, a modern, fast (high-performance) web framework for building APIs with Python. The dataset we will use is the one we scraped and stored in BigQuery in the previous articles. We'll create endpoints to access stock data stored in a Google BigQuery dataset.
You can download the code for this article here
Prerequisites and Data Structure
Before getting started, ensure you have the following:
Python installed on your system (preferably version 3.7 or higher)
Google Cloud Platform (GCP) project with BigQuery enabled and access to the dataset
Basic understanding of Python and REST API concepts
We can review the dataset structure by looking at the picture from our previous article:
Install the required libraries/credentials
Make sure you have the libraries needed from the previous articles. The main ones we will use in this article can be installed with the following command:
pip install fastapi uvicorn google-cloud-bigquery
Also, make sure you follow this article for setting up the Google Cloud credentials and the required dataset.
Defining our API
We can now start defining our endpoints using FastAPI, querying the data with SQL.
Importing the required libraries
We start by importing the required libraries and initializing our FastAPI app and BigQuery client:
from fastapi import FastAPI, HTTPException, Query
from google.cloud import bigquery
from dotenv import load_dotenv
import os

# Load environment variables from the .env file
load_dotenv()

app = FastAPI()

# Initialize BigQuery client
client = bigquery.Client()

PROJECT_ID = os.environ.get("PROJECT_ID")
DATASET_ID = os.environ.get("DATASET_ID")
TABLE_ID = os.environ.get("TABLE_ID")

# Fully qualified table reference: project.dataset.table
dataset_str = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"
Make sure that your .env file contains these IDs in addition to the GOOGLE_APPLICATION_CREDENTIALS variable holding the path to the JSON key file. Note that we define all the IDs here because it is a personal dataset (only on your account). We could make it public, but right now we want to learn the basics of accessing data in BigQuery and building an API.
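For reference, a minimal .env might look like the following (the values below are placeholders; substitute your own project, dataset, and table names):

# .env -- placeholder values, replace with your own
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
PROJECT_ID=my-gcp-project
DATASET_ID=stock_dataset
TABLE_ID=stock_data

With the configuration in place, let's now define the endpoints one by one: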
Getting stock data by name
We can start by getting the data for any given stock by its stock name. This requires selecting the rows where stock_name equals the given stock, then converting the results to a JSON-friendly format:
# Define endpoint to get stock data by stock name
@app.get("/stock/{stock_name}")
def get_stock_data(stock_name: str):
    query = f"""
        SELECT *
        FROM {dataset_str}
        WHERE stock_name = '{stock_name}'
    """
    try:
        # Execute the query
        query_job = client.query(query)
        results = query_job.result()
        # Convert results to list of dictionaries
        data = [dict(row) for row in results]
        return data
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")
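One caveat worth knowing: interpolating stock_name directly into the SQL string leaves the endpoint open to SQL injection. The BigQuery client supports parameterized queries via QueryJobConfig, so a safer variant might look like the sketch below (the /stock_safe/ route name is just for illustration; table identifiers cannot be parameterized, so dataset_str still comes in via the f-string):

# Safer variant using a named query parameter instead of string interpolation
@app.get("/stock_safe/{stock_name}")
def get_stock_data_safe(stock_name: str):
    query = f"""
        SELECT *
        FROM {dataset_str}
        WHERE stock_name = @stock_name
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("stock_name", "STRING", stock_name)
        ]
    )
    try:
        results = client.query(query, job_config=job_config).result()
        return [dict(row) for row in results]
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")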
Before we can query the endpoint, we need to start the FastAPI server using the following command (make sure your file is named main.py, or adjust the command if it is different):
uvicorn main:app --reload
Now the FastAPI server should be running locally, and you can access the endpoints at http://localhost:8000. Go to your browser and open the URL http://localhost:8000/stock/DAX:
Note that if you see an error, it could be due to a trailing space in the stock_name column in your dataset. You could either remove it from the database or include the trailing space in the URL.
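Alternatively, you can make the query itself tolerant of stray whitespace by trimming on both sides; a small sketch of that tweak inside the endpoint:

# Trim whitespace on both sides so ' DAX ' and 'DAX' match
query = f"""
    SELECT *
    FROM {dataset_str}
    WHERE TRIM(stock_name) = '{stock_name.strip()}'
"""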
We can also test the endpoint by running the following:
curl http://localhost:8000/stock/DAX
The output looks like the following:
Congratulations! You have successfully built your first endpoint to serve the data. We can now build more complex ones too:
Get stock data within a date range
To achieve this, we need to run an SQL query that filters the data, keeping rows where Date lies between the start and end dates given as query parameters.
# Endpoint to get stock data within a date range
@app.get("/stock/date_range/")
async def get_stock_data_by_date_range(start_date: str, end_date: str):
    query = f"""
        SELECT *
        FROM {dataset_str}
        WHERE Date BETWEEN '{start_date}' AND '{end_date}'
    """
    try:
        # Execute the query
        query_job = client.query(query)
        results = query_job.result()
        # Convert results to list of dictionaries
        data = [dict(row) for row in results]
        return data
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")
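As an optional refinement (not part of the code above), FastAPI can validate the date format for us if we type the parameters as datetime.date; malformed inputs then get a 422 response before any query reaches BigQuery. A sketch, with an illustrative route name:

from datetime import date

# Variant with automatic date validation via Pydantic
@app.get("/stock/date_range_validated/")
async def get_stock_data_validated(start_date: date, end_date: date):
    # Safe to interpolate: these are guaranteed date objects, not raw strings
    query = f"""
        SELECT *
        FROM {dataset_str}
        WHERE Date BETWEEN '{start_date.isoformat()}' AND '{end_date.isoformat()}'
    """
    results = client.query(query).result()
    return [dict(row) for row in results]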
Let's now query the date_range endpoint:
curl "http://localhost:8000/stock/date_range/?start_date=2024-02-01&end_date=2024-03-01"
whose output looks like this:
Get stocks within a price range
We can also search for stocks whose prices lie within a given range, using FastAPI's Query (already imported above) for the optional parameters:
# Endpoint to get stock data by price range
@app.get("/stock/price_range/")
async def get_stock_data_by_price_range(min_price: float = Query(None), max_price: float = Query(None)):
    # Build the WHERE condition from whichever bounds were provided;
    # default to TRUE so the query stays valid if neither is given
    condition = "TRUE"
    if min_price is not None and max_price is not None:
        condition = f"Price BETWEEN {min_price} AND {max_price}"
    elif min_price is not None:
        condition = f"Price >= {min_price}"
    elif max_price is not None:
        condition = f"Price <= {max_price}"
    query = f"""
        SELECT *
        FROM {dataset_str}
        WHERE {condition}
    """
    try:
        # Execute the query
        query_job = client.query(query)
        results = query_job.result()
        # Convert results to list of dictionaries
        data = [dict(row) for row in results]
        return data
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred: {str(e)}")
Note that here we give the user the option to specify the minimum price, the maximum price, or both.
curl "http://localhost:8000/stock/price_range/?min_price=5000&max_price=10000"
which returns something like the following:
Or we can query it with only max_price:
curl "http://localhost:8000/stock/price_range/?max_price=100"
And we can see only “S&P 500 VIX” has a price that I can afford haha
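One small optional refinement before moving on: Query can also carry validation metadata, so FastAPI rejects nonsensical values with a 422 response before they ever reach BigQuery. A sketch (reusing the app and endpoint body from above):

# Same endpoint signature, but ge=0 rejects negative prices automatically
@app.get("/stock/price_range/")
async def get_stock_data_by_price_range(
    min_price: float = Query(None, ge=0),
    max_price: float = Query(None, ge=0),
):
    ...  # body unchanged from the version above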
API Docs
FastAPI automatically generates API docs for you from your code (using Swagger UI), and you can access them by going to http://localhost:8000/docs:
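If you want the generated docs to carry your own title and version, FastAPI accepts metadata when the app is constructed; for instance (the values here are illustrative):

# Optional: metadata shown on the /docs page
app = FastAPI(
    title="Stock Data API",
    description="Serves stock data stored in Google BigQuery",
    version="0.1.0",
)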
Conclusion
In this article, we built an API using FastAPI to serve data from Google BigQuery. We covered installing the necessary libraries, setting up credentials, and defining API endpoints to access stock data by name, within a date range, and within a price range. The tutorial assumes a basic understanding of Python and REST API concepts, and requires Python 3.7 or higher and a Google Cloud Platform project with BigQuery enabled.