Data Engineering with Python: A Practical Guide to Fetching and Processing Data via REST API
Leveraging the Power of Python and REST APIs to Retrieve and Analyze Stock Market Data
One of the most important parts of a data engineering project is getting the data. This can be done in a number of ways: accessing databases provided by different companies, getting the data from a REST API, or even scraping webpages. In this article, we will use Python to access data from a REST API, specifically stock data.
The stock market, with tech giants like the FAANG companies, provides a rich source of data. This project will guide you through engineering a data pipeline that fetches that data from the REST API provided by Alpha Vantage.
You can find the code for this article here.
Setting Up Your Environment
Before we start, you'll need to install some tools. Firstly, ensure you have Python installed on your machine. If not, you can download it from the official Python website. Secondly, install the requests library using pip, the Python package installer. Open your terminal and type:
pip install requests
This library allows us to make HTTP requests, in our case to the Alpha Vantage API.
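If you want a quick sanity check that everything is wired up, the snippet below makes a single request and prints the HTTP status code. It uses the demo key shown in the Alpha Vantage documentation, which only works for a handful of documented example queries (IBM here); in the next step we'll get a proper key of our own.
import requests

# Example query from the Alpha Vantage documentation; the special "demo" key
# only works for their documented examples, so use your own key for real data.
url = "https://www.alphavantage.co/query?function=TIME_SERIES_MONTHLY_ADJUSTED&symbol=IBM&apikey=demo"
response = requests.get(url)
print(response.status_code)  # 200 means the request went through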
Getting Your API Key
Alpha Vantage provides free APIs for historical and real-time data on stocks, forex, and cryptocurrencies. To use the Alpha Vantage API, you will need an API Key. Visit Alpha Vantage to get your free API Key. You can also use temporary email services to register for a key.
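A small habit worth adopting: instead of pasting the key directly into your script, keep it in an environment variable and read it at runtime. Here is a minimal sketch (ALPHAVANTAGE_API_KEY is just a name I'm assuming; use whatever variable name you prefer):
import os

# Read the key from an environment variable so it never lands in source control.
API_KEY = os.environ.get("ALPHAVANTAGE_API_KEY")
if API_KEY is None:
    raise RuntimeError("Please set the ALPHAVANTAGE_API_KEY environment variable")
The rest of the article refers to this value as API_KEY.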
Choosing Stocks to Fetch
In our tutorial, we're focusing on the tech giants that are often grouped together as "FAANG" stocks - Facebook, Amazon, Apple, Netflix, and Google - plus a few others. Below is the list of stock symbols for these companies:
stock_symbols = ["AAPL", "AMZN", "NFLX", "ABNB", "NVDA", "AMD", "MSFT", "GOOGL", "META"]
stocks = ["Apple", "Amazon", "Netflix", "AirBnb", "Nvidia", "AMD", "Microsoft", "Google", "Meta"]
Fetching Stock Data
We will use the TIME_SERIES_MONTHLY_ADJUSTED function from the Alpha Vantage API. This function gives us detailed monthly trading data for the specified equity, including the monthly open, high, low, close, adjusted close, volume, and dividend amount, and it returns around 20 years of monthly data. The documentation is provided on the Alpha Vantage website.
To fetch the data for a specific stock, we make a GET request to the Alpha Vantage API with the function and symbol parameters, and our API key. Here's an example:
import requests

def fetch_data(symbol, api_key):
    url = f"https://www.alphavantage.co/query?function=TIME_SERIES_MONTHLY_ADJUSTED&symbol={symbol}&apikey={api_key}"
    response = requests.get(url)
    data = response.json()
    return data
In the above code, we define a function fetch_data that takes a stock symbol and API key as input. It sends a GET request to the Alpha Vantage API and stores the response. The response is then converted from JSON format to a Python dictionary.
You can inspect the data by printing it; you'll see that it contains the monthly data as nested dictionaries, with the date as the key and the prices (open, high, low, and so on) as the value.
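For example, here is a quick way to peek at the structure (a small sketch; the top-level key names below are the ones the parsing code in the next section relies on, but it's worth confirming them against the live response, since error messages come back with a different shape):
data = fetch_data("AAPL", API_KEY)
print(list(data.keys()))     # typically ['Meta Data', 'Monthly Adjusted Time Series']

monthly = data["Monthly Adjusted Time Series"]
latest = next(iter(monthly))          # one month's entry (usually the most recent)
print(latest, monthly[latest])        # {'1. open': ..., '2. high': ..., ...}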
Parsing Data
We want to convert the data into a format that is easier to understand and that can be converted straightforwardly to tabular form (CSV or SQL). For each stock and each month, we'll build a dictionary containing the stock name, stock symbol, date, open, high, low, close, adjusted_close, volume, and dividend amount.
import csv

def parse_data(stock_data, symbol, stock_name):
    monthly_data = stock_data['Monthly Adjusted Time Series']
    total_data = []
    for date, info in monthly_data.items():
        data = {
            "stock": stock_name,
            "stock_symbol": symbol,
            "date": date,
            "open": info['1. open'],
            "high": info['2. high'],
            "low": info['3. low'],
            "close": info['4. close'],
            "adjusted_close": info['5. adjusted close'],
            "volume": info['6. volume'],
            "dividend_amount": info['7. dividend amount'],
        }
        total_data.append(data)
    return total_data
The parse_data function takes the stock data, symbol, and name as input, builds a dictionary for each month, and appends it to a list of dictionaries.
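As a quick sanity check, you can run it on a single stock before looping over all of them (a minimal example; the printed values will of course depend on the live data):
apple_raw = fetch_data("AAPL", API_KEY)
apple_parsed = parse_data(apple_raw, "AAPL", "Apple")
print(len(apple_parsed))   # one record per month in the series
print(apple_parsed[0])     # {'stock': 'Apple', 'stock_symbol': 'AAPL', 'date': ..., ...}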
Putting it Together
We can now loop through the stocks to fetch and parse the data:
# API_KEY is your Alpha Vantage key (see "Getting Your API Key" above)
all_stocks_data = []
for stock, stock_sym in zip(stocks, stock_symbols):
    stock_data = fetch_data(stock_sym, API_KEY)
    parsed_stock_data = parse_data(stock_data, stock_sym, stock)
    all_stocks_data.extend(parsed_stock_data)
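One practical caveat: the Alpha Vantage free tier is rate limited, so requesting many symbols back-to-back can get some calls rejected. A blunt but effective workaround is to pause between requests; the 15-second delay below is an assumption on my part rather than a documented figure, so check the current limits on their site.
import time

all_stocks_data = []
for stock, stock_sym in zip(stocks, stock_symbols):
    stock_data = fetch_data(stock_sym, API_KEY)
    parsed_stock_data = parse_data(stock_data, stock_sym, stock)
    all_stocks_data.extend(parsed_stock_data)
    time.sleep(15)  # conservative pause between API calls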
Saving the Data
After parsing the data, we need to save it for further analysis. We can write our data to a CSV file using the csv module in Python:
def write_to_csv(data, filename):
    keys = data[0].keys()
    with open(filename, 'w', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(data)
This function takes in our parsed data and a filename, then writes the data to a CSV file. Since we collected the records for all the stocks into a single list, we only need to call it once:
write_to_csv(all_stocks_data, "stocks_data.csv")
With this, we now have a single CSV file containing the monthly adjusted time series data for all of our stocks. Let’s have a look at the data:
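A quick way to do that is to load the CSV back with pandas (assuming you have pandas installed; we'll use it for the analysis below anyway):
import pandas as pd

df = pd.read_csv("stocks_data.csv")
print(df.head())              # first few rows: stock, stock_symbol, date, open, ...
print(df["stock"].unique())   # should list all nine companies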
Analyzing the Data
Once we have our data, we can do a simple analysis of adjusted close value over time:
import pandas as pd
import matplotlib.pyplot as plt

def analyze_all_data_average():
    data = pd.read_csv("stocks_data.csv")
    data['date'] = pd.to_datetime(data['date'])
    data['year'] = data['date'].dt.year
    yearly_data = data.groupby(['stock', 'year'])['adjusted_close'].mean().reset_index()
    grouped_data = yearly_data.groupby('stock')
    for name, group in grouped_data:
        plt.plot(group['year'], group['adjusted_close'], label=name)
    plt.title('Average Yearly Adjusted Close Value Over Time')
    plt.xlabel('Year')
    plt.ylabel('Average Adjusted Close Value')
    plt.legend(loc='upper left')
    plt.show()

analyze_all_data_average()
The above function first reads the 'stocks_data.csv' file into a pandas DataFrame. Since we have around 20 years of monthly data, plotting roughly 240 points per stock on the x-axis is not very readable. That’s why we extract the year from the date, group the data by both stock and year, and calculate the mean adjusted_close value for each year. Finally, we plot this data for each stock.
While we could perform many other analyses, the aim of this article is to demonstrate how to fetch data using a REST API. For instance, we could use this data to:
Compare the performance of these companies over time.
Analyze the correlation between these stocks and how they move together.
Predict future stock prices using machine learning algorithms.
Study the impact of specific events on the stock prices.
The goal of a data engineer is to make this data readily available for these analyses, and even for more advanced tasks like machine learning. I hope this article helps with that goal.
Conclusion
The article provides a guide on how to fetch FAANG (and other tech) stock data from a REST API using Python and the Alpha Vantage API. The process involves setting up the environment, getting an API key, choosing the stocks to fetch, fetching and parsing the stock data, and saving the data for further analysis. The data fetched can be used for various analyses, including comparing the performance of these companies over time, analyzing the correlation between these stocks, predicting future stock prices using machine learning algorithms, and studying the impact of specific events on the stock prices.