Compare Forex Rates using Python
Parsing PDFs to make better decisions
Photo by Jason Leung on Unsplash
Currency Exchange Rates
When we transfer foreign currency, we look for the best exchange rates. Now, this rate changes daily and varies across banks. It is hard to predict the currency exchange rate but you can select the bank based on that day's rate.
Comparing daily rates
While, some services provide clear information about the rate and associated conversion fees. Indian banks like SBI and HDFC publish their daily forex rates in a PDF which makes it cumbersome to compare. As of today, Google will only show you a range of the rates.
This is an experiment to extract data from these PDFs and compare rates. To keep things simple, we will use only 2 banks. Also, we will assume the currency is USD. That means we will only look for TT Buy rates.
Environment Setup
This section assumes you have the recent Python 3.11 version installed and you are working in a virtualenv.
From your preferred package manager, install the following. I am using apt (Ubuntu) and pip (Python) here:
# System wide install
apt install ghostscript python3-tk
# Python packages
pip install camelot-py
pip install requests
pip install ghostscript
Check Package Installation
Let us import all these package to ensure they work.
# Check Ghostscript is installed
from ctypes.util import find_library
find_library("gs")
import camelot
import requests
Download Data
You can download the PDFs (search for bank forex rates) from the browser or copy the PDF url from the search results and then, use the code below. This may be cumbersome as some banks have a fixed URL while others change it daily but this this is the only manual step.
urls = {
"sbi": "<insert PDF link here>",
"hdfc":"<insert PDF link here>",
# add other banks here
}
# Download files as bank.pdf
for bank, file_url in urls.items():
file_data = requests.get(file_url).content
with open(f"{bank}.pdf", "wb") as file:
file.write(file_data)
Now, we can extract the data we require. Camelot supports many formats. We will use pandas DataFrame throughout this experiment.
# Extracting a table from PDF is that easy
# Luckily, these PDFs are tabular
tables = camelot.read_pdf(f'sbi.pdf')
# There are many options to export this PDF
# We will use the Pandas DataFrame
df = tables[0].df
Pandas but you can export to any other format. Pandas documentation is a great start to learn about the various selection methods.
# Now, we need to find the location of USD TT Buy
# Here's what I got from a bit of experimenting
# Skipping the verbose output of the entire PDF
print(df.iloc[[1, 2], [0, 2]])
0 2
1 CURRENCY TT BUY
2 UNITED STATES DOLLAR 82.57
Let us store these locations in a dict
and extract the values. Ideally, we should be able to use row labels instead of integers.
# Bank -> Location
banks = {
"sbi": [[1, 2], [0, 2]],
"hdfc": [[1, 21], [0, 6]]
}
def get_rate(bank: str):
# Assumes files are placed in a same dir
tables = camelot.read_pdf(f'./{bank}.pdf')
df = tables[0].df
location = banks[bank]
# Doesn't work with Python 3.10 or lower
# Check Colab Notes
usd_tt_buy = df.iloc[*location]
return usd_tt_buy
# Display the rates
for bank in banks:
rate = get_rate(bank)
# Get a value from df
usd = float(rate.iloc[1, 1])
print(f"{bank.upper()} USD: {usd}")
That's it. Now this will display the rates for each bank.
Store data in a database
These rates are refreshed daily so, we should store them somewhere to analyse later. A good start would be a simple sqlite
db. However, as we have DataFrame, we will use duckdb
which provides options to ingress and export data in several formats. Let us see what it can do:
# Ensure duckdb is installed
import duckdb
# Create a table in the database
with duckdb.connect("rates.db") as conn:
conn.execute('''CREATE TABLE forex_rates_table (
date DATE,
bank VARCHAR,
currency varchar,
tt_buy DOUBLE)''')
As we are limited to 2 banks and 1 currency, we could hard-code them. This makes the examples much readable. Now, the only unknown is the rate.
To get values from the different DataFrames we have to figure out their location. After inspecting rates[0].columns
where rates
is a list of DataFrame
, I found that the value is at [2][2]
and [6][21]
respectively. The integer indices are not ideal and could break the code in future. For example, SBI publishes different rates based on the amount to be transferred. This code is only using the rate from Page 1.
Let us insert these rows in the db to complete our experiment. We keep this data in a dict
:
# Insert data
values = {
"sbi": rates[0][2][2],
"hdfc": rates[1][6][21]
}
with duckdb.connect("rates.db") as conn:
currency = 'usd'
for bank, rate in values.items():
conn.execute('''INSERT INTO forex_rates_table VALUES (current_date, ?, ?, ?) ''',
[bank, currency, rate ])
Querying is simple too.
# Query Database Rows as DataFrame
with duckdb.connect("rates.db") as conn:
result = conn.execute('SELECT * FROM forex_rates_table').fetch_df()
Note that the result is converted to DataFrame. This is helpful in a Jupyter/Colab Notebook as they support viewing data.
Notes on Google Colab
Google Colab is a great way to run code and analyse data. With all the extra batteries included, it comes with a few quirks too:
Some version
camelot-py
package does not work with the latestpypdf
, so we need to explicitly installPyPDF2<3.0
using!pip install 'PyPDF2<3.0'
Python 3.10 does not allow
usd = df.iloc[*loc]
So, we need to be explicit. I tried to find a cleaner way but this works for now.You have to install ghostscript using apt.
Also, you can use the following formatter to display DataFrame
as a table:
from google.colab import data_table
data_table.enable_dataframe_formatter()
Conclusion
That completes our experiment. We have created a way to collect forex rates and store them in a database. Here we learned:
PDF parsing with camelot-py
Using pandas DataFrame and data selection
Creating a simple DuckDB database
Experimenting with Google Colab.
Next, we can deploy this code to run daily and expose it through an API. You can try a working example in this Colab Notebook or see output at GitHub