Compare Forex Rates using Python

Parsing PDFs to make better decisions

Compare Forex Rates using Python

Photo by Jason Leung on Unsplash

Currency Exchange Rates

When we transfer foreign currency, we look for the best exchange rates. Now, this rate changes daily and varies across banks. It is hard to predict the currency exchange rate but you can select the bank based on that day's rate.

Comparing daily rates

While, some services provide clear information about the rate and associated conversion fees. Indian banks like SBI and HDFC publish their daily forex rates in a PDF which makes it cumbersome to compare. As of today, Google will only show you a range of the rates.

This is an experiment to extract data from these PDFs and compare rates. To keep things simple, we will use only 2 banks. Also, we will assume the currency is USD. That means we will only look for TT Buy rates.

Environment Setup

This section assumes you have the recent Python 3.11 version installed and you are working in a virtualenv.

From your preferred package manager, install the following. I am using apt (Ubuntu) and pip (Python) here:

# System wide install
apt install ghostscript python3-tk

# Python packages
pip install camelot-py
pip install requests
pip install ghostscript

Check Package Installation

Let us import all these package to ensure they work.

# Check Ghostscript is installed
from ctypes.util import find_library
find_library("gs")

import camelot
import requests

Download Data

You can download the PDFs (search for bank forex rates) from the browser or copy the PDF url from the search results and then, use the code below. This may be cumbersome as some banks have a fixed URL while others change it daily but this this is the only manual step.

urls = {
    "sbi": "<insert PDF link here>",
    "hdfc":"<insert PDF link here>",
    # add other banks here
    }

# Download files as bank.pdf
for bank, file_url in urls.items():
    file_data = requests.get(file_url).content
    with open(f"{bank}.pdf", "wb") as file:
        file.write(file_data)

Now, we can extract the data we require. Camelot supports many formats. We will use pandas DataFrame throughout this experiment.

# Extracting a table from PDF is that easy
# Luckily, these PDFs are tabular
tables = camelot.read_pdf(f'sbi.pdf')

# There are many options to export this PDF
# We will use the Pandas DataFrame  
df = tables[0].df

Pandas but you can export to any other format. Pandas documentation is a great start to learn about the various selection methods.

# Now, we need to find the location of USD TT Buy
# Here's what I got from a bit of experimenting
# Skipping the verbose output of the entire PDF
print(df.iloc[[1, 2], [0, 2]]) 
                      0       2
1              CURRENCY  TT BUY
2  UNITED STATES DOLLAR   82.57

Let us store these locations in a dict and extract the values. Ideally, we should be able to use row labels instead of integers.

# Bank -> Location
banks = {
    "sbi": [[1, 2], [0, 2]],
    "hdfc": [[1, 21], [0, 6]]
}

def get_rate(bank: str):
    # Assumes files are placed in a same dir 
    tables = camelot.read_pdf(f'./{bank}.pdf')
    df = tables[0].df
    location = banks[bank]
    # Doesn't work with Python 3.10 or lower
    # Check Colab Notes
    usd_tt_buy = df.iloc[*location]
    return usd_tt_buy

# Display the rates
for bank in banks:
    rate = get_rate(bank)
    # Get a value from df
    usd = float(rate.iloc[1, 1])
    print(f"{bank.upper()} USD: {usd}")

That's it. Now this will display the rates for each bank.

Store data in a database

These rates are refreshed daily so, we should store them somewhere to analyse later. A good start would be a simple sqlite db. However, as we have DataFrame, we will use duckdb which provides options to ingress and export data in several formats. Let us see what it can do:

# Ensure duckdb is installed
import duckdb 

# Create a table in the database
with duckdb.connect("rates.db") as conn:
    conn.execute('''CREATE TABLE forex_rates_table (
        date DATE,
        bank VARCHAR,
        currency varchar,
        tt_buy DOUBLE)''')

As we are limited to 2 banks and 1 currency, we could hard-code them. This makes the examples much readable. Now, the only unknown is the rate.

To get values from the different DataFrames we have to figure out their location. After inspecting rates[0].columns where rates is a list of DataFrame , I found that the value is at [2][2] and [6][21] respectively. The integer indices are not ideal and could break the code in future. For example, SBI publishes different rates based on the amount to be transferred. This code is only using the rate from Page 1.

Let us insert these rows in the db to complete our experiment. We keep this data in a dict :

# Insert data

values = {
   "sbi": rates[0][2][2],
   "hdfc": rates[1][6][21]
}

with duckdb.connect("rates.db") as conn:
    currency = 'usd'
    for bank, rate in values.items():
        conn.execute('''INSERT INTO forex_rates_table VALUES (current_date, ?, ?, ?) ''',
                    [bank, currency, rate ])

Querying is simple too.

# Query Database Rows as DataFrame
with duckdb.connect("rates.db") as conn:
    result = conn.execute('SELECT * FROM forex_rates_table').fetch_df()

Note that the result is converted to DataFrame. This is helpful in a Jupyter/Colab Notebook as they support viewing data.

Notes on Google Colab

Google Colab is a great way to run code and analyse data. With all the extra batteries included, it comes with a few quirks too:

  1. Some version camelot-py package does not work with the latest pypdf, so we need to explicitly install PyPDF2<3.0 using !pip install 'PyPDF2<3.0'

  2. Python 3.10 does not allow usd = df.iloc[*loc]So, we need to be explicit. I tried to find a cleaner way but this works for now.

  3. You have to install ghostscript using apt.

Also, you can use the following formatter to display DataFrame as a table:

from google.colab import data_table
data_table.enable_dataframe_formatter()

Conclusion

That completes our experiment. We have created a way to collect forex rates and store them in a database. Here we learned:

  • PDF parsing with camelot-py

  • Using pandas DataFrame and data selection

  • Creating a simple DuckDB database

  • Experimenting with Google Colab.

Next, we can deploy this code to run daily and expose it through an API. You can try a working example in this Colab Notebook or see output at GitHub