<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Code & Manas]]></title><description><![CDATA[From Python to Petabytes: Exploring Technology. Manas writes about Python, Distributed Systems, Storage, File Systems, Open Source and more.]]></description><link>https://code.manas.me</link><image><url>https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/logos/639762ac55fe792833c76ae2/0f23dafc-da9f-4d86-8399-04e699fd5c5b.png</url><title>Code &amp; Manas</title><link>https://code.manas.me</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 02:04:27 GMT</lastBuildDate><atom:link href="https://code.manas.me/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Coding with the Agents, Securely]]></title><description><![CDATA[AI Coding - The Ever changing stack
As of early 2026, coding with AI involves interacting with LLM models like a REPL. The difference is that the interaction uses natural language instead of direct co]]></description><link>https://code.manas.me/coding-with-the-agents-securely</link><guid isPermaLink="true">https://code.manas.me/coding-with-the-agents-securely</guid><category><![CDATA[coding]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[agents]]></category><category><![CDATA[#ai-tools]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Mon, 19 Jan 2026 14:24:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/dG0mtYvNL-A/upload/fc5963e21b875322ad50b20e0aa125cf.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>AI Coding - The Ever changing stack</h1>
<p>As of early 2026, coding with AI means interacting with LLMs like a <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>. The difference is that the interaction uses natural language instead of direct code input. New models launch frequently, and it is hard to keep up with the latest releases.</p>
<h2>Interface</h2>
<blockquote>
<p>"As far as the customer is concerned, the interface is the product." - Jef Raskin</p>
</blockquote>
<p>For coding, the chat interface is often not sufficient, as it can be slow for iterative development. Instead, the interaction happens through tools. Currently, the popular options are:</p>
<ol>
<li><p>Relatively new IDEs like Cursor and Antigravity with built-in AI features</p>
</li>
<li><p>Popular IDEs like VS Code and IntelliJ with AI code-assist extensions like Cline, Copilot, etc.</p>
</li>
<li><p>AI Tools by AI labs:</p>
<ol>
<li><p>Terminal-based tools, a.k.a. CLIs, like Claude Code, Codex, Gemini, etc.</p>
</li>
<li><p>Native Apps</p>
</li>
</ol>
</li>
</ol>
<h2>Agents and Security</h2>
<p>The tools can be used as a REPL, where you interact with the AI at each step, or you can give them permission to run on their own. Tools running this way are called agents. They are autonomous to the extent you allow. Most tools are agent-first now.</p>
<p>The common phases of development are Plan and Act. The planning phase allows you to iterate over the execution plan.</p>
<p>However, the Act phase requires giving agents permission to implement and verify the code. Autonomous software with the capability to run commands and execute code is a security nightmare. Reviewing and approving every step an agent performs slows down the development cycle, and it is increasingly difficult to review the large code changes agents make at every step. Moreover, they may change course if an approach does not work. Practically, it makes more sense to let the agent run autonomously and review the final result rather than approving each individual step.</p>
<p>What happens if the agent runs the wrong commands and deletes files, or worse, crashes the system? While debugging an issue, an agent may open ports, disable a security setting, or misconfigure the system. We cannot rely on the AI agent's implementation to safeguard against such risks. Agents should be prevented from performing dangerous operations, which is why they should run inside a sandbox.</p>
<h1>Code Sandboxing</h1>
<p>A <strong>sandbox</strong> is a secure, isolated environment used to execute, test, or analyze code, applications, or programs without affecting the system or surrounding environment.</p>
<p><a href="https://www.browserstack.com/guide/what-is-sandbox">https://www.browserstack.com/guide/what-is-sandbox</a></p>
<p>From <a href="https://kristaps.bsd.lv/devsecflops/">source code sandboxing</a> :</p>
<blockquote>
<p><em>Sandboxing</em>, in this case, is when a developer limits available system resources to a program from within its source code.</p>
</blockquote>
<h2>How Popular CLI tools implement sandboxing</h2>
<h3>Claude Code</h3>
<p>They have a great page worth reading: <a href="https://code.claude.com/docs/en/sandboxing">sandboxing</a></p>
<h3>Codex</h3>
<p>Codex is implemented in Rust (<a href="https://github.com/openai/codex">openai/codex</a>). At a high level, its sandboxing works as follows, by platform:</p>
<ul>
<li><p>macOS</p>
<ul>
<li><p>Uses Apple Seatbelt via sandbox-exec (the CLI wraps commands using Seatbelt).</p>
</li>
<li><p>Runs the command inside a mostly read-only jail, exposing a small set of writable roots (cwd, TMPDIR, ~/.codex, etc.).</p>
</li>
<li><p>Outbound network access is fully blocked by default (even an attempted <code>curl</code> will fail).</p>
</li>
<li><p>Codex detects sandbox denial via aggregated output (e.g., filesystem "Read-only file system" messages) and special signals.</p>
</li>
</ul>
</li>
<li><p>Linux</p>
<ul>
<li><p>There is no built-in OS sandboxing enabled by default in Codex; the README recommends using Docker for deterministic sandboxing.</p>
</li>
<li><p>Codex ships/uses a helper binary (<code>codex-linux-sandbox</code>) that combines Landlock and seccomp. The code serializes the SandboxPolicy to JSON and invokes the helper with flags (see <code>landlock.rs</code> and <code>create_linux_sandbox_command_args</code>). The helper enforces file-system and syscall restrictions.</p>
</li>
<li><p>The exec path will call spawn_command_under_linux_sandbox when Linux sandboxing is chosen; exec detection also looks for SIGSYS exit codes to infer seccomp denial.</p>
</li>
</ul>
</li>
<li><p>Windows</p>
<ul>
<li><p>Codex integrates with a Windows sandboxing component (codex_windows_sandbox). The core crate exposes sandbox_setup_is_complete and run_elevated_setup that call into that module.</p>
</li>
<li><p>The code supports a WindowsRestrictedToken sandbox type and includes UI flows for enabling sandbox features, elevated setup, and fallbacks if elevation is declined.</p>
</li>
<li><p>Codex also performs world-writable directory scans and shows prompts related to world-writable filesystem protections before enabling agent mode.</p>
</li>
</ul>
</li>
</ul>
<h2>Docker Sandbox for Agents</h2>
<p>Docker has a neat trick to run agents: <a href="https://docs.docker.com/reference/cli/docker/sandbox/">docker/sandbox/</a>, which works with the agents. You can get started with <code>docker sandbox run &lt;agent&gt;</code>, where <code>&lt;agent&gt;</code> can be <code>claude</code>, <code>gemini</code>, <code>codex</code>, and more: <a href="https://hub.docker.com/r/docker/sandbox-templates">docker/sandbox-templates</a></p>
<p><strong>So, why not use Containers everywhere?</strong></p>
<p>Containers are the most widely used sandboxes available today. Running your code inside containers is a solid, straightforward solution. However, containers require a runtime to be installed. While they start quickly, they need images and dependencies downloaded first.</p>
<h1>Securing Agents in the IDE</h1>
<p>Now that we know that CLI tools implement sandboxing and can also be run inside containers, it feels safer to write code with agents.</p>
<p>However, this begs the question: how do agents running within a typical IDE guarantee safety?</p>
<ul>
<li><p>Relying on human approval when making changes or running commands</p>
</li>
<li><p>An allow/deny list of commands that can be run</p>
</li>
<li><p>The safety built into the tool/extension itself which may or may not be open-source.</p>
</li>
</ul>
<p>So, you can roll your own solution depending on the isolation level you want:</p>
<ul>
<li><p><strong>Containers</strong> (Docker/Podman) for process isolation</p>
</li>
<li><p><strong>Dedicated user</strong> with minimal privileges</p>
</li>
<li><p><strong>Filesystem restrictions</strong> (read-only root, limited workspace)</p>
</li>
<li><p><strong>Resource limits</strong> (memory, CPU, disk quotas)</p>
</li>
<li><p><strong>Network isolation</strong> (if agent doesn't need network) - Not covered here</p>
</li>
<li><p><strong>Seccomp/AppArmor</strong> for syscall filtering - Not covered here</p>
</li>
</ul>
<p>Here are the steps to implement <em>some</em> of the above:</p>
<h2>1. <strong>Container Based Isolation</strong></h2>
<p><strong>Docker/Podman</strong> provides strong isolation with minimal overhead:</p>
<pre><code class="language-bash"># Run the agent with restricted filesystem access.
#   --read-only    : root filesystem is read-only
#   --tmpfs        : temp space without execution
#   --network none : no network access
#   --cap-drop ALL : drop all Linux capabilities
docker run --rm \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid \
  --volume ./agent_workspace:/workspace:rw \
  --network none \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  agent-image
</code></pre>
<p><strong>Benefits</strong>: Process isolation, resource limits (CPU, memory), and network control all in one. Bonus: this works on both Linux and macOS.</p>
<h2>2. <strong>Filesystem Level Restrictions</strong></h2>
<h3>macOS: Sandbox profiles</h3>
<p>For example, Codex uses Seatbelt via <code>sandbox-exec</code> on macOS. Here is a minimal profile (Seatbelt profiles use a Scheme-like syntax):</p>
<pre><code class="language-scheme">(version 1)
(deny default)
(allow file-read* (subpath "/System/Library"))
(allow file* (subpath "/workspace"))
(allow process-exec (literal "/bin/bash"))
</code></pre>
<p>Apply with: <code>sandbox-exec -f profile.sb your-agent-command</code></p>
<p>More <a href="https://igorstechnoclub.com/sandbox-exec/">https://igorstechnoclub.com/sandbox-exec/</a></p>
<h3>Linux</h3>
<p>There are many Linux tools that can be used, but as you add more steps, it gets closer to a container. Landlock is relatively new and often seen as a superior alternative/complement to traditional syscall filtering mechanisms like seccomp for filesystem control.</p>
<p><strong>chroot + namespaces</strong></p>
<pre><code class="language-bash"># Create isolated environment
mkdir -p /sandbox/{bin,lib,lib64,workspace}
# Copy only necessary binaries
cp /bin/bash /sandbox/bin/
# Copy required libraries (use ldd to find them)
# That can be a bit of work!
# Run agent in chroot jail
sudo chroot /sandbox /bin/bash
</code></pre>
<p><strong>unshare</strong> for namespace isolation</p>
<pre><code class="language-bash"># Create mount, PID, network, and user namespaces
unshare --mount --pid --net --fork --user \
  --map-root-user chroot /sandbox /bin/bash
</code></pre>
<h3>Caveats</h3>
<p>Restricting filesystem access may break tools. For example, <code>uv</code> caches packages on the local system; if you ask the agent to use <code>uv</code> but (unintentionally) deny access to the <code>.cache</code> directory, it will fail with a permission-denied error.</p>
<h2>3. <strong>User Level Isolation</strong></h2>
<p>Create a dedicated unprivileged user</p>
<pre><code class="language-bash"># Create restricted user
sudo useradd -m -s /bin/bash -G nogroup aiagent
sudo passwd -l aiagent  # Lock password

# Set up workspace with quotas
sudo mkdir /home/aiagent/workspace
sudo chown aiagent:aiagent /home/aiagent/workspace
sudo chmod 700 /home/aiagent/workspace

# Set disk quota (prevent disk exhaustion)
sudo setquota -u aiagent 1000000 1500000 0 0 /home  # 1GB soft, 1.5GB hard block limit
</code></pre>
<p>Run agent as this user:</p>
<pre><code class="language-bash">sudo -u aiagent python agent.py
</code></pre>
<p>That’s it for now. Happy Coding!</p>
]]></content:encoded></item><item><title><![CDATA[Avoid the Slippery Slope of "AI Slop"]]></title><description><![CDATA[We're entering the era of "AI Slop" that specific brand of code that looks functional at a glance but is actually a tangled mess of deprecated patterns, redundant logic, and architectural debt.
You as]]></description><link>https://code.manas.me/avoid-the-slippery-slope-of-ai-slop</link><guid isPermaLink="true">https://code.manas.me/avoid-the-slippery-slope-of-ai-slop</guid><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[coding]]></category><category><![CDATA[Python]]></category><category><![CDATA[React]]></category><category><![CDATA[Quality Assurance]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 27 Dec 2025 10:21:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/78-ya1CzgY8/upload/07e3603782d8f28a3bc0037ce045039f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We're entering the era of "AI Slop": that specific brand of code that looks functional at a glance but is actually a tangled mess of deprecated patterns, redundant logic, and architectural debt.</p>
<p>You ask an AI agent to build a "simple" React component or a Python FastAPI endpoint, and within seconds, it spits out 200 lines of code. It runs, so you commit it. And then the slide begins.</p>
<p>"AI Slop" is the technical debt per second that accumulates when we prioritize velocity over verification. If left unchecked, your repository becomes a digital wasteland where no human understands the "why" behind the code. Here is how you build a high-tech "Safety Net" to stop the slide.</p>
<h1><strong>The Strategy</strong></h1>
<p>To avoid the slippery slope, you must realize that AI is probabilistic, while quality must be deterministic. You have two choices:</p>
<ul>
<li><p><strong>A. Prompt for Compliance:</strong> Ask the AI nicely to follow style guides.</p>
</li>
<li><p><strong>B. Automated Enforcement:</strong> Use tools that physically block the AI from committing "slop."</p>
</li>
</ul>
<p><strong>The Winner? Strategy B.</strong> Don't trust the AI to always follow the project’s standards. Instead, use tools to <em>enforce</em> them. By setting up rigorous local gates, you can force the AI to refactor its own output until it meets the bar.</p>
<h2><strong>The Safety Net: Pre-commit Arsenal</strong></h2>
<p>To keep "AI Slop" out of the main branch, use a multi-layered suite of <strong>pre-commit hooks</strong>. These act as a quality filter that triggers every time code is committed.</p>
<p>We can broadly sort the hooks into three categories: General, Backend, and Frontend. You may add more in the CI/CD pipelines. These are fast tools that can run locally without slowing the development pace.</p>
<h3><strong>General Purpose: The Gatekeepers</strong></h3>
<p>These tools protect your infrastructure and prevent the most common "slop" side effects.</p>
<ul>
<li><p><strong>Gitleaks:</strong> Scans for hardcoded secrets. AI often "hallucinates" credentials for testing; Gitleaks ensures they never reach the cloud.</p>
</li>
<li><p><strong>Codespell:</strong> Catches typos in documentation and comments that make AI-generated code look unpolished.</p>
</li>
<li><p><strong>Check-YAML/JSON:</strong> Ensures configuration files haven't been corrupted by malformed AI output.</p>
</li>
</ul>
<h3><strong>Backend: The Modernist</strong></h3>
<p>For a modern Python backend, we have:</p>
<ul>
<li><p><strong>Ruff</strong>, which replaces dozens of legacy tools to lint and format code instantly.</p>
</li>
<li><p><strong>Pyupgrade:</strong> Automatically updates legacy AI code to use modern Python 3.11+ syntax.</p>
</li>
<li><p><strong>Creosote</strong>: finds unused dependencies (dependency bloat)</p>
</li>
<li><p><strong>pip-audit</strong>: finds vulnerable dependencies.</p>
</li>
</ul>
<h3><strong>Frontend: The Cleanup Crew</strong></h3>
<p><strong>For a React/JS stack:</strong></p>
<ul>
<li><p><strong>Biome</strong> is a fast, unified tool for React that prevents the "spaghetti code" found in AI-generated JSX.</p>
</li>
<li><p><strong>Knip</strong> finds “Dead Code”, unused files and exports that AI often leaves behind after a refactor.</p>
</li>
</ul>
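<p>Wiring a few of the hooks above into <code>.pre-commit-config.yaml</code> might look like this (the <code>rev</code> values are placeholders; pin the latest tags for your project):</p>

```yaml
# Illustrative .pre-commit-config.yaml -- rev values are placeholders
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-yaml
      - id: check-json
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
  - repo: https://github.com/codespell-project/codespell
    rev: v2.3.0
    hooks:
      - id: codespell
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.0
    hooks:
      - id: ruff         # lint
      - id: ruff-format  # format
```

<p>Install once with <code>pre-commit install</code>; after that, the whole gate runs on every commit.</p>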
<h2><strong>The Regression Shield: Baseline Testing</strong></h2>
<p><strong>Regression Testing</strong> is your real defense against "Functional Slop." AI agents are notorious for "fixing" a bug while unknowingly breaking another core utility.</p>
<h1><strong>AI Quality Gate Workflow</strong></h1>
<ol>
<li><p>Ask the AI to generate the feature.</p>
</li>
<li><p>Ask the AI to write unit tests for that specific feature. This can be delegated to a sub-agent too.</p>
</li>
<li><p>Run the tests. If they pass, you’ve confirmed the feature works and created a "baseline." If a future AI generation breaks this baseline, you'll know instantly.</p>
</li>
<li><p><strong>Iterative Development: The "Git Safety Valve"</strong> The fastest way to slide into slop is to commit 1,000 lines of AI changes at once. When the code is that massive, human review becomes impossible. Use <strong>Atomic Iterative Commits</strong> to stay in control:</p>
<ol>
<li><p><strong>Commit 1:</strong> Data models and schemas.</p>
</li>
<li><p><strong>Commit 2:</strong> Business logic and API endpoints.</p>
</li>
<li><p><strong>Commit 3:</strong> UI components and styling. <strong>Why?</strong> If the AI goes off the rails during the UI phase, you can <code>git reset</code> or <code>git revert</code> just that specific step. This keeps your history clean and ensures you maintain "quality" while discarding the "cruft."</p>
</li>
</ol>
</li>
</ol>
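<p>The baseline in step 3 can be as simple as a small pytest file (the feature and names here are illustrative, not from a real project). Once it passes, it becomes the regression shield: any later AI-generated change that breaks it fails the gate.</p>

```python
# test_baseline.py -- regression baseline for an AI-generated feature.
# If a later generation changes slugify's behavior, these assertions fail
# and the regression is caught before commit.
import re

def slugify(title: str) -> str:
    """AI-generated feature under test (illustrative example)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_basic_slug():
    assert slugify("Hello, World!") == "hello-world"

def test_idempotent():
    # Running the feature on its own output must not change it.
    assert slugify(slugify("Already A Slug")) == "already-a-slug"
```

<p>Run it with <code>pytest test_baseline.py</code>, or let the agent run it as part of the quality gate.</p>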
<p>In the case of the frontend, you can keep a terminal session running the npm tests while the code is modified.</p>
<pre><code class="language-mermaid">graph TD

    Start((fa:fa-rocket Start: Feature Request)):::startEnd --&gt; Generate[fa:fa-robot AI Generates Code]
    Generate --&gt; TestGen[fa:fa-vial AI Generates Unit Tests]
    
    subgraph QualityGate [fa:fa-shield-halved Automated Quality Gate]
        direction TB
        RunHooks[fa:fa-terminal Run: Pre-commit Hooks]:::process
        RunTests[fa:fa-microscope Run: Test Suite]:::process
        
        RunHooks --&gt; HookCheck{Hooks Pass?}:::decision
        HookCheck -- "No" --&gt; FixHooks[fa:fa-wrench AI Fixes Linting/Security]:::error
        FixHooks --&gt; RunHooks
        
        HookCheck -- "Yes" --&gt; RunTests
        RunTests --&gt; TestCheck{Tests Pass?}:::decision
        TestCheck -- "No" --&gt; FixTests[fa:fa-bug AI Fixes Logic/Regressions]:::error
        FixTests --&gt; RunTests
    end

    TestGen --&gt; RunHooks
    TestCheck -- "Yes" --&gt; AtomicCommit[fa:fa-eye Human Review: Atomic Commit]:::decision
    
    %% Review Loop
    AtomicCommit -- "Changes Requested" --&gt; Generate

    subgraph GitStrategy [fa:fa-code-branch Iterative Git Valve]
        direction TB
        Step1[fa:fa-database Commit 1: Models/Schemas]:::action
        Step2[fa:fa-gears Commit 2: Logic/Endpoints]:::action
        Step3[fa:fa-desktop Commit 3: UI/Styling]:::action
        
        Step1 --&gt; Step2
        Step2 --&gt; Step3
    end

    AtomicCommit -- "Approved" --&gt; Step1

    Step3 --&gt; Success((fa:fa-check-circle End Feature)):::startEnd
    
    %% The Development Cycle
    Success -. "Next Iteration" .-&gt; Start
</code></pre>
<h1><strong>How to use with AI Agents</strong></h1>
<p>To ensure the highest code quality when working with AI:</p>
<ol>
<li><p><strong>Instruction:</strong> Tell the agent: <em>“Run pre-commit run --all-files before committing.”</em> <strong>Bonus</strong>: “Generate a meaningful git commit message.” Note that this may consume additional credits.</p>
</li>
<li><p><strong>Context:</strong> Ensure the agent has read the .pre-commit-config.yaml so it understands the rules it must follow (e.g., using Ruff instead of Black).</p>
</li>
<li><p><strong>Resolution:</strong> If a tool fails, provide the error log back to the AI and ask it to fix the specific violations.</p>
</li>
</ol>
<p>"AI Slop" is the natural result of friction-less code generation. By using a deterministic toolchain of pre-commit hooks and a disciplined, iterative Git workflow, you can move at the speed of AI while maintaining the quality of a veteran architect.</p>
<p>Here's a sample pre-commit hook for a React (Vite) project: <a href="https://github.com/rainzoo/income-tax-calculator/blob/main/.pre-commit-config.yaml">.pre-commit-config.yaml</a></p>
<p>Run pre-commit on all files using <code>pre-commit run --all-files</code>, which assumes pre-commit is installed. If you use <code>uv</code>, which is highly recommended, you can run it via <code>uvx pre-commit run --all-files</code>.</p>
<h1>Caveats</h1>
<ul>
<li><p>Static analysis can give false positives. You may have to ignore some failures, especially spell checks. Configure the tool to ignore specific files, e.g. package.json.</p>
</li>
<li><p>Dependency management can be tricky. Development and test packages increase the distribution size if not separated clearly. Each package manager has its own solution to this problem, so be careful when adding dependencies.</p>
</li>
</ul>
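<p>As one example of that separation, <code>uv</code> keeps development-only packages in a PEP 735 dependency group so they stay out of the distribution (package names here are illustrative):</p>

```toml
[project]
name = "my-app"                 # illustrative project
version = "0.1.0"
dependencies = ["fastapi"]      # shipped with the package

[dependency-groups]
dev = ["pytest", "ruff", "pre-commit"]  # installed locally, never distributed
```

<p>This layout is what <code>uv add fastapi</code> and <code>uv add --dev pytest ruff pre-commit</code> would produce.</p>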
<p>Happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[A friendly Fio Job Builder]]></title><description><![CDATA[Introduction
Writing effective FIO (Flexible I/O Tester) jobs can be a complex and error-prone process, especially for developers and system administrators who need to benchmark storage performance across different platforms and I/O engines. The chal...]]></description><link>https://code.manas.me/a-friendly-fio-job-builder</link><guid isPermaLink="true">https://code.manas.me/a-friendly-fio-job-builder</guid><category><![CDATA[fio]]></category><category><![CDATA[Testing]]></category><category><![CDATA[storage]]></category><category><![CDATA[test-automation]]></category><category><![CDATA[tools]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Thu, 04 Dec 2025 16:41:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1764866286006/b86295ed-dcf9-46ce-b3b0-b28a607b3d3f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>Writing effective FIO (Flexible I/O Tester) jobs can be a complex and error-prone process, especially for developers and system administrators who need to benchmark storage performance across different platforms and I/O engines. The challenges range from understanding the myriad of configuration options to ensuring cross-platform compatibility and avoiding dangerous configurations that could lead to data loss.</p>
<h2 id="heading-the-core-challenges-of-fio-job-configuration">The Core Challenges of FIO Job Configuration</h2>
<h3 id="heading-1-complexity-of-fio-options">1. Complexity of FIO Options</h3>
<p>FIO provides hundreds of configuration options that can be overwhelming for both beginners and experienced users:</p>
<ul>
<li><strong>I/O Type Options</strong>: <code>rw</code>, <code>rwmixread</code>, and other parameters control read/write patterns</li>
<li><strong>Block Size Configuration</strong>: <code>bs</code>, <code>bsrange</code> for controlling I/O unit sizes</li>
<li><strong>I/O Engine Selection</strong>: Different engines like <code>libaio</code>, <code>posixaio</code>, <code>io_uring</code> have platform-specific requirements</li>
<li><strong>Target Specification</strong>: <code>filename</code>, <code>directory</code>, and related options for defining test targets</li>
<li><strong>Threading and Synchronization</strong>: <code>numjobs</code>, <code>group_reporting</code>, <code>stonewall</code> for parallel execution</li>
<li><strong>Verification and Logging</strong>: Extensive options for data integrity checking and performance logging</li>
</ul>
<p>The sheer volume of options makes it difficult to remember syntax, understand interactions between parameters, and ensure proper configuration.</p>
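<p>To make these option groups concrete, here is a small illustrative job file that exercises several of them (the <code>directory</code> target is a placeholder; adjust it for your system):</p>

```ini
; illustrative .fio job -- values are examples, not recommendations
[global]
ioengine=libaio        ; Linux-only; use posixaio on macOS, windowsaio on Windows
direct=1               ; libaio generally requires O_DIRECT
bs=4k                  ; block size for each I/O
runtime=60
time_based=1
group_reporting=1      ; aggregate stats across numjobs

[mixed-rw]
rw=randrw              ; random mixed read/write
rwmixread=70           ; 70% reads, 30% writes
numjobs=4              ; four parallel workers
size=1G
directory=/tmp/fio-test   ; placeholder target; never point write jobs at /dev/sdX
```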
<h3 id="heading-2-platform-specific-compatibility-issues">2. Platform-Specific Compatibility Issues</h3>
<p>FIO jobs must account for significant differences between operating systems:</p>
<ul>
<li><strong>Linux</strong>: Supports advanced engines like <code>libaio</code>, <code>io_uring</code>, <code>splice</code></li>
<li><strong>Windows</strong>: Limited to <code>windowsaio</code> </li>
<li><strong>macOS</strong>: Supports <code>posixaio</code>, <code>mmap</code>, but lacks some Linux-specific engines</li>
</ul>
<p>Using an incompatible I/O engine on a platform can cause job failures or unexpected behavior.</p>
<h3 id="heading-3-dangerous-configuration-risks">3. Dangerous Configuration Risks</h3>
<p>Improper FIO configurations can lead to catastrophic data loss:</p>
<ul>
<li><strong>Raw Device Writes</strong>: Accidentally writing to <code>/dev/sdX</code> or similar block devices destroys data</li>
<li><strong>Missing Targets</strong>: Forgetting to specify <code>filename</code> or <code>directory</code> causes job failures</li>
<li><strong>Engine-Specific Requirements</strong>: Some engines like <code>libaio</code> require <code>direct=1</code> for proper operation</li>
<li><strong>Privilege Issues</strong>: Certain operations require root/sudo access</li>
</ul>
<h3 id="heading-4-validation-and-error-prevention">4. Validation and Error Prevention</h3>
<p>Manual FIO job writing lacks real-time validation:</p>
<ul>
<li>No immediate feedback on configuration errors</li>
<li>Easy to miss critical warnings about data-destructive operations</li>
<li>Difficult to catch platform incompatibilities before execution</li>
<li>No automated checking of option combinations that might conflict</li>
</ul>
<h3 id="heading-5-job-organization-and-management">5. Job Organization and Management</h3>
<p>Managing multiple FIO jobs becomes cumbersome:</p>
<ul>
<li>Difficult to organize global vs. per-job options</li>
<li>No visual interface for comparing different job configurations</li>
<li>Manual editing of <code>.fio</code> files is error-prone and time-consuming</li>
<li>Lack of version control or easy modification tracking</li>
</ul>
<h2 id="heading-how-fiojobbuilder-addresses-these-challenges">How fioJobBuilder Addresses These Challenges</h2>
<p>fioJobBuilder is a configuration tool that helps with these FIO job writing challenges:</p>
<h3 id="heading-1-interactive-visual-interface">1. Interactive Visual Interface</h3>
<p>Instead of manually editing text files, fioJobBuilder provides:</p>
<ul>
<li><strong>Organized Option Categories</strong>: Options grouped by usage type (Block, File, Others) and functional categories</li>
<li><strong>Searchable Documentation Links</strong>: Each option includes direct links to official FIO documentation</li>
<li><strong>Platform-Specific Filtering</strong>: Automatically shows only relevant options for the selected platform</li>
<li><strong>Real-time Preview</strong>: Live generation of the <code>.fio</code> file as you configure options</li>
</ul>
<h3 id="heading-2-cross-platform-compatibility-handling">2. Cross-Platform Compatibility Handling</h3>
<p>The tool intelligently manages platform differences:</p>
<ul>
<li><strong>Platform Toggle</strong>: Easy switching between Linux, Windows, and macOS</li>
<li><strong>Engine Filtering</strong>: Only shows compatible I/O engines for the selected platform</li>
<li><strong>Platform-Specific Validation</strong>: Warns about engine/platform mismatches</li>
<li><strong>Automatic Adjustments</strong>: Handles platform-specific requirements automatically</li>
</ul>
<h3 id="heading-3-comprehensive-validation-system">3. Comprehensive Validation System</h3>
<p>fioJobBuilder includes multiple layers of validation:</p>
<h4 id="heading-static-validation">Static Validation</h4>
<ul>
<li><strong>Target Checking</strong>: Ensures <code>filename</code> or <code>directory</code> is specified</li>
<li><strong>Engine Validation</strong>: Verifies engine compatibility with selected platform</li>
<li><strong>Dangerous Operation Warnings</strong>: Flags raw device writes (<code>/dev/*</code>) in write mode</li>
<li><strong>Engine-Specific Requirements</strong>: Checks for required options like <code>direct=1</code> with <code>libaio</code></li>
</ul>
<h4 id="heading-ai-powered-analysis-optional">AI-Powered Analysis (Optional)</h4>
<ul>
<li><strong>Configuration Scoring</strong>: Rates your job configuration from 0-100</li>
<li><strong>Performance Suggestions</strong>: Provides optimization recommendations</li>
<li><strong>Risk Assessment</strong>: Identifies potential issues before execution</li>
<li><strong>Best Practice Guidance</strong>: Suggests improvements based on FIO expertise</li>
</ul>
<h3 id="heading-4-job-management-features">4. Job Management Features</h3>
<p>The tool provides robust job organization capabilities:</p>
<ul>
<li><strong>Global vs. Per-Job Options</strong>: Clear separation with visual indicators</li>
<li><strong>Multiple Job Support</strong>: Create and manage multiple jobs in one configuration</li>
<li><strong>Job Cloning</strong>: Easy duplication of existing jobs for variation testing</li>
<li><strong>Reset Capabilities</strong>: Quickly clear configurations and start fresh</li>
</ul>
<h3 id="heading-5-safety-features">5. Safety Features</h3>
<p>fioJobBuilder prioritizes preventing dangerous operations:</p>
<ul>
<li><strong>Critical Warning System</strong>: Highlights data-destructive operations in red</li>
<li><strong>Platform Compatibility Checks</strong>: Prevents using incompatible engines</li>
<li><strong>Target Validation</strong>: Ensures jobs have proper targets before execution</li>
<li><strong>Visual Feedback</strong>: Clear color-coded warnings and error indicators</li>
</ul>
<h2 id="heading-practical-examples">Practical Examples</h2>
<h3 id="heading-example-1-creating-a-safe-block-device-test">Example 1: Creating a Safe Block Device Test</h3>
<p><strong>Challenge</strong>: You want to test a block device but need to ensure you don't accidentally overwrite important data.</p>
<p><strong>Solution with fioJobBuilder</strong>:</p>
<ol>
<li>Select Linux platform</li>
<li>Choose Block usage type</li>
<li>Set <code>rw=randwrite</code> for random write testing</li>
<li>Specify target as <code>/dev/nvme0n1</code> (test device)</li>
<li>The validation panel immediately shows: "CRITICAL WARNING - Writing to raw device '/dev/nvme0n1' will destroy data"</li>
<li>You can then confirm this is your intended test device</li>
</ol>
<h3 id="heading-example-2-cross-platform-configuration">Example 2: Cross-Platform Configuration</h3>
<p><strong>Challenge</strong>: You need to create jobs that work on both Linux and Windows.</p>
<p><strong>Solution with fioJobBuilder</strong>:</p>
<ol>
<li>Start with Linux platform, configure using <code>libaio</code> engine</li>
<li>Switch to Windows platform - the tool automatically:<ul>
<li>Changes available engines to Windows-compatible ones</li>
<li>Shows warning about <code>libaio</code> incompatibility</li>
<li>Suggests using <code>windowsaio</code> instead</li>
</ul>
</li>
<li>Adjust configuration for Windows compatibility</li>
<li>Export separate configurations for each platform</li>
</ol>
<h3 id="heading-example-3-complex-multi-job-benchmarking">Example 3: Complex Multi-Job Benchmarking</h3>
<p><strong>Challenge</strong>: You need to run multiple different workloads simultaneously.</p>
<p><strong>Solution with fioJobBuilder</strong>:</p>
<ol>
<li>Create a global configuration with common settings</li>
<li>Add multiple jobs with different <code>rw</code> patterns</li>
<li>Each job inherits global settings but can override specific options</li>
<li>Visual interface shows all jobs clearly organized</li>
<li>Single export generates complete multi-job configuration</li>
</ol>
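<p>A multi-job file of this shape might look like the following sketch, where each job inherits the <code>[global]</code> section and overrides only what differs (values are illustrative):</p>

```ini
[global]
ioengine=libaio
direct=1
runtime=60
time_based
size=1g

[seq-read]
rw=read
bs=128k

[rand-write]
rw=randwrite
bs=4k
iodepth=32
```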
<h2 id="heading-technical-implementation">Technical Implementation</h2>
<p>fioJobBuilder is built with modern web technologies:</p>
<ul>
<li><strong>React + TypeScript</strong>: Provides responsive UI with type safety</li>
<li><strong>Vite</strong>: Fast development and production builds</li>
<li><strong>Tailwind CSS</strong>: Clean, modern styling</li>
<li><strong>Google Gemini API</strong>: Optional AI analysis integration</li>
</ul>
<p>The architecture includes:</p>
<ul>
<li><strong>State Management</strong>: React hooks for managing complex configuration state</li>
<li><strong>Validation Engine</strong>: Comprehensive rule-based checking system</li>
<li><strong>Platform Detection</strong>: Automatic adjustment of available options</li>
<li><strong>Export System</strong>: Clean <code>.fio</code> file generation with syntax highlighting</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Writing FIO jobs doesn't have to be a daunting, error-prone process. fioJobBuilder addresses the core challenges by providing:</p>
<ol>
<li><strong>Simplified Configuration</strong>: Visual interface instead of manual text editing</li>
<li><strong>Cross-Platform Intelligence</strong>: Automatic handling of platform differences</li>
<li><strong>Comprehensive Validation</strong>: Multiple layers of error checking</li>
<li><strong>Enhanced Safety</strong>: Protection against dangerous operations</li>
<li><strong>Advanced Features</strong>: AI analysis, job management, and organization tools</li>
</ol>
<p>By using fioJobBuilder, developers and system administrators can focus on their benchmarking goals rather than struggling with configuration syntax and compatibility issues. The tool significantly reduces the learning curve for FIO while providing expert-level validation and optimization capabilities.</p>
<h2 id="heading-getting-started">Getting Started</h2>
<p>To try fioJobBuilder:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/rainzoo/fio-job-builder.git
<span class="hljs-built_in">cd</span> fio-job-builder
npm install
npm run dev
</code></pre>
<p>Then open <code>http://localhost:5173</code> in your browser and start building FIO jobs with confidence!</p>
<h2 id="heading-references">References</h2>
<ul>
<li><a target="_blank" href="https://fio.readthedocs.io/">FIO Official Documentation</a></li>
<li><a target="_blank" href="https://github.com/axboe/fio">FIO GitHub Repository</a></li>
<li><a target="_blank" href="https://github.com/rainzoo/fio-job-builder">fioJobBuilder Source Code</a> </li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Arger - Parameters Across Languages]]></title><description><![CDATA[Effective parameter handling is fundamental to building flexible, maintainable automation systems. This guide explores how different languages and tools approach parameter management, using a user account creation scenario as our running example.
The...]]></description><link>https://code.manas.me/arger-parameters-across-languages</link><guid isPermaLink="true">https://code.manas.me/arger-parameters-across-languages</guid><category><![CDATA[Python]]></category><category><![CDATA[Jenkins]]></category><category><![CDATA[Bash]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Pipeline]]></category><category><![CDATA[automation]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[opal]]></category><category><![CDATA[genai]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 15 Nov 2025 14:07:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/5otlbgWJlLs/upload/e7b6e19f9de14ef641c100f4bf1bc46a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Effective parameter handling is fundamental to building flexible, maintainable automation systems. This guide explores how different languages and tools approach parameter management, using a user account creation scenario as our running example.</p>
<h2 id="heading-the-use-case">The Use Case</h2>
<p>We'll examine how to handle user account parameters across different contexts:</p>
<ul>
<li><p><strong>username</strong>: The account identifier</p>
</li>
<li><p><strong>address</strong>: Physical or mailing address</p>
</li>
<li><p><strong>phone</strong>: Contact phone number</p>
</li>
<li><p><strong>email</strong>: Email address</p>
</li>
</ul>
<h2 id="heading-python-function-arguments-and-environment-variables">Python: Function Arguments and Environment Variables</h2>
<p>Python offers multiple approaches to parameter handling. The most straightforward method uses function arguments with type hints and default values:</p>
<pre><code class="lang-python"><span class="hljs-comment"># user_manager.py</span>
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> os

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_user</span>(<span class="hljs-params">username: str, email: str, address: str = <span class="hljs-string">""</span>, phone: str = <span class="hljs-string">""</span></span>) -&gt; dict:</span>
    <span class="hljs-string">"""
    Create a user account with the provided parameters.

    Args:
        username: Required username for the account
        email: Required email address
        address: Optional physical address
        phone: Optional phone number

    Returns:
        Dictionary containing user details
    """</span>
    user_data = {
        <span class="hljs-string">"username"</span>: username,
        <span class="hljs-string">"email"</span>: email,
        <span class="hljs-string">"address"</span>: address <span class="hljs-keyword">if</span> address <span class="hljs-keyword">else</span> <span class="hljs-string">"Not provided"</span>,
        <span class="hljs-string">"phone"</span>: phone <span class="hljs-keyword">if</span> phone <span class="hljs-keyword">else</span> <span class="hljs-string">"Not provided"</span>
    }

    print(<span class="hljs-string">f"Creating user account:"</span>)
    <span class="hljs-keyword">for</span> key, value <span class="hljs-keyword">in</span> user_data.items():
        print(<span class="hljs-string">f"  <span class="hljs-subst">{key}</span>: <span class="hljs-subst">{value}</span>"</span>)

    <span class="hljs-keyword">return</span> user_data

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-comment"># Method 1: Command-line arguments</span>
    <span class="hljs-keyword">if</span> len(sys.argv) &gt;= <span class="hljs-number">3</span>:
        username = sys.argv[<span class="hljs-number">1</span>]
        email = sys.argv[<span class="hljs-number">2</span>]
        address = sys.argv[<span class="hljs-number">3</span>] <span class="hljs-keyword">if</span> len(sys.argv) &gt; <span class="hljs-number">3</span> <span class="hljs-keyword">else</span> <span class="hljs-string">""</span>
        phone = sys.argv[<span class="hljs-number">4</span>] <span class="hljs-keyword">if</span> len(sys.argv) &gt; <span class="hljs-number">4</span> <span class="hljs-keyword">else</span> <span class="hljs-string">""</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-comment"># Method 2: Environment variables as fallback</span>
        username = os.getenv(<span class="hljs-string">"USERNAME"</span>, <span class="hljs-string">"default_user"</span>)
        email = os.getenv(<span class="hljs-string">"EMAIL"</span>, <span class="hljs-string">"user@example.com"</span>)
        address = os.getenv(<span class="hljs-string">"ADDRESS"</span>, <span class="hljs-string">""</span>)
        phone = os.getenv(<span class="hljs-string">"PHONE"</span>, <span class="hljs-string">""</span>)

    create_user(username, email, address, phone)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p>Python's flexibility allows you to combine multiple parameter sources, with command-line arguments taking precedence over environment variables.</p>
<p>The standard library provides <a target="_blank" href="https://docs.python.org/3/library/argparse.html">argparse</a>, and even a whole page on <a target="_blank" href="https://docs.python.org/3/library/optparse.html#choosing-an-argument-parser">choosing an argument parser</a>. There are also excellent packages for building CLIs, such as <a target="_blank" href="https://typer.tiangolo.com/">typer</a> and <a target="_blank" href="https://click.palletsprojects.com/en/stable/">click</a>.</p>
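<p>As a minimal sketch, the same user-creation CLI could be declared with <code>argparse</code> (the flag names here are my own, not from a published tool):</p>

```python
import argparse

def build_parser():
    """Declare the same four user-account parameters as an argparse CLI."""
    parser = argparse.ArgumentParser(description="Create a user account")
    parser.add_argument("username", help="Required username for the account")
    parser.add_argument("email", help="Required email address")
    parser.add_argument("--address", default="", help="Optional physical address")
    parser.add_argument("--phone", default="", help="Optional phone number")
    return parser

# Parse an explicit argument list, as a shell invocation would supply it
args = build_parser().parse_args(["jdoe", "jdoe@example.com", "--phone", "555-0100"])
print(args.username, args.email, args.address or "Not provided", args.phone)
```

<p>Compared with the manual <code>sys.argv</code> checks above, argparse generates <code>--help</code> output and missing-argument errors for free.</p>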
<h2 id="heading-bash-positional-parameters-and-named-variables">Bash: Positional Parameters and Named Variables</h2>
<p>Bash scripts use positional parameters and can validate their presence before execution:</p>
<pre><code class="lang-bash"><span class="hljs-meta">#!/bin/bash</span>
<span class="hljs-comment"># create_user.sh</span>

<span class="hljs-comment"># Function to create user with parameters</span>
<span class="hljs-function"><span class="hljs-title">create_user</span></span>() {
    <span class="hljs-built_in">local</span> username=<span class="hljs-variable">$1</span>
    <span class="hljs-built_in">local</span> email=<span class="hljs-variable">$2</span>
    <span class="hljs-built_in">local</span> address=<span class="hljs-variable">${3:-"Not provided"}</span>
    <span class="hljs-built_in">local</span> phone=<span class="hljs-variable">${4:-"Not provided"}</span>

    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Creating user account:"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"  username: <span class="hljs-variable">$username</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"  email: <span class="hljs-variable">$email</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"  address: <span class="hljs-variable">$address</span>"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"  phone: <span class="hljs-variable">$phone</span>"</span>

    <span class="hljs-comment"># Simulate user creation</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"User account created successfully"</span>
}

<span class="hljs-comment"># Validate required parameters</span>
<span class="hljs-keyword">if</span> [ <span class="hljs-variable">$#</span> -lt 2 ]; <span class="hljs-keyword">then</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Error: Missing required parameters"</span>
    <span class="hljs-built_in">echo</span> <span class="hljs-string">"Usage: <span class="hljs-variable">$0</span> &lt;username&gt; &lt;email&gt; [address] [phone]"</span>
    <span class="hljs-built_in">exit</span> 1
<span class="hljs-keyword">fi</span>

<span class="hljs-comment"># Extract parameters</span>
USERNAME=<span class="hljs-variable">$1</span>
EMAIL=<span class="hljs-variable">$2</span>
ADDRESS=<span class="hljs-variable">${3:-""}</span>
PHONE=<span class="hljs-variable">${4:-""}</span>

<span class="hljs-comment"># Call the function</span>
create_user <span class="hljs-string">"<span class="hljs-variable">$USERNAME</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$EMAIL</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$ADDRESS</span>"</span> <span class="hljs-string">"<span class="hljs-variable">$PHONE</span>"</span>
</code></pre>
<p>The <code>${variable:-default}</code> syntax provides a clean way to specify default values for optional parameters.</p>
<h2 id="heading-dockerfile-build-arguments-and-environment-variables">Dockerfile: Build Arguments and Environment Variables</h2>
<p>Dockerfiles distinguish between build-time arguments (ARG) and runtime environment variables (ENV):</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Dockerfile</span>
<span class="hljs-keyword">FROM</span> python:<span class="hljs-number">3.9</span>-slim

<span class="hljs-comment"># Build-time arguments with defaults</span>
<span class="hljs-keyword">ARG</span> APP_VERSION=<span class="hljs-number">1.0</span>.<span class="hljs-number">0</span>
<span class="hljs-keyword">ARG</span> PYTHON_ENV=production

<span class="hljs-comment"># Set environment variables for runtime</span>
<span class="hljs-keyword">ENV</span> USERNAME=<span class="hljs-string">""</span>
<span class="hljs-keyword">ENV</span> EMAIL=<span class="hljs-string">""</span>
<span class="hljs-keyword">ENV</span> ADDRESS=<span class="hljs-string">""</span>
<span class="hljs-keyword">ENV</span> PHONE=<span class="hljs-string">""</span>

<span class="hljs-comment"># Create application directory</span>
<span class="hljs-keyword">WORKDIR</span><span class="bash"> /app</span>

<span class="hljs-comment"># Copy application files</span>
<span class="hljs-keyword">COPY</span><span class="bash"> user_manager.py .</span>
<span class="hljs-keyword">COPY</span><span class="bash"> create_user.sh .</span>

<span class="hljs-comment"># Make bash script executable</span>
<span class="hljs-keyword">RUN</span><span class="bash"> chmod +x create_user.sh</span>

<span class="hljs-comment"># Install any required packages</span>
<span class="hljs-keyword">RUN</span><span class="bash"> pip install --no-cache-dir --upgrade pip</span>

<span class="hljs-comment"># Default command</span>
<span class="hljs-keyword">CMD</span><span class="bash"> [<span class="hljs-string">"python"</span>, <span class="hljs-string">"user_manager.py"</span>]</span>
</code></pre>
<p>You can override these at build time:</p>
<pre><code class="lang-bash">docker build --build-arg APP_VERSION=2.0.0 -t user-manager .
</code></pre>
<p>And at runtime:</p>
<pre><code class="lang-bash">docker run -e USERNAME=<span class="hljs-string">"jdoe"</span> -e EMAIL=<span class="hljs-string">"jdoe@example.com"</span> user-manager
</code></pre>
<h2 id="heading-jenkinsfile-declarative-pipeline-parameters">Jenkinsfile: Declarative Pipeline Parameters</h2>
<p>Jenkins pipelines provide a structured approach to parameter handling through the parameters directive:</p>
<pre><code class="lang-plaintext">pipeline {
    agent any

    parameters {
        string(
            name: 'USERNAME',
            defaultValue: 'testuser',
            description: 'Username for the account',
            trim: true
        )
        string(
            name: 'EMAIL',
            defaultValue: 'test@example.com',
            description: 'Email address',
            trim: true
        )
        string(
            name: 'ADDRESS',
            defaultValue: '',
            description: 'Physical address (optional)',
            trim: true
        )
        string(
            name: 'PHONE',
            defaultValue: '',
            description: 'Phone number (optional)',
            trim: true
        )
        choice(
            name: 'EXECUTION_MODE',
            choices: ['python', 'bash', 'docker'],
            description: 'Which implementation to execute'
        )
    }

    environment {
        SCRIPT_DIR = "${WORKSPACE}"
    }

    stages {
        stage('Validate Parameters') {
            steps {
                script {
                    echo "Validating parameters..."

                    if (params.USERNAME.isEmpty()) {
                        error("USERNAME parameter is required")
                    }

                    if (params.EMAIL.isEmpty()) {
                        error("EMAIL parameter is required")
                    }

                    // Basic email validation
                    if (!params.EMAIL.contains('@')) {
                        error("EMAIL must be a valid email address")
                    }

                    echo "Parameter validation passed"
                }
            }
        }

        stage('Display Parameters') {
            steps {
                echo "Received parameters:"
                echo "  Username: ${params.USERNAME}"
                echo "  Email: ${params.EMAIL}"
                echo "  Address: ${params.ADDRESS ?: 'Not provided'}"
                echo "  Phone: ${params.PHONE ?: 'Not provided'}"
                echo "  Execution Mode: ${params.EXECUTION_MODE}"
            }
        }

        stage('Execute Python Script') {
            when {
                expression { params.EXECUTION_MODE == 'python' }
            }
            steps {
                script {
                    echo "Executing Python implementation..."
                    sh """
                        python3 user_manager.py \
                            '${params.USERNAME}' \
                            '${params.EMAIL}' \
                            '${params.ADDRESS}' \
                            '${params.PHONE}'
                    """
                }
            }
        }

        stage('Execute Bash Script') {
            when {
                expression { params.EXECUTION_MODE == 'bash' }
            }
            steps {
                script {
                    echo "Executing Bash implementation..."
                    sh """
                        chmod +x create_user.sh
                        ./create_user.sh \
                            '${params.USERNAME}' \
                            '${params.EMAIL}' \
                            '${params.ADDRESS}' \
                            '${params.PHONE}'
                    """
                }
            }
        }

        stage('Execute Docker Container') {
            when {
                expression { params.EXECUTION_MODE == 'docker' }
            }
            steps {
                script {
                    echo "Building and executing Docker implementation..."
                    sh """
                        docker build -t user-manager:latest .
                        docker run --rm \
                            -e USERNAME='${params.USERNAME}' \
                            -e EMAIL='${params.EMAIL}' \
                            -e ADDRESS='${params.ADDRESS}' \
                            -e PHONE='${params.PHONE}' \
                            user-manager:latest
                    """
                }
            }
        }

        stage('Report Results') {
            steps {
                script {
                    echo "User account creation process completed"
                    echo "Summary:"
                    echo "  Method used: ${params.EXECUTION_MODE}"
                    echo "  Account created for: ${params.USERNAME}"
                }
            }
        }
    }

    post {
        success {
            echo "Pipeline completed successfully"
        }
        failure {
            echo "Pipeline failed - check logs for details"
        }
        always {
            cleanWs()
        }
    }
}
</code></pre>
<h2 id="heading-takeaways">Takeaways</h2>
<p>Each language and tool offers distinct advantages for parameter handling:</p>
<p><strong>Python</strong> excels at combining multiple parameter sources with clear type hints and validation. The language's flexibility makes it ideal for complex parameter processing logic.</p>
<p><strong>Bash</strong> provides straightforward positional parameters with simple default value syntax. It's particularly effective for system-level scripting where parameters flow directly from command invocation.</p>
<p><strong>Dockerfile</strong> separates build-time configuration (ARG) from runtime configuration (ENV), enabling flexible container deployment strategies across different environments.</p>
<p><strong>Jenkins</strong> offers a declarative, UI-friendly approach to parameters with built-in validation and type safety. The parameters directive creates an intuitive interface for operators while maintaining programmatic access in pipeline code.</p>
<h2 id="heading-best-practices">Best Practices</h2>
<ol>
<li><p><strong>Always validate required parameters</strong> before processing begins</p>
</li>
<li><p><strong>Provide sensible defaults</strong> for optional parameters</p>
</li>
<li><p><strong>Document parameter purposes</strong> through comments or descriptions</p>
</li>
<li><p><strong>Use consistent naming conventions</strong> across all implementations</p>
</li>
<li><p><strong>Escape parameters properly</strong> when passing between systems to prevent injection vulnerabilities</p>
</li>
<li><p><strong>Consider parameter sensitivity</strong> and avoid logging credentials or personal information</p>
</li>
</ol>
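<p>Point 5 can be made concrete with Python's standard <code>shlex</code> module. This sketch quotes untrusted values before they reach a shell command line (the script name matches the Bash example above):</p>

```python
import shlex

def build_command(username, email):
    # shlex.quote wraps any value containing shell metacharacters in
    # single quotes, so the shell treats it as one literal argument
    return "./create_user.sh {} {}".format(shlex.quote(username), shlex.quote(email))

# A malicious username is neutralized instead of being executed
cmd = build_command("jdoe; rm -rf /", "jdoe@example.com")
print(cmd)
```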
<h1 id="heading-arger">Arger</h1>
<p>By understanding how each tool handles parameters, you can build robust automation workflows that adapt to different execution contexts while maintaining consistency and reliability.</p>
<p>However, writing this boilerplate can be a pain, especially when the parameters change: updating the same definitions across the entire pipeline gets tedious quickly. So I created an app with Opal to generate the code.</p>
<p>It takes two inputs:</p>
<ol>
<li><p>Parameter name and type</p>
</li>
<li><p>Languages to generate the code, like Python, Bash, Jenkinsfile, and Dockerfile.</p>
</li>
</ol>
<p>Then, it will generate the code required to handle these parameters. Here’s the link:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://opal.google/?flow=drive:/10OvX-kIAz2GG55UCsJFaQEQWl_aqIASv&amp;shared&amp;mode=app">https://opal.google/?flow=drive:/10OvX-kIAz2GG55UCsJFaQEQWl_aqIASv&amp;shared&amp;mode=app</a></div>
<blockquote>
<p>Arger</p>
<p>"Arger" can refer to the German word "Ärger," which means <strong><mark>"annoyance," "irritation," or "trouble"</mark></strong>. It can also refer to the German verb "ärgern," which means "to annoy," "to irritate," or "to anger". Less commonly, "Arger" might be a surname or a personal name of various origins.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[FauxFS A Bug-Infested File System for Learning]]></title><description><![CDATA[FauxFS: A Bug-Infested File System for Learning
Have you ever wanted to learn how to debug complex software systems in a safe and controlled environment? Introducing FauxFS, a mock file system written in Python, designed specifically for educational ...]]></description><link>https://code.manas.me/fauxfs-a-bug-infested-file-system-for-learning</link><guid isPermaLink="true">https://code.manas.me/fauxfs-a-bug-infested-file-system-for-learning</guid><category><![CDATA[Mocking]]></category><category><![CDATA[file system]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Fri, 17 Oct 2025 16:33:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Ctul0jdYjv0/upload/84641e257c8b312daf607c1f3c73fd18.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-fauxfs-a-bug-infested-file-system-for-learning">FauxFS: A Bug-Infested File System for Learning</h1>
<p>Have you ever wanted to learn how to debug complex software systems in a safe and controlled environment? Introducing FauxFS, a mock file system written in Python, designed specifically for educational purposes. But here's the twist: FauxFS is riddled with intentional bugs, providing a unique and challenging learning experience for aspiring software engineers and seasoned developers alike.</p>
<h2 id="heading-what-is-fauxfs">What is FauxFS?</h2>
<p>FauxFS is an in-memory mock file system that simulates the behavior of a real file system. It comes with basic features you'd expect:</p>
<ul>
<li><p><strong>Files and Directories:</strong> Create, delete, and manage files and directories.</p>
</li>
<li><p><strong>Permissions:</strong> A Unix-like permission system with owner, group, and other read/write/execute permissions.</p>
</li>
<li><p><strong>Interactive Shell:</strong> A command-line interface to interact with the file system using familiar commands like <code>ls</code>, <code>cd</code>, <code>mkdir</code>, <code>rm</code>, and more.</p>
</li>
</ul>
<p>You can create files, write content to them, organize them into directories, and manage their permissions, just like you would on a regular file system.</p>
<h2 id="heading-the-twist-intentional-bugs">The Twist: Intentional Bugs</h2>
<p>What sets FauxFS apart is its collection of <strong>22 intentional bugs</strong>. These aren't accidental mistakes; they are carefully crafted flaws designed to mimic real-world software defects. The bugs are categorized into three difficulty levels:</p>
<ul>
<li><p><strong>Beginner:</strong> Simple issues like off-by-one errors and case sensitivity problems.</p>
</li>
<li><p><strong>Intermediate:</strong> More complex bugs like race conditions and path traversal vulnerabilities.</p>
</li>
<li><p><strong>Advanced:</strong> Deeply embedded issues such as deadlocks and integer overflows.</p>
</li>
</ul>
<p>The goal isn't to have a perfect file system but to provide a playground for developers to hunt for and understand these bugs.</p>
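<p>To give a flavor of the beginner tier, here is a hypothetical off-by-one bug of the kind FauxFS plants (illustrative code of my own, not FauxFS's actual implementation):</p>

```python
def read_chunk(data, offset, length):
    """Return `length` bytes starting at `offset` -- or so it claims."""
    # BUG (off-by-one): the slice stops one byte early; the correct
    # expression is data[offset:offset + length]
    return data[offset:offset + length - 1]

chunk = read_chunk(b"hello world", 0, 5)
print(chunk)  # b'hell' -- the caller silently loses the last byte
```

<p>Bugs like this often slip past casual testing because most reads "mostly work", which is exactly what makes them good hunting practice.</p>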
<h2 id="heading-why-learn-with-bugs">Why Learn with Bugs?</h2>
<p>In the real world, software has bugs. Learning to find, understand, and fix them is a critical skill for any software engineer. FauxFS provides a safe environment to develop these skills without the risk of breaking a production system. By working with FauxFS, you will learn to:</p>
<ul>
<li><p><strong>Debug Filesystem Operations:</strong> Understand the common pitfalls in file system design and implementation.</p>
</li>
<li><p><strong>Identify Memory Issues:</strong> Learn to spot memory leaks and buffer overflows.</p>
</li>
<li><p><strong>Recognize Concurrency Problems:</strong> Get hands-on experience with race conditions and deadlocks.</p>
</li>
<li><p><strong>Understand Security Vulnerabilities:</strong> Discover how permission systems can be bypassed.</p>
</li>
<li><p><strong>Practice Systematic Debugging:</strong> Use tests, logging, and other tools to systematically hunt for bugs.</p>
</li>
</ul>
<h2 id="heading-getting-started">Getting Started</h2>
<p>Getting started with FauxFS is easy. All you need is Python 3.13 or higher.</p>
<ol>
<li><p>Clone the repository:</p>
<pre><code class="lang-bash"> git <span class="hljs-built_in">clone</span> https://github.com/rainzoo/fauxfs.git
 <span class="hljs-built_in">cd</span> fauxfs
</code></pre>
</li>
<li><p>Build and run interactive container</p>
</li>
</ol>
<pre><code class="lang-bash">docker build -t fauxfs:interactive .
docker run -it --rm fauxfs:interactive
</code></pre>
<p>Once you're in the FauxFS shell, you can start exploring the file system and hunting for bugs. Try running the test suite to see which tests fail, or experiment with edge cases to uncover hidden issues.</p>
<p>FauxFS is a learning tool that turns debugging into a fun and educational challenge. Whether you're a student learning about operating systems or a developer looking to sharpen your debugging skills, FauxFS offers a unique and valuable experience.</p>
<p>So, are you ready to go bug hunting? Clone the FauxFS repository and start exploring today! Refer to the docs directory for more instructions:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/rainzoo/fauxfs.git">https://github.com/rainzoo/fauxfs.git</a></div>
]]></content:encoded></item><item><title><![CDATA[Know Your Limits]]></title><description><![CDATA[Introduction
Have you ever found yourself debugging a system issue and wished you had a quick way to view all the important OS configuration limits and system information in one place?
Meet Limits: an elegant Python terminal application that provides...]]></description><link>https://code.manas.me/know-your-limits</link><guid isPermaLink="true">https://code.manas.me/know-your-limits</guid><category><![CDATA[Python]]></category><category><![CDATA[operating system]]></category><category><![CDATA[filesystem]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 28 Jun 2025 19:10:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/CGWK6k2RduY/upload/63e0b4bc065f7f9d3ae0e7c32e3cfef7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Have you ever found yourself debugging a system issue and wished you had a quick way to view all the important OS configuration limits and system information in one place?</p>
<p>Meet <strong>Limits</strong>: an elegant Python terminal application that provides exactly that functionality with a clean, interactive interface. <strong>Limits</strong> is a lightweight Python terminal application built with the <a target="_blank" href="https://textual.textualize.io/">Textual</a> framework that displays comprehensive OS configuration and limits information. It's designed to be a debugging companion for developers and system administrators who need quick access to system resource information.</p>
<h1 id="heading-getting-started">Getting Started</h1>
<p><strong>Ensure you have Python 3.13+ and uv installed</strong></p>
<ul>
<li><p><strong>Clone the repository from</strong> <a target="_blank" href="https://github.com/rainzoo/limits">https://github.com/rainzoo/limits</a></p>
</li>
<li><p><strong>Run</strong>: <code>uv run limits.py</code> (thanks to PEP 723)</p>
</li>
<li><p><strong>Navigate</strong> with arrow keys, press <code>r</code> to refresh, <code>q</code> to quit</p>
</li>
</ul>
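<p>The <code>uv run limits.py</code> one-liner works because <a target="_blank" href="https://peps.python.org/pep-0723/">PEP 723</a> lets a single script declare its own metadata in a comment block at the top of the file. The header looks roughly like this (the exact dependency list lives in the real <code>limits.py</code>):</p>

```python
# /// script
# requires-python = ">=3.13"
# dependencies = [
#     "textual",
#     "psutil",
#     "humanize",
# ]
# ///
```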
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751136627137/1eadc9ab-b1f3-42a2-b498-485e7a4ff553.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-basic-system-info">🖥️ <strong>Basic System Info</strong></h2>
<p>The application displays a wide range of system metrics organized into logical sections:</p>
<ul>
<li><p><strong>CPU Information</strong>: Physical and logical core counts</p>
</li>
<li><p><strong>Memory Information</strong>: Total RAM, available memory, and swap space</p>
</li>
<li><p><strong>Process Resource Limits</strong>: File descriptors, stack size, process limits, virtual memory, and CPU time constraints</p>
</li>
<li><p><strong>Filesystem Limits</strong>: Maximum filename and path lengths</p>
</li>
<li><p><strong>Mounted Filesystems</strong>: Disk usage and inode information for all mounted drives</p>
</li>
</ul>
<h2 id="heading-alternative-usage">⚡ Alternative <strong>Usage</strong></h2>
<p>The application can be run in multiple ways:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># With dependency management</span>
uv sync
uv run limits.py

<span class="hljs-comment"># Debug mode (with textual-dev)</span>
textual run limits.py

<span class="hljs-comment"># Docker </span>
docker build -f Dockerfile -t limits:latest .
docker run -it limits:latest
</code></pre>
<h2 id="heading-dependencies"><strong>Dependencies</strong></h2>
<p>Dependencies are listed in <code>pyproject.toml</code>:</p>
<ul>
<li><p><a target="_blank" href="https://textual.textualize.io/"><strong>Textual</strong></a>: Powers the terminal user interface</p>
<ul>
<li><p>A clean, tabular display with zebra striping for easy reading</p>
</li>
<li><p>Interactive navigation with cursor support</p>
</li>
<li><p>Real-time refresh capability</p>
</li>
</ul>
</li>
<li><p><a target="_blank" href="https://psutil.readthedocs.io/"><strong>psutil</strong></a>: Cross-platform system and process utilities</p>
</li>
<li><p><a target="_blank" href="https://humanize.readthedocs.io/"><strong>humanize</strong></a>: Makes numbers and dates more human-readable</p>
</li>
</ul>
<p>The application intelligently handles platform differences, providing detailed resource limit information on POSIX systems (Linux, macOS) while gracefully degrading on Windows.</p>
<p>The <code>get_os_info()</code> function gathers CPU topology and capabilities, the memory hierarchy (RAM and swap), process resource constraints, filesystem characteristics, and storage device information.</p>
<p>Whether you're a system administrator looking for a quick diagnostic tool, a developer debugging resource issues, or a Python enthusiast interested in terminal UI development, Limits is an interesting demo to explore.</p>
<h1 id="heading-deep-dive">Deep Dive</h1>
<p>Understanding these limits is vital for building robust and performant applications.</p>
<h2 id="heading-1-cpu-information">1. CPU Information</h2>
<h3 id="heading-physical-amp-logical-cores">Physical &amp; Logical Cores</h3>
<ul>
<li><p>Concurrency: The number of cores determines the true level of parallelism an application can achieve. For CPU-bound tasks, this number is critical for tuning thread pools or multiprocessing strategies to maximize performance without causing excessive context switching.</p>
</li>
<li><p>Performance Tuning: Knowing the core count helps developers design efficient parallel algorithms and decide on the optimal number of worker processes or threads for services like web servers or data processing jobs.</p>
</li>
</ul>
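<p>Note that the core count usable by a given process can be smaller than the machine's total when a CPU affinity mask is in effect (e.g. set by <code>taskset</code> or a container runtime). A small stdlib sketch; <code>sched_getaffinity</code> is Linux-only, hence the fallback:</p>

```python
import os

# Logical cores include SMT/hyper-threads; often used as the default
# worker count for CPU-bound thread or process pools.
logical = os.cpu_count()

# On Linux, the affinity mask can restrict a process to fewer cores
# than the machine actually has.
try:
    available = len(os.sched_getaffinity(0))
except AttributeError:  # macOS/Windows do not expose sched_getaffinity
    available = logical

print(f"logical cores: {logical}, usable by this process: {available}")
```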
<h2 id="heading-2-process-resource-limits">2. Process Resource Limits</h2>
<p>These are per-process limits, often configured at the OS level to ensure system stability by preventing any single process from consuming all available resources.</p>
<h3 id="heading-max-open-files">Max Open Files</h3>
<p>This is one of the most common limits hit by network services and database applications. Servers that handle many simultaneous connections (e.g., web servers, message queues) or applications that access many files can easily exhaust this limit, leading to "Too many open files" errors. Developers must monitor this and design their code to manage file descriptors efficiently (e.g., using connection pools).</p>
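<p>On POSIX systems a process can inspect, and often raise, its own file-descriptor limit at runtime. A minimal sketch using the stdlib <code>resource</code> module:</p>

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")

# An unprivileged process may raise its soft limit up to the hard limit;
# raising the hard limit itself requires elevated privileges.
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```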
<h3 id="heading-max-processes">Max Processes</h3>
<p>This limit affects applications that use a multiprocess architecture (e.g., preforking web servers like older versions of Apache or Gunicorn). Exceeding the user's process limit will prevent the application from scaling out by creating new child processes, leading to service degradation.</p>
<h3 id="heading-stack-size">Stack Size</h3>
<p>This defines the amount of memory allocated for a thread's function call stack. Applications with deep recursion or large stack-allocated variables can exceed this limit, causing a stack overflow and an immediate crash. It's a critical consideration for system programmers writing recursive algorithms or complex function call chains.</p>
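<p>Python adds its own, much smaller recursion guard on top of the OS stack limit, so pure-Python recursion usually raises <code>RecursionError</code> long before the OS stack overflows. A POSIX-only sketch:</p>

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
limit = "unlimited" if soft == resource.RLIM_INFINITY else f"{soft} bytes"
print("OS stack soft limit:", limit)

# Measure how deep pure-Python recursion gets before the interpreter's
# own guard (tunable via sys.setrecursionlimit) stops it.
def probe(n=0):
    try:
        return probe(n + 1)
    except RecursionError:
        return n

print("Python recursion depth reached:", probe())
```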
<h3 id="heading-virtual-memory-address-space">Virtual Memory (Address Space)</h3>
<p>This limits the total virtual memory a process can request. For memory-intensive applications like in-memory databases, caches, or scientific computing tools, this limit can be a bottleneck. Hitting it can cause allocation failures, even if physical RAM is available.</p>
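<p>The effect is easy to demonstrate: cap the address space with <code>RLIMIT_AS</code>, and a large allocation fails with <code>MemoryError</code> even though the machine has free RAM. A Linux-oriented sketch (the 1 GiB cap is an arbitrary value chosen for illustration):</p>

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)

# Cap this process's address space at ~1 GiB (soft limit only).
cap = 1 << 30
if hard != resource.RLIM_INFINITY:
    cap = min(cap, hard)
resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

hit_limit = False
try:
    buf = bytearray(2 << 30)  # request 2 GiB, beyond the cap
except MemoryError:
    hit_limit = True
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))  # restore

print("allocation failed under RLIMIT_AS:", hit_limit)
```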
<h3 id="heading-cpu-time-limit">CPU Time Limit</h3>
<p>This is a safeguard that kills a process after it has consumed a certain amount of CPU time. While less common in general app development, it's important in multi-user or high-performance computing (HPC) environments to prevent runaway processes from monopolizing CPU resources.</p>
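<p>On POSIX, exceeding the soft CPU-time limit delivers <code>SIGXCPU</code> (and exceeding the hard limit, <code>SIGKILL</code>), so a process can install a handler to checkpoint its work before dying. A sketch:</p>

```python
import resource
import signal

soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
print("CPU-seconds allowed:",
      "unlimited" if soft == resource.RLIM_INFINITY else soft)

# The kernel sends SIGXCPU when the soft limit is exceeded; a handler
# gives the process a chance to flush state before the hard limit kills it.
def on_cpu_exhausted(signum, frame):
    print("CPU budget exhausted, checkpointing...")

signal.signal(signal.SIGXCPU, on_cpu_exhausted)
```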
<h2 id="heading-3-filesystem-and-storage-limits">3. Filesystem and Storage Limits</h2>
<h3 id="heading-max-filename-amp-path-length">Max Filename &amp; Path Length</h3>
<p>Applications that create or manage user-defined file structures must respect these limits to ensure cross-platform compatibility and prevent <code>ENOENT</code> (No such file or directory) or <code>ENAMETOOLONG</code> errors. This is especially important for applications that generate nested directory structures or long, descriptive filenames.</p>
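<p>These limits are per-filesystem, and on POSIX systems they can be queried at runtime with <code>os.pathconf</code> rather than hard-coded:</p>

```python
import os

# Limits can differ per mounted filesystem, so query the directory
# you actually intend to write into.
print("max filename length on /:", os.pathconf("/", "PC_NAME_MAX"))
print("max relative path length on /:", os.pathconf("/", "PC_PATH_MAX"))
```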
<h3 id="heading-disk-amp-inode-usage">Disk &amp; Inode Usage</h3>
<ul>
<li><p><strong>Disk Space</strong> : Running out of disk space is a common cause of application failure. Applications that write logs, temporary files, or store data must have error handling for disk-full scenarios.</p>
</li>
<li><p><strong>Inodes</strong> : An inode is a data structure that stores information about a file. It's possible to run out of inodes even if disk space is available, especially on systems with a large number of very small files (e.g., mail servers, caches). Applications that create many small files must be aware of this potential limit.</p>
</li>
</ul>
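<p>Both figures can be checked from the standard library. Note that <code>statvfs</code> may report zero inodes on filesystems that allocate inodes dynamically (e.g. btrfs):</p>

```python
import os
import shutil

# Byte-level usage of the filesystem containing "/".
usage = shutil.disk_usage("/")
print(f"disk: {usage.used / usage.total:.0%} used of {usage.total >> 30} GiB")

# Inode usage via statvfs (POSIX-only).
st = os.statvfs("/")
if st.f_files:  # 0 on filesystems with dynamic inode allocation
    print(f"inodes: {st.f_files - st.f_ffree} used of {st.f_files}")
```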
<h1 id="heading-curious-case-of-container-limits">Curious Case of Container Limits</h1>
<p>In containers, these are among the most important limits:</p>
<ul>
<li><p><strong>CPU Limits (Shares, Quota, Period)</strong>: These control how much CPU time a container gets. Shares provide a relative weight, while quota/period provide a hard cap (e.g., "use 2 CPU cores' worth of time every 100ms"). This prevents a single "noisy neighbor" container from starving others of CPU. It is fundamental for multi-tenancy and predictable performance, but setting limits too low can artificially throttle an application that needs to burst.</p>
</li>
<li><p><strong>Memory Limit (<code>memory.limit_in_bytes</code>)</strong>: The absolute maximum amount of memory a container can use. This is the most critical limit for container stability. If a container's memory usage exceeds it, the kernel's Out-Of-Memory (OOM) killer immediately terminates a process inside it (often the main application), causing the container to crash. Application developers must be acutely aware of their memory footprint relative to this limit.</p>
</li>
</ul>
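<p>With cgroup v2 (the default on modern Linux), a process can read the limits imposed on its own cgroup from <code>/sys/fs/cgroup</code>. A hedged sketch (file paths per the kernel cgroup v2 documentation) that simply returns an empty dict where those files are not exposed:</p>

```python
from pathlib import Path

def cgroup_v2_limits(base="/sys/fs/cgroup"):
    """Read this cgroup's memory and CPU caps, if exposed (cgroup v2)."""
    limits = {}
    mem = Path(base) / "memory.max"
    if mem.exists():
        raw = mem.read_text().strip()
        limits["memory_bytes"] = None if raw == "max" else int(raw)
    cpu = Path(base) / "cpu.max"
    if cpu.exists():
        quota, period = cpu.read_text().split()
        limits["cpus"] = None if quota == "max" else int(quota) / int(period)
    return limits

print(cgroup_v2_limits())
```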
<p>You can run the image built from the <code>Dockerfile</code> with the following options to limit CPU or memory:</p>
<ul>
<li><code>docker run --cpus 4 -it limits:latest</code> or <code>docker run --memory 4g -it limits:latest</code></li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🛑</div>
<div data-node-type="callout-text">Information looks inaccurate! It displays the <strong>same</strong> info every time!</div>
</div>

<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751136702036/91cf024d-8f57-4bfc-85d2-05032af11d33.png" alt class="image--center mx-auto" /></p>
<p>Why? The data that is displayed comes from the <strong>host</strong> system. In my case, I have assigned 8 GB of memory and 8 CPUs to my Docker engine. This is not what the container is actually allowed to use.</p>
<p>What is the way to fix this? This is left as an exercise for the reader.</p>
]]></content:encoded></item><item><title><![CDATA[Cloud native storage with Rook]]></title><description><![CDATA[Introduction
This is part of the Distributed Storage series. We assume you have a Kubernetes cluster running as described in Kubernetes With Microk8s. Using the same virtual machines, we have deployed a Ceph cluster and enabled different types of sto...]]></description><link>https://code.manas.me/cloud-native-storage-with-rook</link><guid isPermaLink="true">https://code.manas.me/cloud-native-storage-with-rook</guid><category><![CDATA[Rook]]></category><category><![CDATA[storage]]></category><category><![CDATA[cloudnative]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[ceph]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[object storage]]></category><category><![CDATA[file system]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 21 Jun 2025 12:58:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750430050662/60fd54ce-552e-48d2-980c-c4227f51de3e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>This is part of the <a target="_blank" href="https://code.manas.me/series/distributed-storage">Distributed Storage</a> series. We assume you have a Kubernetes cluster running as described in <a target="_blank" href="https://code.manas.me/kubernetes-with-microk8s">Kubernetes With Microk8s</a>. Using the same virtual machines, we have deployed a <a target="_blank" href="https://code.manas.me/microceph-is-the-easy-way-to-ceph">Ceph cluster</a> and enabled different types of storage: file, object, and block (default). This ensures that we have covered the <a target="_blank" href="https://rook.io/docs/rook/latest-release/Getting-Started/Prerequisites/prerequisites/">prerequisites</a>.</p>
<p><a target="_blank" href="https://rook.io/">Rook</a> is an open source <strong>cloud-native storage orchestrator</strong>, providing the platform, framework, and support for Ceph storage to natively integrate with cloud-native environments. The storage architecture is well <a target="_blank" href="https://rook.io/docs/rook/latest-release/Getting-Started/storage-architecture/">documented</a>. We focus on getting it up and running.</p>
<h1 id="heading-enable-rook">Enable Rook</h1>
<p>First, we enable rook on the k8s cluster</p>
<pre><code class="lang-bash">manas@manas-s01:~$ sudo microk8s <span class="hljs-built_in">enable</span> rook-ceph
Infer repository core <span class="hljs-keyword">for</span> addon rook-ceph
Add Rook Helm repository &lt;https://charts.rook.io/release&gt;
<span class="hljs-string">"rook-release"</span> has been added to your repositories
Hang tight <span class="hljs-keyword">while</span> we grab the latest from your chart repositories...
...Successfully got an update from the <span class="hljs-string">"rook-release"</span> chart repository
Update Complete. ⎈Happy Helming!⎈
Install Rook version v1.11.9
NAME: rook-ceph
LAST DEPLOYED: Fri Jun 13 16:01:21 2025
NAMESPACE: rook-ceph
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Rook Operator has been installed. Check its status by running:
  kubectl --namespace rook-ceph get pods -l <span class="hljs-string">"app=rook-ceph-operator"</span>
...
</code></pre>
<p>Check Rook Status</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s kubectl --namespace rook-ceph get pods -l <span class="hljs-string">"app=rook-ceph-operator"</span>
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-684bbd569f-82dv9   1/1     Running   0          30m
</code></pre>
<h1 id="heading-connect-ceph-and-k8s-cluster">Connect Ceph and k8s cluster</h1>
<pre><code class="lang-bash">manas@manas-s01:~$ sudo microk8s connect-external-ceph
[sudo] password <span class="hljs-keyword">for</span> manas: 
Looking <span class="hljs-keyword">for</span> MicroCeph on the host
Detected existing MicroCeph installation
Attempting to connect to Ceph cluster
Successfully connected to e43d58a8-deb0-43d9-b7ef-d6159f114c02 (192.168.148.134:0/1580887393)
Creating pool microk8s-rbd0 <span class="hljs-keyword">in</span> Ceph cluster
Configuring pool microk8s-rbd0 <span class="hljs-keyword">for</span> RBD
Successfully configured pool microk8s-rbd0 <span class="hljs-keyword">for</span> RBD
Creating namespace rook-ceph-external
namespace/rook-ceph-external created
Configuring Ceph CSI secrets
Successfully configured Ceph CSI secrets
Importing Ceph CSI secrets into MicroK8s
secret/rook-ceph-mon created
configmap/rook-ceph-mon-endpoints created
secret/rook-csi-rbd-node created
secret/rook-csi-rbd-provisioner created
storageclass.storage.k8s.io/ceph-rbd created
Importing external Ceph cluster
W0613 16:32:46.334927  129114 warnings.go:70] unknown field <span class="hljs-string">"spec.upgradeOSDRequiresHealthyPGs"</span>
NAME: rook-ceph-external
LAST DEPLOYED: Fri Jun 13 16:32:45 2025
NAMESPACE: rook-ceph-external
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Ceph Cluster has been installed. Check its status by running:
  kubectl --namespace rook-ceph-external get cephcluster

Visit &lt;https://rook.io/docs/rook/latest/CRDs/Cluster/ceph-cluster-crd/&gt; <span class="hljs-keyword">for</span> more information about the Ceph CRD.

Important Notes:
- You can only deploy a single cluster per namespace
- If you wish to delete this cluster and start fresh, you will also have to wipe the OSD disks using `sfdisk`

=================================================

Successfully imported external Ceph cluster. You can now use the following storageclass
to provision PersistentVolumes using Ceph CSI:

NAME       PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-rbd   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           <span class="hljs-literal">true</span>                   2s
</code></pre>
<p>Check Ceph cluster status</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Initial </span>
manas@manas-s01:~$ microk8s kubectl --namespace rook-ceph-external get cephcluster
NAME                 DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE   HEALTH   EXTERNAL   FSID
rook-ceph-external   /var/lib/rook     3          67s                              <span class="hljs-literal">true</span>       
<span class="hljs-comment"># Final</span>
manas@manas-s01:~$ microk8s kubectl --namespace rook-ceph-external get cephcluster
NAME                 DATADIRHOSTPATH   MONCOUNT   AGE    PHASE       MESSAGE                          HEALTH      EXTERNAL   FSID
rook-ceph-external   /var/lib/rook     3          6m1s   Connected   Cluster connected successfully   HEALTH_OK   <span class="hljs-literal">true</span>       e43d58a8-deb0-43d9-b7ef-d6159f114c02
</code></pre>
<p>Wait until all k8s resources are in Running status:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s kubectl get all --namespace rook-ceph
NAME                                                READY   STATUS    RESTARTS   AGE
pod/csi-cephfsplugin-chz2w                          2/2     Running   0          5m7s
pod/csi-cephfsplugin-n2wxd                          2/2     Running   0          5m7s
pod/csi-cephfsplugin-provisioner-7bd8fb7c64-fbmw6   5/5     Running   0          5m7s
pod/csi-cephfsplugin-provisioner-7bd8fb7c64-tbqlp   5/5     Running   0          5m7s
pod/csi-cephfsplugin-zpltk                          2/2     Running   0          5m7s
pod/csi-rbdplugin-7dgxz                             2/2     Running   0          5m7s
pod/csi-rbdplugin-8x5x7                             2/2     Running   0          5m7s
pod/csi-rbdplugin-provisioner-5f7d95b6fb-s4znf      5/5     Running   0          5m7s
pod/csi-rbdplugin-provisioner-5f7d95b6fb-wjk7q      5/5     Running   0          5m7s
pod/csi-rbdplugin-zccc7                             2/2     Running   0          5m7s
pod/rook-ceph-operator-684bbd569f-82dv9             1/1     Running   0          39m

NAME                              DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-cephfsplugin   3         3         3       3            3           &lt;none&gt;          5m7s
daemonset.apps/csi-rbdplugin      3         3         3       3            3           &lt;none&gt;          5m8s

NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-cephfsplugin-provisioner   2/2     2            2           5m7s
deployment.apps/csi-rbdplugin-provisioner      2/2     2            2           5m7s
deployment.apps/rook-ceph-operator             1/1     1            1           39m

NAME                                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-cephfsplugin-provisioner-7bd8fb7c64   2         2         2       5m7s
replicaset.apps/csi-rbdplugin-provisioner-5f7d95b6fb      2         2         2       5m7s
replicaset.apps/rook-ceph-operator-684bbd569f             1         1         1       39m
manas@manas-s01:~$
</code></pre>
<p>Meanwhile, if you want to explore the available API resources, use the following command:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s kubectl api-resources --namespace rook-ceph-external | grep ceph
cephblockpoolradosnamespaces                          ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephBlockPoolRadosNamespace
cephblockpools                                        ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephBlockPool
cephbucketnotifications                               ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephBucketNotification
cephbuckettopics                                      ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephBucketTopic
cephclients                                           ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephClient
cephclusters                                          ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephCluster
cephfilesystemmirrors                                 ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephFilesystemMirror
cephfilesystems                                       ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephFilesystem
cephfilesystemsubvolumegroups                         ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephFilesystemSubVolumeGroup
cephnfses                           nfs               ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephNFS
cephobjectrealms                                      ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephObjectRealm
cephobjectstores                                      ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephObjectStore
cephobjectstoreusers                rcou,objectuser   ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephObjectStoreUser
cephobjectzonegroups                                  ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephObjectZoneGroup
cephobjectzones                                       ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephObjectZone
cephrbdmirrors                                        ceph.rook.io/v1                   <span class="hljs-literal">true</span>         CephRBDMirror
</code></pre>
<p>To see Block storage status, you can use <code>cephblockpools</code>. Similarly, <code>cephobjectstores</code> is for Object and <code>cephnfses</code> is for NFS.</p>
<p>Refer to examples from <a target="_blank" href="https://github.com/rook/rook/tree/master/deploy/examples">https://github.com/rook/rook/tree/master/deploy/examples</a></p>
<h2 id="heading-create-block-storage">Create Block Storage</h2>
<pre><code class="lang-bash">$ microk8s kubectl create -f storageClass.yaml 
cephblockpool.ceph.rook.io/replicapool created
storageclass.storage.k8s.io/rook-ceph-block created

$ microk8s kubectl -n rook-ceph get cephblockpools
NAME          PHASE
replicapool   Progressing

$ microk8s kubectl create -f mysql.yaml 
service/wordpress-mysql created
persistentvolumeclaim/mysql-pv-claim created
deployment.apps/wordpress-mysql created
</code></pre>
<h2 id="heading-create-object-stores">Create Object Stores</h2>
<p>This requires <code>rgw</code>, which we have already enabled:</p>
<pre><code class="lang-bash">$ kubectl create -f object.yaml

$ microk8s kubectl -n rook-ceph get cephobjectstores
NAME       PHASE
my-store   Progressing
</code></pre>
<h2 id="heading-create-share-file-systems">Create Shared File Systems</h2>
<pre><code class="lang-bash">$ kubectl create -f file.yaml
$ microk8s kubectl get cephfilesystems -n rook-ceph
NAME   ACTIVEMDS   AGE     PHASE
myfs   1           5m38s   Progressing
</code></pre>
<p>That’s it! We have deployed a complete cloud-native storage solution, running entirely locally.</p>
]]></content:encoded></item><item><title><![CDATA[MicroCeph is the easy way to Ceph]]></title><description><![CDATA[Introduction
This is part of the Distributed Storage series. We assume you have a Kubernetes cluster running as described in Kubernetes With Microk8s. Using the same virtual machines, we will deploy a Ceph cluster and enable different types of storag...]]></description><link>https://code.manas.me/microceph-is-the-easy-way-to-ceph</link><guid isPermaLink="true">https://code.manas.me/microceph-is-the-easy-way-to-ceph</guid><category><![CDATA[microceph]]></category><category><![CDATA[ceph]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[storage]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 21 Jun 2025 05:34:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750430014095/16618f2f-ac6d-4089-89d6-233b28f3b338.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>This is part of the <a target="_blank" href="https://code.manas.me/series/distributed-storage">Distributed Storage</a> series. We assume you have a Kubernetes cluster running as described in <a target="_blank" href="https://code.manas.me/kubernetes-with-microk8s">Kubernetes With Microk8s</a>. Using the same virtual machines, we will deploy a Ceph cluster and enable different types of storage: file, object, and block (default).</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>It is worth repeating that each VM has a disk which can be consumed by Ceph:</p>
<pre><code class="lang-bash">$ lsblk | grep -v loop
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sr0                        11:0    1   2.7G  0 rom  
nvme0n1                   259:0    0    80G  0 disk 
├─nvme0n1p1               259:1    0     1G  0 part /boot/efi
├─nvme0n1p2               259:2    0     2G  0 part /boot
└─nvme0n1p3               259:3    0  76.9G  0 part 
  └─ubuntu--vg-ubuntu--lv 252:0    0  38.5G  0 lvm  /
nvme0n2                   259:4    0    20G  0 disk
</code></pre>
<h1 id="heading-bootstrap-ceph-cluster">Bootstrap Ceph Cluster</h1>
<p>Let us install microceph.</p>
<p>You can purge any existing installation, or start over if something goes wrong later:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># This will remove any existing installation</span>
sudo snap remove microceph --purge
</code></pre>
<p>The following bootstraps a new cluster:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ sudo snap install microceph --channel=stable
microceph (squid/stable) 19.2.0+snap3b53da1c21 from Canonical✓ installed
manas@manas-s01:~$ sudo microceph cluster bootstrap
manas@manas-s01:~$ sudo microceph status
MicroCeph deployment summary:
- manas-s01 (192.168.148.134)
  Services: mds, mgr, mon
  Disks: 0
</code></pre>
<p>We now have a single-node cluster. Adding other nodes is a bit different from microk8s: the <code>cluster add</code> command generates a unique token tied to the hostname of the joining node. If the cluster join fails, ensure you used the correct hostname. For example, the VMs in this demo are named manas-s01, manas-s02, and manas-s03. Naming things is hard.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Generate token to add nodes</span>
$ sudo microceph cluster add &lt;hostname&gt;
&lt;token&gt;
<span class="hljs-comment"># For each node </span>
$ sudo microceph cluster join &lt;token&gt;
<span class="hljs-comment"># Master Node</span>
manas@manas-s01:~$ sudo microceph status
MicroCeph deployment summary:
- manas-s01 (192.168.148.135)
  Services: mds, mgr, mon
  Disks: 0
- manas-s02 (192.168.148.136)
  Services: mds, mgr, mon
  Disks: 0
- manas-s03 (192.168.148.137)
  Services: mds, mgr, mon
  Disks: 0
</code></pre>
<p>Next, we will claim storage to be consumed by the cluster. Remember, we kept an extra disk for this purpose. Repeat the following for each node:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Repeat for each node </span>
manas@manas-s01:~$ sudo microceph disk add /dev/nvme0n2 --wipe

+--------------+---------+
|     PATH     | STATUS  |
+--------------+---------+
| /dev/nvme0n2 | Success |
+--------------+---------+

<span class="hljs-comment"># Status shows that we have 3 OSDs</span>
manas@manas-s01:~$ sudo microceph status 
MicroCeph deployment summary:
- manas-s01 (192.168.148.134)
  Services: mds, mgr, mon, osd
  Disks: 1
- manas-s02 (192.168.148.136)
  Services: mds, mgr, mon, osd
  Disks: 1
- manas-s03 (192.168.148.135)
  Services: mds, mgr, mon, osd
  Disks: 1
</code></pre>
<p>Check the cluster and disk status:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ sudo microceph.ceph status 
  cluster:
    id:     e43d58a8-deb0-43d9-b7ef-d6159f114c02
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum manas-s01,manas-s02,manas-s03 (age 5m)
    mgr: manas-s01(active, since 52m), standbys: manas-s02, manas-s03
    osd: 3 osds: 3 up (since 101s), 3 <span class="hljs-keyword">in</span> (since 108s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   81 MiB used, 60 GiB / 60 GiB avail
    pgs:     1 active+clean

manas@manas-s01:~$ sudo microceph disk list
Disks configured <span class="hljs-keyword">in</span> MicroCeph:
+-----+-----------+-----------------------------------------------------------+
| OSD | LOCATION  |                           PATH                            |
+-----+-----------+-----------------------------------------------------------+
| 1   | manas-s01 | /dev/disk/by-id/nvme-eui.3e9ca0d1c76f942a000c296b819ff947 |
+-----+-----------+-----------------------------------------------------------+
| 2   | manas-s02 | /dev/disk/by-id/nvme-eui.3e9ca0d1c76f942a000c296b819ff947 |
+-----+-----------+-----------------------------------------------------------+
| 3   | manas-s03 | /dev/disk/by-id/nvme-eui.3e9ca0d1c76f942a000c296b819ff947 |
+-----+-----------+-----------------------------------------------------------+
</code></pre>
<p>Now, to consume File and Object storage, we need to enable the relevant services.</p>
<p>For example, we enable <code>rgw</code> for object storage on all the nodes</p>
<pre><code class="lang-bash">$ sudo microceph <span class="hljs-built_in">enable</span> rgw
$ sudo microceph status 
MicroCeph deployment summary:
- manas-s01 (192.168.148.134)
  Services: mds, mgr, mon, rgw, osd
  Disks: 1
- manas-s02 (192.168.148.136)
  Services: mds, mgr, mon, rgw, osd
  Disks: 1
- manas-s03 (192.168.148.135)
  Services: mds, mgr, mon, rgw, osd
  Disks: 1
</code></pre>
<p>Congratulations, your multi node Ceph cluster is ready!</p>
<p>Note that <code>microceph</code> commands are different from the <code>ceph</code> CLI</p>
<p>Reference docs for the Squid Release of Ceph:</p>
<ul>
<li><p><a target="_blank" href="https://canonical-microceph.readthedocs-hosted.com/en/v19.2.0-squid/reference/">https://canonical-microceph.readthedocs-hosted.com/en/v19.2.0-squid/reference/</a></p>
</li>
<li><p><a target="_blank" href="https://docs.ceph.com/en/squid/man/8/ceph/">https://docs.ceph.com/en/squid/man/8/ceph/</a></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750564858704/97cfee82-6d79-40ad-9799-2900d937a6fd.jpeg" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes with microk8s]]></title><description><![CDATA[Introduction
There are many ways to deploy Kubernetes (k8s) clusters locally. It's similar to starting with Linux; first, you choose a distribution, then explore the different versions, packages, and customizations.
You can begin with a desktop app l...]]></description><link>https://code.manas.me/kubernetes-with-microk8s</link><guid isPermaLink="true">https://code.manas.me/kubernetes-with-microk8s</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[microk8s]]></category><category><![CDATA[vmware]]></category><category><![CDATA[Ubuntu]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 21 Jun 2025 05:32:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750429989273/441381c9-6f57-464a-8cb2-4f460e2e9b2f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>There are many ways to deploy Kubernetes (k8s) clusters locally. It's similar to starting with Linux; first, you choose a distribution, then explore the different versions, packages, and customizations.</p>
<p>You can begin with a desktop app like Docker or Podman, which integrate well with k8s. Most IDEs have built-in support or plugins for working with k8s.</p>
<p>Once you move beyond the basics, you may face some limitations and inconveniences. If you're running the cluster on a laptop, you'll need reliable pause and resume features. It can be frustrating if you resume the cluster and the services are not healthy. You can always reset or reinstall, so choose your distributions carefully.</p>
<p>For this series, I'm sticking with Canonical solutions because Microk8s and MicroCeph have worked well. Integrating these with other tools wasn't straightforward. I'm a learner, not an expert in this area.</p>
<h1 id="heading-preprare-infrastructure">Prepare Infrastructure</h1>
<p>The goal is to use cloud-native storage. This requires <a target="_blank" href="https://rook.io/docs/rook/latest-release/Getting-Started/Prerequisites/prerequisites/">raw disks</a> that will be consumed by Ceph, so instead of containers we use VMs as cluster nodes. To enable high availability, we will deploy a multi-node cluster, and we start by preparing the VMs.</p>
<h2 id="heading-create-ubuntu-vms"><strong>Create Ubuntu VMs</strong></h2>
<p>To deploy our cluster, we need to create multiple virtual machines. For this example, we will use Ubuntu VMs running on VMware Fusion:</p>
<h2 id="heading-prerequisites"><strong>Prerequisites</strong></h2>
<ul>
<li><p>A desktop hypervisor like VMware Fusion or Workstation. The steps below use Fusion on a MacBook.</p>
</li>
<li><p>Ubuntu Server LTS ISO. This example uses an M1 MacBook, so the images are ARM-based.</p>
</li>
<li><p>At least 50GB of free disk space per VM. The disks are thin-provisioned, but sufficient free disk space is recommended.</p>
</li>
<li><p>Basic experience with OS installation. We are using Ubuntu Server, which does not include a desktop environment.</p>
</li>
</ul>
<h2 id="heading-creating-virtual-machines">Creating Virtual Machines</h2>
<ul>
<li><p>Launch VMware Fusion and click "+" to create a new virtual machine</p>
</li>
<li><p>Drag and drop the Ubuntu Server ISO or click "Create a custom virtual machine"</p>
</li>
<li><p>Select "Linux" and "Ubuntu 64-bit" as the operating system</p>
</li>
<li><p>Configure VM Resources for each node:</p>
<ul>
<li><p>CPUs: 4 cores minimum</p>
</li>
<li><p>Memory: 8GB minimum</p>
</li>
<li><p>Storage:</p>
<ul>
<li><p>Primary disk 50GB minimum</p>
</li>
<li><p>Additional 20 GB disk to be consumed by <strong>Ceph</strong> (covered in another post)</p>
</li>
</ul>
</li>
<li><p>Network: Bridge or NAT networking</p>
</li>
</ul>
</li>
<li><p>Complete the Ubuntu installation process:</p>
<ul>
<li><p>Choose "Install Ubuntu Server"</p>
</li>
<li><p>Select language and keyboard layout</p>
</li>
<li><p>Configure network settings (preferably static IP)</p>
</li>
<li><p>Set up username and password</p>
</li>
<li><p>Install OpenSSH server when prompted</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-post-installation-setup">Post-Installation Setup</h2>
<p>If you plan to enable Ceph on the cluster, ensure you have additional disks:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ lsblk | grep -v loop
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sr0                        11:0    1   2.7G  0 rom  
nvme0n1                   259:0    0    80G  0 disk 
├─nvme0n1p1               259:1    0     1G  0 part /boot/efi
├─nvme0n1p2               259:2    0     2G  0 part /boot
└─nvme0n1p3               259:3    0  76.9G  0 part 
  └─ubuntu--vg-ubuntu--lv 252:0    0  38.5G  0 lvm  /
nvme0n2                   259:4    0    20G  0 disk
</code></pre>
<p>As we are using Ubuntu Server, only remote console or SSH is available.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Update the system</span>
sudo apt update &amp;&amp; sudo apt upgrade -y

<span class="hljs-comment"># Optional package to support copy/paste etc.</span>
sudo apt install -y open-vm-tools
</code></pre>
<p>Optional: Assign a static IP to the VM. You can use a netplan config like below:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">network:</span>
  <span class="hljs-attr">version:</span> <span class="hljs-number">2</span>
  <span class="hljs-attr">renderer:</span> <span class="hljs-string">networkd</span>
  <span class="hljs-attr">ethernets:</span>
    <span class="hljs-attr">ens160:</span>
      <span class="hljs-attr">dhcp4:</span> <span class="hljs-literal">no</span>
      <span class="hljs-attr">addresses:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">&lt;ip&gt;/24</span>
      <span class="hljs-attr">routes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">to:</span> <span class="hljs-string">default</span>
          <span class="hljs-attr">via:</span> <span class="hljs-string">&lt;gateway&gt;</span>
      <span class="hljs-attr">nameservers:</span>
          <span class="hljs-attr">addresses:</span> [<span class="hljs-number">1.1</span><span class="hljs-number">.1</span><span class="hljs-number">.1</span>,<span class="hljs-number">8.8</span><span class="hljs-number">.8</span><span class="hljs-number">.8</span>]
</code></pre>
<p>Refer to <a target="_blank" href="https://documentation.ubuntu.com/server/explanation/networking/configuring-networks/#static-ip-address-assignment">Ubuntu network configuration docs</a> for details.</p>
<h2 id="heading-clone-vms">Clone VMs</h2>
<p>We need at least three VMs for the cluster. This step can be automated with Packer, Ansible, etc., but VMware Fusion offers a simpler option: the Linked Clone.</p>
<blockquote>
<p>A linked clone is a VMware virtual machine that shares the virtual disk of the source virtual machine.</p>
</blockquote>
<ol>
<li><p>First, shut down the master VM.</p>
</li>
<li><p>Then, from the VM Library, right-click the VM and create a Linked Clone</p>
</li>
<li><p>Set a unique hostname and IP and reboot</p>
</li>
<li><p>Repeat steps 2 and 3 for the third VM</p>
</li>
</ol>
<p>Once the VMs are up, make sure to give each VM a unique hostname and IP address:</p>
<pre><code class="lang-bash">sudo hostnamectl set-hostname &lt;hostname&gt;
<span class="hljs-comment"># Reboot if required. Ensure each VM has unique name and IP.</span>
<span class="hljs-comment"># Change the netplan config: </span>
<span class="hljs-comment"># https://documentation.ubuntu.com/server/explanation/networking/configuring-networks</span>
sudo netplan apply
<span class="hljs-comment"># Reboot</span>
sudo reboot
</code></pre>
<p>To run commands on the nodes conveniently, set up passwordless SSH:</p>
<pre><code class="lang-bash">$ ssh-copy-id &lt;username&gt;@&lt;hostname&gt;
<span class="hljs-comment"># Enter the passphrase and password when prompted</span>
<span class="hljs-comment"># Repeat for all 3 VMs</span>
</code></pre>
<h1 id="heading-initialize-the-k8s-cluster">Initialize the k8s cluster</h1>
<p>Install and start <code>microk8s</code> on every node. Then, on the master node, run <code>add-node</code> to generate a join command for the other nodes:</p>
<blockquote>
<p>In some cases, k8s may fail to start with a missing file error: <a target="_blank" href="https://github.com/canonical/microk8s/issues/4361">microk8s/issues/4361</a></p>
<p>As a workaround, create the missing file and restart k8s:</p>
</blockquote>
<pre><code class="lang-bash"><span class="hljs-comment"># Install</span>
sudo snap install microk8s --classic --channel=1.32
<span class="hljs-comment"># Setup User permissions</span>
sudo usermod -a -G microk8s <span class="hljs-variable">$USER</span> &amp;&amp; \
mkdir -p ~/.kube &amp;&amp; \
chmod 0700 ~/.kube
<span class="hljs-comment"># Re-login or run 'newgrp microk8s' for the group change to take effect</span>
<span class="hljs-comment"># Check status (start if required)</span>
microk8s status
<span class="hljs-comment"># In case of failure to start, use inspect</span>
microk8s inspect
<span class="hljs-comment"># Workaround for the missing file error</span>
<span class="hljs-comment"># (the snap revision, 8147 here, will differ on your system)</span>
sudo touch /var/snap/microk8s/8147/var/kubernetes/backend/localnode.yaml
</code></pre>
<p>Check the cluster status and list the nodes with <code>kubectl</code>:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s status
microk8s is running
high-availability: no
  datastore master nodes: 127.0.0.1:19001
  datastore standby nodes: none
addons:
  enabled:
    dns                  <span class="hljs-comment"># (core) CoreDNS</span>
    ha-cluster           <span class="hljs-comment"># (core) Configure high availability on the current node</span>
    helm                 <span class="hljs-comment"># (core) Helm - the package manager for Kubernetes</span>
    helm3                <span class="hljs-comment"># (core) Helm 3 - the package manager for Kubernetes</span>
  disabled:
    cert-manager         <span class="hljs-comment"># (core) Cloud native certificate management</span>
    cis-hardening        <span class="hljs-comment"># (core) Apply CIS K8s hardening</span>
    community            <span class="hljs-comment"># (core) The community addons repository</span>
    dashboard            <span class="hljs-comment"># (core) The Kubernetes dashboard</span>
    host-access          <span class="hljs-comment"># (core) Allow Pods connecting to Host services smoothly</span>
    hostpath-storage     <span class="hljs-comment"># (core) Storage class; allocates storage from host directory</span>
    ingress              <span class="hljs-comment"># (core) Ingress controller for external access</span>
    kube-ovn             <span class="hljs-comment"># (core) An advanced network fabric for Kubernetes</span>
    mayastor             <span class="hljs-comment"># (core) OpenEBS MayaStor</span>
    metallb              <span class="hljs-comment"># (core) Loadbalancer for your Kubernetes cluster</span>
    metrics-server       <span class="hljs-comment"># (core) K8s Metrics Server for API access to service metrics</span>
    minio                <span class="hljs-comment"># (core) MinIO object storage</span>
    observability        <span class="hljs-comment"># (core) A lightweight observability stack for logs, traces and metrics</span>
    prometheus           <span class="hljs-comment"># (core) Prometheus operator for monitoring and logging</span>
    rbac                 <span class="hljs-comment"># (core) Role-Based Access Control for authorisation</span>
    registry             <span class="hljs-comment"># (core) Private image registry exposed on localhost:32000</span>
    rook-ceph            <span class="hljs-comment"># (core) Distributed Ceph storage using Rook</span>
    storage              <span class="hljs-comment"># (core) Alias to hostpath-storage add-on, deprecated</span>

manas@manas-s01:~$ microk8s kubectl get no
NAME        STATUS   ROLES    AGE   VERSION
manas-s01   Ready    &lt;none&gt;   22m   v1.32.3
</code></pre>
<p>Add the other nodes to the cluster:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s add-node
From the node you wish to join to this cluster, run the following:
microk8s join 192.168.148.134:25000/...

Use the <span class="hljs-string">'--worker'</span> flag to join a node as a worker not running the control plane, eg:
microk8s join 192.168.148.134:25000/... --worker

If the node you are adding is not reachable through the default interface you can use one of the following:
microk8s join 192.168.148.134:25000/.../...
microk8s join 172.17.0.1:25000/.../...
</code></pre>
<p>From each of the other nodes, join the cluster by pasting the command from above:</p>
<pre><code class="lang-bash">manas@manas-s02:~$ microk8s join 192.168.148.134:25000/.../...
Contacting cluster at 192.168.148.134
Waiting <span class="hljs-keyword">for</span> this node to finish joining the cluster. .. .. .. .. .. .. .. .. .. ..  
Successfully joined the cluster.
</code></pre>
<p>From the master node, verify that k8s is up and running. With three nodes, high availability should now be enabled:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 192.168.148.134:19001 192.168.148.136:19001 192.168.148.135:19001
  datastore standby nodes: none
addons:
  enabled:
    dns                  <span class="hljs-comment"># (core) CoreDNS</span>
    ha-cluster           <span class="hljs-comment"># (core) Configure high availability on the current node</span>
    helm                 <span class="hljs-comment"># (core) Helm - the package manager for Kubernetes</span>
    helm3                <span class="hljs-comment"># (core) Helm 3 - the package manager for Kubernetes</span>
  disabled:
    cert-manager         <span class="hljs-comment"># (core) Cloud native certificate management</span>
    cis-hardening        <span class="hljs-comment"># (core) Apply CIS K8s hardening</span>
    community            <span class="hljs-comment"># (core) The community addons repository</span>
    dashboard            <span class="hljs-comment"># (core) The Kubernetes dashboard</span>
    host-access          <span class="hljs-comment"># (core) Allow Pods connecting to Host services smoothly</span>
    hostpath-storage     <span class="hljs-comment"># (core) Storage class; allocates storage from host directory</span>
    ingress              <span class="hljs-comment"># (core) Ingress controller for external access</span>
    kube-ovn             <span class="hljs-comment"># (core) An advanced network fabric for Kubernetes</span>
    mayastor             <span class="hljs-comment"># (core) OpenEBS MayaStor</span>
    metallb              <span class="hljs-comment"># (core) Loadbalancer for your Kubernetes cluster</span>
    metrics-server       <span class="hljs-comment"># (core) K8s Metrics Server for API access to service metrics</span>
    minio                <span class="hljs-comment"># (core) MinIO object storage</span>
    observability        <span class="hljs-comment"># (core) A lightweight observability stack for logs, traces and metrics</span>
    prometheus           <span class="hljs-comment"># (core) Prometheus operator for monitoring and logging</span>
    rbac                 <span class="hljs-comment"># (core) Role-Based Access Control for authorisation</span>
    registry             <span class="hljs-comment"># (core) Private image registry exposed on localhost:32000</span>
    rook-ceph            <span class="hljs-comment"># (core) Distributed Ceph storage using Rook</span>
    storage              <span class="hljs-comment"># (core) Alias to hostpath-storage add-on, deprecated</span>
</code></pre>
<p>Once all nodes have joined, we can see the following output:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ sudo microk8s kubectl get no
NAME        STATUS   ROLES    AGE     VERSION
manas-s01   Ready    &lt;none&gt;   48m     v1.32.3
manas-s02   Ready    &lt;none&gt;   4m39s   v1.32.3
manas-s03   Ready    &lt;none&gt;   4m26s   v1.32.3
</code></pre>
<p>You can list all the resources that are running:</p>
<pre><code class="lang-bash">manas@manas-s01:~$ microk8s kubectl get all --all-namespaces
NAMESPACE     NAME                                           READY   STATUS    RESTARTS   AGE
kube-system   pod/calico-kube-controllers-5947598c79-z6wcb   1/1     Running   0          51m
kube-system   pod/calico-node-f2jv7                          1/1     Running   0          29m
kube-system   pod/calico-node-jdmvj                          1/1     Running   0          7m56s
kube-system   pod/calico-node-lfbqs                          1/1     Running   0          7m43s
kube-system   pod/coredns-79b94494c7-k98hm                   1/1     Running   0          51m

NAMESPACE     NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   10.152.183.1    &lt;none&gt;        443/TCP                  51m
kube-system   service/kube-dns     ClusterIP   10.152.183.10   &lt;none&gt;        53/UDP,53/TCP,9153/TCP   51m

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/calico-node   3         3         3       3            3           kubernetes.io/os=linux   51m

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           51m
kube-system   deployment.apps/coredns                   1/1     1            1           51m

NAMESPACE     NAME                                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-5947598c79   1         1         1       51m
kube-system   replicaset.apps/coredns-79b94494c7                   1         1         1       51m
</code></pre>
<p>Congratulations, you have a multi-node k8s cluster running!</p>
<p>Note that <code>microk8s</code> commands differ from other k8s CLIs (for example, <code>microk8s kubectl</code> instead of plain <code>kubectl</code>). Since this is a conformant k8s cluster, other standard tools should also work. The installation is snap-based, so config and log files are located under the snap directory. Refer to <a target="_blank" href="https://microk8s.io/docs/command-reference">https://microk8s.io/docs/command-reference</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750564751374/3bec1896-542c-4053-9775-b0540388b4bc.jpeg" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Introduction to Distributed Storage Series]]></title><description><![CDATA[Understanding Distributed Storage
Distributed storage systems spread data across multiple nodes or machines, offering benefits like high availability, fault tolerance, and scalability. These systems ensure data remains accessible even if some nodes f...]]></description><link>https://code.manas.me/introduction-to-distributed-storage-series</link><guid isPermaLink="true">https://code.manas.me/introduction-to-distributed-storage-series</guid><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Fri, 20 Jun 2025 13:05:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/M5tzZtFCOfs/upload/b3188b9b687386b404d921250386267c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-understanding-distributed-storage">Understanding Distributed Storage</h2>
<p>Distributed storage systems spread data across multiple nodes or machines, offering benefits like high availability, fault tolerance, and scalability. These systems ensure data remains accessible even if some nodes fail. The implementation of these features varies across products.</p>
<h2 id="heading-key-components">Key Components</h2>
<ul>
<li><p><strong>Data Distribution:</strong> Information is split and stored across multiple nodes</p>
</li>
<li><p><strong>Replication:</strong> Data is copied to multiple locations for redundancy</p>
</li>
<li><p><strong>Consistency:</strong> Mechanisms to ensure data remains synchronized across nodes</p>
</li>
<li><p><strong>Load Balancing:</strong> Even distribution of storage and access load across nodes</p>
</li>
</ul>
<h2 id="heading-kubernetes-storage-architecture">Kubernetes Storage Architecture</h2>
<p>Kubernetes provides a robust framework for container orchestration, including storage management through:</p>
<ul>
<li><p><strong>Persistent Volumes (PV):</strong> Storage resources in the cluster</p>
</li>
<li><p><strong>Persistent Volume Claims (PVC):</strong> Storage requests by applications</p>
</li>
<li><p><strong>Storage Classes:</strong> Different types of storage with varying performance characteristics</p>
</li>
</ul>
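<p>As a quick illustration, an application might request storage with a PVC like the following (the claim name and storage class below are hypothetical placeholders; your cluster's storage class names may differ):</p>

```yaml
# Hypothetical PVC requesting 10Gi of block storage.
# "ceph-rbd" is an assumed storage class name, not a guaranteed default.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ceph-rbd
```

Kubernetes matches this claim to a PV (or dynamically provisions one via the storage class), and the pod then mounts the claim by name.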
<h2 id="heading-ceph-a-distributed-storage-solution">Ceph: A Distributed Storage Solution</h2>
<p>Ceph is a highly scalable distributed storage system that provides:</p>
<ul>
<li><p><strong>Object Storage:</strong> Through RADOS Gateway (RGW)</p>
</li>
<li><p><strong>Block Storage:</strong> Through RADOS Block Device (RBD)</p>
</li>
<li><p><strong>File Storage:</strong> Through CephFS</p>
</li>
</ul>
<p>Ceph achieves high reliability through data replication and self-healing capabilities.</p>
<h2 id="heading-rook-bridging-kubernetes-and-ceph">Rook: Bridging Kubernetes and Ceph</h2>
<p>Rook acts as a storage orchestrator that integrates Ceph with Kubernetes:</p>
<ul>
<li><p><strong>Automated Management:</strong> Handles deployment, configuration, and scaling of Ceph clusters</p>
</li>
<li><p><strong>Native Integration:</strong> Provides storage services directly to Kubernetes applications</p>
</li>
<li><p><strong>Operator Pattern:</strong> Uses Kubernetes operators for automated management and maintenance</p>
</li>
<li><p><strong>Storage Classes:</strong> Creates and manages Kubernetes storage classes for Ceph storage</p>
</li>
</ul>
<h2 id="heading-benefits-of-the-combined-stack">Benefits of the Combined Stack</h2>
<p>Using Kubernetes with Ceph through Rook provides:</p>
<ul>
<li><p><strong>Cloud-Native Storage:</strong> Fully containerized storage solution</p>
</li>
<li><p><strong>Dynamic Provisioning:</strong> Automatic storage allocation based on application needs</p>
</li>
<li><p><strong>High Availability:</strong> Resilient storage infrastructure with automated failover</p>
</li>
<li><p><strong>Scalability:</strong> Easy scaling of both compute and storage resources</p>
</li>
</ul>
<p>In this series, we will deploy a distributed storage cluster locally. We'll start by creating VMs for multi-node clusters, then set up multi-node Kubernetes and Ceph clusters. Finally, we'll integrate Kubernetes and Ceph using Rook. Here’s a bird’s-eye view:</p>
<pre><code class="lang-mermaid">---
config:
  theme: neutral
  layout: dagre
  look: neo
---
flowchart TB
 subgraph subGraph0["Kubernetes Cluster"]
        CP["Control Plane"]
        WN["Worker Nodes"]
        APP["Applications"]
  end
 subgraph subGraph1["Rook Storage Operator"]
        ROOK["Rook Operator"]
  end
 subgraph subGraph2["Storage Services"]
        RBD["Block Storage&lt;br&gt;RBD"]
        CEPHFS["File System&lt;br&gt;CephFS"]
        RGW["Object Storage&lt;br&gt;S3/Swift"]
  end
 subgraph subGraph3["Ceph Storage Cluster"]
        CEPH["Ceph Cluster"]
        subGraph2
  end
 subgraph subGraph4["Physical Infrastructure"]
        DISKS["Physical Disks"]
  end
    CP --&gt; WN
    WN --&gt; APP &amp; ROOK
    ROOK --&gt; CEPH
    CEPH --&gt; RBD &amp; CEPHFS &amp; RGW &amp; DISKS
    RBD --&gt; APP
    CEPHFS --&gt; APP
    RGW --&gt; APP
     CP:::k8s
     WN:::k8s
     APP:::k8s
     ROOK:::rook
     RBD:::ceph
     CEPHFS:::ceph
     RGW:::ceph
     CEPH:::ceph
     DISKS:::infra
    classDef k8s fill:#e1f5fe
    classDef rook fill:#fce4ec
    classDef ceph fill:#fff3e0
    classDef storage fill:#e8f5e8
    classDef infra fill:#f5f5f5
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Supercharging LLMs: A Guide to Tool Use with the Cerebras SDK]]></title><description><![CDATA[Introduction
Large Language Models (LLMs) are incredibly powerful, but their knowledge is often limited to the data they were trained on. What if you could give them access to real-time information or allow them to interact with other systems? That's...]]></description><link>https://code.manas.me/supercharging-llms-a-guide-to-tool-use-with-the-cerebras-sdk</link><guid isPermaLink="true">https://code.manas.me/supercharging-llms-a-guide-to-tool-use-with-the-cerebras-sdk</guid><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Tue, 17 Jun 2025 03:07:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/xuTJZ7uD7PI/upload/73bf94b1733617e45e47f9e71100e1c8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Large Language Models (LLMs) are incredibly powerful, but their knowledge is often limited to the data they were trained on. What if you could give them access to real-time information or allow them to interact with other systems? That's where "tools" come in. The Cerebras SDK makes it straightforward to equip your LLM applications with the ability to use custom functions, effectively extending their capabilities.</p>
<p>This post improves upon the <a target="_blank" href="https://inference-docs.cerebras.ai/capabilities/tool-use">official guide</a> by using a working demo API and fixing some code bugs. Thanks to Cerebras's generous free API, you can try out the (currently) latest Qwen 3 model.</p>
<p>In this post, we'll walk through a practical example: building a financial assistant that can calculate the Simple Moving Average (SMA) for stock data. We'll use the Cerebras SDK to enable an LLM to call our custom Python function for this calculation.</p>
<h1 id="heading-a-stock-savvy-assistant-version-1">A Stock-Savvy Assistant Version 1</h1>
<p>Our objective is to create a system where a user can ask a question like, "What's the 10-day moving average for company A over the last 50 days?" and the LLM, instead of just guessing, can use a specific tool (our Python function) to get the precise answer.</p>
<h2 id="heading-step-1-gathering-the-data">Step 1: Gathering the Data</h2>
<p>First, we need a way to get stock data. Our demo uses the Alpha Vantage API. While the original <a target="_blank" href="https://inference-docs.cerebras.ai/capabilities/tool-use">guide</a> uses mock data, here we implement a <code>get_stocks_data</code> function that fetches daily time series data for a given stock symbol from the Alpha Vantage API. We also need to convert the response <code>dict</code> into a <code>list</code>, adding the date as a key in each item.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> requests

<span class="hljs-comment"># Alpha Vantage API key, read from the environment (variable name is illustrative)</span>
api_key = os.environ.get(<span class="hljs-string">"ALPHAVANTAGE_API_KEY"</span>)

<span class="hljs-comment"># URLs for fetching stock data from Alpha Vantage API</span>
urls = [
    <span class="hljs-string">f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&amp;symbol=AAPL&amp;outputsize=full&amp;apikey=<span class="hljs-subst">{api_key}</span>"</span>,
    <span class="hljs-string">f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&amp;symbol=GOOG&amp;outputsize=full&amp;apikey=<span class="hljs-subst">{api_key}</span>"</span>,
]

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_stocks_data</span>(<span class="hljs-params">url: str</span>) -&gt; list[dict]:</span>
    <span class="hljs-string">"""
    Fetches and processes daily time series stock data from a given Alpha Vantage URL.
    """</span>
    r = requests.get(url)
    r.raise_for_status() <span class="hljs-comment"># Good practice to check for request errors</span>
    data = r.json()
    <span class="hljs-comment"># Extract the 'Time Series (Daily)' data from the response</span>
    <span class="hljs-comment"># Add error handling for unexpected API response structure</span>
    time_series_data = data.get(<span class="hljs-string">"Time Series (Daily)"</span>)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> time_series_data:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Could not find 'Time Series (Daily)' in API response."</span>)

    stocks_data = []
    <span class="hljs-comment"># Reformat the data into a list of dictionaries</span>
    <span class="hljs-keyword">for</span> s_date, s_data <span class="hljs-keyword">in</span> time_series_data.items():
        n = {<span class="hljs-string">"date"</span>: s_date}
        n.update(s_data)
        stocks_data.append(n)
    <span class="hljs-keyword">return</span> stocks_data

<span class="hljs-comment"># Fetch stock data for two predefined companies</span>
company_a_data = get_stocks_data(urls[<span class="hljs-number">0</span>])
company_b_data = get_stocks_data(urls[<span class="hljs-number">1</span>])

<span class="hljs-comment"># Store the fetched company data in a dictionary for easy access</span>
available_data = {
    <span class="hljs-string">"company_a"</span>: company_a_data,
    <span class="hljs-string">"company_b"</span>: company_b_data,
}
</code></pre>
<p>We then fetch data for the two symbols in <code>urls</code> and store the results in <code>available_data</code>.</p>
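<p>The reshaping from Alpha Vantage's date-keyed <code>dict</code> to a list of records can be seen in isolation with a synthetic payload (the values below are made up, but mirror the API's response shape):</p>

```python
# Synthetic 'Time Series (Daily)' payload (hypothetical values,
# mirroring the shape of Alpha Vantage's response).
time_series_data = {
    "2025-06-13": {"1. open": "101.00", "4. close": "103.50"},
    "2025-06-12": {"1. open": "100.00", "4. close": "101.20"},
}

stocks_data = []
for s_date, s_data in time_series_data.items():
    n = {"date": s_date}   # promote the date key into the record
    n.update(s_data)       # copy over the price fields
    stocks_data.append(n)

print(stocks_data[0])
# {'date': '2025-06-13', '1. open': '101.00', '4. close': '103.50'}
```

Each record now carries its date inline, which makes slicing a window of recent days straightforward.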
<h2 id="heading-step-2-the-core-logic-calculating-moving-average">Step 2: The Core Logic - Calculating Moving Average</h2>
<p>Next, we define the function that will act as our "tool". The <code>calculate_moving_average</code> function takes a data reference (e.g., "company_a"), the number of days to consider, and the window size for the moving average.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calculate_moving_average</span>(<span class="hljs-params">
        data_reference: str, num_days: int, window_size: int
</span>) -&gt; list[dict[str, float]]:</span>
    <span class="hljs-string">"""
    Calculates the moving average for a specified stock.
    """</span>
    <span class="hljs-keyword">if</span> data_reference <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> available_data:
        <span class="hljs-keyword">raise</span> ValueError(
            <span class="hljs-string">f"Invalid data reference. Available options: <span class="hljs-subst">{list(available_data.keys())}</span>"</span>
        )

    stock_data = available_data[data_reference]

    <span class="hljs-keyword">if</span> num_days &lt; window_size:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"num_days must be greater than or equal to window_size"</span>)

    <span class="hljs-keyword">if</span> len(stock_data) &lt; num_days:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Insufficient data for the specified number of days"</span>)

    <span class="hljs-comment"># Alpha Vantage returns entries newest to oldest, so take the most</span>
    <span class="hljs-comment"># recent 'num_days' entries and reverse them to process oldest to newest.</span>
    recent_data_points: list[dict] = stock_data[:num_days][::<span class="hljs-number">-1</span>]
    moving_averages: list[dict[str, float]] = []

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> recent_data_points <span class="hljs-keyword">or</span> len(recent_data_points) &lt; window_size:
        <span class="hljs-keyword">return</span> [] <span class="hljs-comment"># Not enough data to calculate even one SMA</span>

    <span class="hljs-comment"># Initialize the price window with the closing prices of the first 'window_size' days</span>
    price_window: list[float] = [
        float(item[<span class="hljs-string">"4. close"</span>]) <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> recent_data_points[:window_size]
    ]
    <span class="hljs-comment"># Add the first SMA if we have exactly window_size data points</span>
    <span class="hljs-keyword">if</span> len(price_window) == window_size:
         moving_averages.append({
             <span class="hljs-string">"date"</span>: recent_data_points[window_size <span class="hljs-number">-1</span>][<span class="hljs-string">"date"</span>], <span class="hljs-comment"># Date of the last day in the window</span>
             <span class="hljs-string">"movingAverage"</span>: round(sum(price_window) / window_size, <span class="hljs-number">2</span>)
         })

    <span class="hljs-comment"># Calculate moving average for the remaining days</span>
    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(window_size, len(recent_data_points)):
        current_data = recent_data_points[i]
        current_price = float(current_data[<span class="hljs-string">"4. close"</span>])

        price_window.pop(<span class="hljs-number">0</span>) <span class="hljs-comment"># Remove the oldest price</span>
        price_window.append(current_price) <span class="hljs-comment"># Add the newest price</span>
        average = round(sum(price_window) / window_size, <span class="hljs-number">2</span>)

        moving_averages.append({<span class="hljs-string">"date"</span>: current_data[<span class="hljs-string">"date"</span>], <span class="hljs-string">"movingAverage"</span>: average})

    <span class="hljs-keyword">return</span> moving_averages
</code></pre>
<p>This function performs the actual SMA calculation. Note the input validation and the logic for sliding the window.</p>
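<p>The sliding-window logic can be sanity-checked with a handful of synthetic closing prices (hypothetical numbers, not market data):</p>

```python
# Minimal sliding-window SMA over synthetic prices (hypothetical data).
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
window_size = 3

moving_averages = []
window = prices[:window_size]  # seed the first window
moving_averages.append(round(sum(window) / window_size, 2))
for price in prices[window_size:]:
    window.pop(0)        # drop the oldest price
    window.append(price)  # add the newest price
    moving_averages.append(round(sum(window) / window_size, 2))

print(moving_averages)  # [11.0, 12.0, 13.0]
```

Each step drops the oldest price and adds the newest, so the sum is always over exactly <code>window_size</code> values, just as in <code>calculate_moving_average</code> above.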
<h2 id="heading-step-3-describing-the-tool-to-the-llm">Step 3: Describing the Tool to the LLM</h2>
<p>For the LLM to understand how to use our function, we need to provide a schema for its parameters. We'll use Pydantic, which is excellent for this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel, Field
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Literal

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CalculateMovingAverageArgs</span>(<span class="hljs-params">BaseModel</span>):</span>
    <span class="hljs-string">"""
    Pydantic model defining the arguments for the calculate_moving_average function.
    Used for schema validation and generation for AI tool usage.
    """</span>
    data_reference: Literal[<span class="hljs-string">"company_a"</span>, <span class="hljs-string">"company_b"</span>] = Field(
        ...,
        description=<span class="hljs-string">"The key to access specific stock data in the stock_data dictionary."</span>,
    )
    num_days: int = Field(
        ..., description=<span class="hljs-string">"The number of recent days to consider for calculation."</span>
    )
    window_size: int = Field(..., description=<span class="hljs-string">"The size of the moving average window."</span>)
</code></pre>
<p>The <code>CalculateMovingAverageArgs</code> model clearly defines the expected inputs, their types, and descriptions.</p>
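<p>You can inspect the JSON schema that Pydantic generates, which is exactly what gets handed to the LLM (this sketch assumes Pydantic v2's <code>model_json_schema</code>; the descriptions are abbreviated):</p>

```python
from typing import Literal

from pydantic import BaseModel, Field


class CalculateMovingAverageArgs(BaseModel):
    """Arguments for the calculate_moving_average tool."""
    data_reference: Literal["company_a", "company_b"] = Field(
        ..., description="The key to access specific stock data."
    )
    num_days: int = Field(..., description="Number of recent days to consider.")
    window_size: int = Field(..., description="Size of the moving average window.")


schema = CalculateMovingAverageArgs.model_json_schema()
print(sorted(schema["properties"]))  # ['data_reference', 'num_days', 'window_size']
print(sorted(schema["required"]))    # ['data_reference', 'num_days', 'window_size']
```

Because all three fields use <code>...</code> (required), they all appear in the schema's <code>required</code> list, so the model knows it must supply each one when calling the tool.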
<h2 id="heading-step-4-registering-the-tool-with-cerebras-sdk">Step 4: Registering the Tool with Cerebras SDK</h2>
<p>Now, we tell the Cerebras SDK about our tool:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Define the tool (function) that the AI model can use</span>
tools = [
    {
        <span class="hljs-string">"type"</span>: <span class="hljs-string">"function"</span>,
        <span class="hljs-string">"function"</span>: {
            <span class="hljs-string">"name"</span>: <span class="hljs-string">"calculate_moving_average"</span>,
            <span class="hljs-string">"description"</span>: <span class="hljs-string">"Calculate the moving average of stock data for a specified number of recent days."</span>,
            <span class="hljs-string">"parameters"</span>: CalculateMovingAverageArgs.model_json_schema(),
        },
    }
]
</code></pre>
<p>We create a list of tool definitions. Each tool has a type ("function"), a name (matching our Python function name), a description (for the LLM to understand its purpose), and parameters (using the JSON schema from our Pydantic model).</p>
<h2 id="heading-step-5-making-the-api-call">Step 5: Making the API Call</h2>
<p>With everything set up, we can now interact with the LLM:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> cerebras.cloud.sdk <span class="hljs-keyword">import</span> Cerebras

<span class="hljs-comment"># Define the messages for the AI model, including system prompt and user query</span>
messages = [
    {
        <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are a helpful financial analyst. Use the supplied tools to assist the user."</span>,
    },
    {
        <span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"What's the 10-day moving average for company A over the last 50 days?"</span>,
    },
]

<span class="hljs-comment"># Initialize the Cerebras SDK client using API key from environment variable</span>
client = Cerebras(
    api_key=os.environ.get(<span class="hljs-string">"CEREBRAS_API_KEY"</span>),
)

<span class="hljs-comment"># Make a chat completion request to the Cerebras API</span>
response = client.chat.completions.create(
    model=<span class="hljs-string">"qwen-3-32b"</span>,  <span class="hljs-comment"># Specify the AI model to use</span>
    messages=messages,
    tools=tools,  <span class="hljs-comment"># Provide the available tools to the model</span>
)
</code></pre>
<p>We initialize the Cerebras client, define our conversation messages, and importantly, pass our tools definition to the create method.</p>
<h2 id="heading-step-6-handling-the-llms-response">Step 6: Handling the LLM's Response</h2>
<p>The LLM might respond directly, or it might decide to use our tool. We need to handle both cases:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json

<span class="hljs-comment"># Process the AI model's response</span>
content = response.choices[<span class="hljs-number">0</span>].message.content
<span class="hljs-keyword">if</span> content:
    print(<span class="hljs-string">"AI Response Content:"</span>)
    print(content)

<span class="hljs-comment"># Check if the AI decided to call a function</span>
<span class="hljs-keyword">if</span> response.choices[<span class="hljs-number">0</span>].message.tool_calls:
    tool_call = response.choices[<span class="hljs-number">0</span>].message.tool_calls[<span class="hljs-number">0</span>] <span class="hljs-comment"># Get the first tool call</span>
    function_call = tool_call.function
    <span class="hljs-keyword">if</span> function_call.name == <span class="hljs-string">"calculate_moving_average"</span>:
        <span class="hljs-comment"># Parse the arguments provided by the AI for the function call</span>
        <span class="hljs-keyword">try</span>:
            arguments = json.loads(function_call.arguments)
            <span class="hljs-comment"># Call the local function with the AI-provided arguments</span>
            result = calculate_moving_average(**arguments)
            print(<span class="hljs-string">"\\nResult of function call (calculate_moving_average):"</span>)
            print(result)

        <span class="hljs-keyword">except</span> json.JSONDecodeError:
            print(<span class="hljs-string">f"Error: Could not decode arguments: <span class="hljs-subst">{function_call.arguments}</span>"</span>)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            print(<span class="hljs-string">f"Error executing function <span class="hljs-subst">{function_call.name}</span>: <span class="hljs-subst">{e}</span>"</span>)
</code></pre>
<p>If <code>response.choices[0].message.tool_calls</code> is present, it means the LLM wants to use one of our tools. We extract the function name and arguments, then execute our local <code>calculate_moving_average</code> function.</p>
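<p>The dispatch step boils down to decoding a JSON string into keyword arguments and unpacking them into a local function. A minimal stdlib-only sketch (the stub function body and the payload below are made up for illustration):</p>
<pre><code class="lang-python">import json

# Stub standing in for the real calculate_moving_average from this article.
def calculate_moving_average(data_reference, num_days, window_size):
    return f"{window_size}-day SMA of {data_reference} over {num_days} days"

# The LLM returns arguments as a JSON string, e.g.:
raw_arguments = '{"data_reference": "company_a", "num_days": 50, "window_size": 10}'
arguments = json.loads(raw_arguments)
result = calculate_moving_average(**arguments)
print(result)  # 10-day SMA of company_a over 50 days
</code></pre>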
<p>You could then send this result back to the LLM as a follow-up message. This allows the LLM to formulate a natural language response based on the data retrieved by the tool, making the interaction more conversational.</p>
<p>Here's a sample output:</p>
<pre><code class="lang-bash">[{<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-06-04'</span>, <span class="hljs-string">'movingAverage'</span>: 200.0}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-06-03'</span>, <span class="hljs-string">'movingAverage'</span>: 200.76}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-06-02'</span>, <span class="hljs-string">'movingAverage'</span>: 201.09}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-30'</span>, <span class="hljs-string">'movingAverage'</span>: 201.53}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-29'</span>, <span class="hljs-string">'movingAverage'</span>: 201.6}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-28'</span>, <span class="hljs-string">'movingAverage'</span>: 201.77}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-27'</span>, <span class="hljs-string">'movingAverage'</span>: 201.52}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-23'</span>, <span class="hljs-string">'movingAverage'</span>: 200.9}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-22'</span>, <span class="hljs-string">'movingAverage'</span>: 200.65}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-21'</span>, <span class="hljs-string">'movingAverage'</span>: 200.79}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-20'</span>, <span class="hljs-string">'movingAverage'</span>: 201.2}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-19'</span>, <span class="hljs-string">'movingAverage'</span>: 201.75}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-16'</span>, <span class="hljs-string">'movingAverage'</span>: 202.7}, {<span class="hljs-string">'date'</span>: <span 
class="hljs-string">'2025-05-15'</span>, <span class="hljs-string">'movingAverage'</span>: 203.77}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-14'</span>, <span class="hljs-string">'movingAverage'</span>: 205.0}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-13'</span>, <span class="hljs-string">'movingAverage'</span>: 206.25}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-12'</span>, <span class="hljs-string">'movingAverage'</span>: 207.31}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-09'</span>, <span class="hljs-string">'movingAverage'</span>: 207.64}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-08'</span>, <span class="hljs-string">'movingAverage'</span>: 207.25}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-07'</span>, <span class="hljs-string">'movingAverage'</span>: 206.67}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-06'</span>, <span class="hljs-string">'movingAverage'</span>: 205.83}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-05'</span>, <span class="hljs-string">'movingAverage'</span>: 204.84}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-02'</span>, <span class="hljs-string">'movingAverage'</span>: 204.25}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-05-01'</span>, <span class="hljs-string">'movingAverage'</span>: 204.44}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-30'</span>, <span class="hljs-string">'movingAverage'</span>: 204.46}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-29'</span>, <span class="hljs-string">'movingAverage'</span>: 204.28}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-28'</span>, <span 
class="hljs-string">'movingAverage'</span>: 204.22}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-25'</span>, <span class="hljs-string">'movingAverage'</span>: 205.29}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-24'</span>, <span class="hljs-string">'movingAverage'</span>: 206.38}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-23'</span>, <span class="hljs-string">'movingAverage'</span>: 207.22}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-22'</span>, <span class="hljs-string">'movingAverage'</span>: 207.34}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-21'</span>, <span class="hljs-string">'movingAverage'</span>: 206.77}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-17'</span>, <span class="hljs-string">'movingAverage'</span>: 205.93}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-16'</span>, <span class="hljs-string">'movingAverage'</span>: 204.03}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-15'</span>, <span class="hljs-string">'movingAverage'</span>: 202.99}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-14'</span>, <span class="hljs-string">'movingAverage'</span>: 202.12}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-11'</span>, <span class="hljs-string">'movingAverage'</span>: 200.92}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-10'</span>, <span class="hljs-string">'movingAverage'</span>: 199.03}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-09'</span>, <span class="hljs-string">'movingAverage'</span>: 198.08}, {<span class="hljs-string">'date'</span>: <span class="hljs-string">'2025-04-08'</span>, <span class="hljs-string">'movingAverage'</span>: 194.87}]

Process finished with <span class="hljs-built_in">exit</span> code 0
</code></pre>
<p>What happens if no suitable tool is available? Let us modify the message and try again to see the result:</p>
<pre><code class="lang-python">messages = [
    {
        <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are a helpful financial analyst. Use the supplied tools to assist the user."</span>,
    },
    {
        <span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>,
        <span class="hljs-string">"content"</span>: <span class="hljs-string">"Give an overview of the company A"</span>,
    },
]

<span class="hljs-comment"># Make a chat completion request to the Cerebras API</span>
response = client.chat.completions.create(
    model=<span class="hljs-string">"qwen-3-32b"</span>,  
    messages=messages,
    tools=tools, 
)
<span class="hljs-comment"># Process the AI model's response</span>
content = response.choices[0].message.content
<span class="hljs-keyword">if</span> content:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">"AI Response Content:"</span>)
    <span class="hljs-built_in">print</span>(content)
</code></pre>
<p>This will print the following message:</p>
<pre><code class="lang-bash">I don<span class="hljs-string">'t have access to a function/tool that provides general company overviews. The available tool only supports calculating moving averages for stock data. Would you like me to help with a moving average calculation for Company A'</span>s stock instead?
</code></pre>
<p>This is where multi-tool use comes into the picture. However, we are not implementing multi-tool calls in this demo.</p>
<h1 id="heading-stock-savvy-assistant-version-2">Stock-Savvy Assistant Version 2</h1>
<p>You may have noticed that we did not use the data for company B at all.</p>
<p>Let us fix that by comparing the SMA for both companies. However, there’s a challenge:</p>
<ul>
<li><code>calculate_moving_average</code> and <code>CalculateMovingAverageArgs</code> accept <code>data_reference</code>, not <code>available_data</code></li>
</ul>
<p>We can either change the signatures or use Python's <a target="_blank" href="https://docs.python.org/3/library/functools.html#functools.partial"><code>partial</code></a>.</p>
<p>No points for guessing: we will use <code>partial</code> instead of repeating the same argument everywhere.</p>
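<p>As a quick refresher, <code>partial</code> freezes some arguments of a function and returns a new callable. A minimal sketch of the pattern we are about to apply (the stub body and data below are made up):</p>
<pre><code class="lang-python">from functools import partial

# Stub with the same shape as the updated calculate_moving_average below.
def calculate_moving_average(available_data, data_reference, num_days, window_size):
    return len(available_data[data_reference])

loaded_data = {"company_a": [1, 2, 3], "company_b": [4, 5]}

# Bind the data once; the callable the LLM triggers no longer needs it.
tool = partial(calculate_moving_average, loaded_data)
print(tool(data_reference="company_a", num_days=3, window_size=2))  # 3
</code></pre>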
<p>Here’s the updated version, with the following changes:</p>
<ol>
<li><p>Use dotenv to read API Keys</p>
</li>
<li><p>Calculate SMA of both company A and B</p>
</li>
<li><p>Use <code>partial</code> to bind the pre-loaded <code>available_data</code> into the tool functions</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> deque
<span class="hljs-keyword">from</span> functools <span class="hljs-keyword">import</span> partial
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Literal, List, Dict

<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> cerebras.cloud.sdk <span class="hljs-keyword">import</span> Cerebras
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel, Field

load_dotenv()

v_api_key = os.environ.get(<span class="hljs-string">"STOCK_API_KEY"</span>)
<span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> v_api_key:
    print(<span class="hljs-string">"Error: STOCK_API_KEY environment variable not set."</span>)

STOCK_URLS = {
    <span class="hljs-string">"company_a"</span>: <span class="hljs-string">f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&amp;symbol=AAPL&amp;outputsize=full&amp;apikey=<span class="hljs-subst">{v_api_key}</span>"</span>,
    <span class="hljs-string">"company_b"</span>: <span class="hljs-string">f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&amp;symbol=GOOG&amp;outputsize=full&amp;apikey=<span class="hljs-subst">{v_api_key}</span>"</span>,
}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_stocks_data</span>(<span class="hljs-params">url: str</span>) -&gt; List[Dict]:</span>
    <span class="hljs-string">"""
    Fetches and processes daily time series stock data from a given Alpha Vantage URL.
    ... (docstring remains the same) ...
    """</span>
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()

    time_series_data = data.get(<span class="hljs-string">"Time Series (Daily)"</span>)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> time_series_data:
        <span class="hljs-comment"># Add the API response note to the error for better debugging</span>
        note = data.get(<span class="hljs-string">"Note"</span>)
        <span class="hljs-keyword">if</span> note:
            <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">f"API response missing 'Time Series (Daily)' data. Note: <span class="hljs-subst">{note}</span>"</span>)
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"API response missing 'Time Series (Daily)' data."</span>)

    stocks_data = []
    <span class="hljs-keyword">for</span> date_str, daily_data <span class="hljs-keyword">in</span> time_series_data.items():
        record = {<span class="hljs-string">"date"</span>: date_str}
        record.update(daily_data)
        stocks_data.append(record)

    stocks_data.reverse()
    <span class="hljs-keyword">return</span> stocks_data


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calculate_moving_average</span>(<span class="hljs-params">
    available_data: Dict,  <span class="hljs-comment"># &lt;-- Dependency is now an explicit argument</span>
    data_reference: str,
    num_days: int,
    window_size: int,
</span>) -&gt; List[Dict[str, float]]:</span>
    <span class="hljs-string">"""
    Calculates the moving average for a specified stock over a given number of days
    and window size.

    Args:
        available_data: A dictionary containing the pre-loaded stock data.
        ... (other args remain the same) ...
    """</span>
    <span class="hljs-keyword">if</span> data_reference <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> available_data:
        <span class="hljs-keyword">raise</span> ValueError(
            <span class="hljs-string">f"Invalid data reference. Available options: <span class="hljs-subst">{list(available_data.keys())}</span>"</span>
        )

    stock_data = available_data[data_reference]

    <span class="hljs-keyword">if</span> num_days &lt; window_size:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"num_days must be greater than or equal to window_size"</span>)

    <span class="hljs-keyword">if</span> len(stock_data) &lt; num_days:
        <span class="hljs-keyword">raise</span> ValueError(
            <span class="hljs-string">f"Insufficient data for the specified number of days. "</span>
            <span class="hljs-string">f"Requested: <span class="hljs-subst">{num_days}</span>, Available: <span class="hljs-subst">{len(stock_data)}</span>"</span>
        )

    recent_data = stock_data[-num_days:]
    moving_averages = []
    price_window = deque(maxlen=window_size)

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(len(recent_data)):
        current_data = recent_data[i]
        price_window.append(float(current_data[<span class="hljs-string">"4. close"</span>]))

        <span class="hljs-keyword">if</span> len(price_window) == window_size:
            average = round(sum(price_window) / window_size, <span class="hljs-number">2</span>)
            moving_averages.append(
                {<span class="hljs-string">"date"</span>: current_data[<span class="hljs-string">"date"</span>], <span class="hljs-string">"movingAverage"</span>: average}
            )

    <span class="hljs-keyword">return</span> moving_averages


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compare_moving_averages</span>(<span class="hljs-params">
    available_data: Dict, num_days: int, window_size: int
</span>) -&gt; Dict:</span>
    <span class="hljs-string">"""
    Calculates and compares the latest moving average of two companies.

    Args:
        available_data: A dictionary containing the pre-loaded stock data.
        ... (other args remain the same) ...
    """</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-comment"># Pass the dependency down to the next function</span>
        ma_a_list = calculate_moving_average(
            available_data, <span class="hljs-string">"company_a"</span>, num_days, window_size
        )
        ma_b_list = calculate_moving_average(
            available_data, <span class="hljs-string">"company_b"</span>, num_days, window_size
        )

        latest_ma_a = ma_a_list[<span class="hljs-number">-1</span>] <span class="hljs-keyword">if</span> ma_a_list <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>
        latest_ma_b = ma_b_list[<span class="hljs-number">-1</span>] <span class="hljs-keyword">if</span> ma_b_list <span class="hljs-keyword">else</span> <span class="hljs-literal">None</span>

        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> latest_ma_a <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> latest_ma_b:
            <span class="hljs-keyword">return</span> {
                <span class="hljs-string">"error"</span>: <span class="hljs-string">"Could not calculate moving average for one or both companies."</span>
            }

        result = {
            <span class="hljs-string">"company_a"</span>: {
                <span class="hljs-string">"latest_date"</span>: latest_ma_a[<span class="hljs-string">"date"</span>],
                <span class="hljs-string">"moving_average"</span>: latest_ma_a[<span class="hljs-string">"movingAverage"</span>],
            },
            <span class="hljs-string">"company_b"</span>: {
                <span class="hljs-string">"latest_date"</span>: latest_ma_b[<span class="hljs-string">"date"</span>],
                <span class="hljs-string">"moving_average"</span>: latest_ma_b[<span class="hljs-string">"movingAverage"</span>],
            },
            <span class="hljs-string">"summary"</span>: (
                <span class="hljs-string">f"As of their latest data points, Company A's <span class="hljs-subst">{window_size}</span>-day moving average is "</span>
                <span class="hljs-string">f"<span class="hljs-subst">{latest_ma_a[<span class="hljs-string">'movingAverage'</span>]}</span> (<span class="hljs-subst">{latest_ma_a[<span class="hljs-string">'date'</span>]}</span>), while Company B's is "</span>
                <span class="hljs-string">f"<span class="hljs-subst">{latest_ma_b[<span class="hljs-string">'movingAverage'</span>]}</span> (<span class="hljs-subst">{latest_ma_b[<span class="hljs-string">'date'</span>]}</span>)."</span>
            ),
        }
        <span class="hljs-keyword">return</span> result
    <span class="hljs-keyword">except</span> ValueError <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"error"</span>: str(e)}


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CalculateMovingAverageArgs</span>(<span class="hljs-params">BaseModel</span>):</span>
    <span class="hljs-string">"""Defines arguments for the calculate_moving_average function."""</span>
    data_reference: Literal[<span class="hljs-string">"company_a"</span>, <span class="hljs-string">"company_b"</span>] = Field(..., description=<span class="hljs-string">"The key for the stock data."</span>)
    num_days: int = Field(..., description=<span class="hljs-string">"The number of recent days for calculation."</span>)
    window_size: int = Field(..., description=<span class="hljs-string">"The size of the moving average window."</span>)


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CompareMovingAveragesArgs</span>(<span class="hljs-params">BaseModel</span>):</span>
    <span class="hljs-string">"""Defines arguments for the compare_moving_averages function."""</span>
    num_days: int = Field(..., description=<span class="hljs-string">"The number of recent days for calculation."</span>)
    window_size: int = Field(..., description=<span class="hljs-string">"The size of the moving average window."</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-string">"""
    Main function to run the financial analyst assistant.
    """</span>
    print(<span class="hljs-string">"Fetching stock data..."</span>)
    loaded_data = {key: get_stocks_data(url) <span class="hljs-keyword">for</span> key, url <span class="hljs-keyword">in</span> STOCK_URLS.items()}
    print(<span class="hljs-string">"Data loaded successfully."</span>)

    calculate_moving_average_tool = partial(
        calculate_moving_average, available_data=loaded_data
    )
    compare_moving_averages_tool = partial(
        compare_moving_averages, available_data=loaded_data
    )

    tools = [
        {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"function"</span>,
            <span class="hljs-string">"function"</span>: {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"calculate_moving_average"</span>,
                <span class="hljs-string">"description"</span>: <span class="hljs-string">"Calculate the moving average of a single company's stock data."</span>,
                <span class="hljs-string">"parameters"</span>: CalculateMovingAverageArgs.model_json_schema(),
            },
        },
        {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"function"</span>,
            <span class="hljs-string">"function"</span>: {
                <span class="hljs-string">"name"</span>: <span class="hljs-string">"compare_moving_averages"</span>,
                <span class="hljs-string">"description"</span>: <span class="hljs-string">"Compare the moving averages of company A and company B."</span>,
                <span class="hljs-string">"parameters"</span>: CompareMovingAveragesArgs.model_json_schema(),
            },
        },
    ]

    available_functions = {
        <span class="hljs-string">"calculate_moving_average"</span>: calculate_moving_average_tool,
        <span class="hljs-string">"compare_moving_averages"</span>: compare_moving_averages_tool,
    }

    messages = [
        {
            <span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>,
            <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are a helpful financial analyst. Use the supplied tools to assist the user. "</span>
                       <span class="hljs-string">"When comparing, state which company has the higher moving average and by how much."</span>,
        },
        {
            <span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>,
            <span class="hljs-string">"content"</span>: <span class="hljs-string">"How do the 10-day moving averages for company A and company B compare over the last 50 days?"</span>,
        },
    ]

    ai_api_key = os.environ.get(<span class="hljs-string">"CEREBRAS_API_KEY"</span>)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ai_api_key:
        print(<span class="hljs-string">"Error: CEREBRAS_API_KEY environment variable not set."</span>)
        <span class="hljs-keyword">return</span>

    client = Cerebras(api_key=ai_api_key)

    print(<span class="hljs-string">"\n--- Sending request to LLM ---"</span>)
    response = client.chat.completions.create(
        model=<span class="hljs-string">"qwen-3-32b"</span>,
        messages=messages,
        tools=tools,
        tool_choice=<span class="hljs-string">"auto"</span>,
    )

    response_message = response.choices[<span class="hljs-number">0</span>].message

    <span class="hljs-keyword">if</span> response_message.tool_calls:
        print(<span class="hljs-string">"LLM decided to use a tool."</span>)
        tool_call = response_message.tool_calls[<span class="hljs-number">0</span>]
        function_name = tool_call.function.name

        <span class="hljs-keyword">if</span> function_name <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> available_functions:
            print(<span class="hljs-string">f"Error: LLM tried to call an unknown function: <span class="hljs-subst">{function_name}</span>"</span>)
            <span class="hljs-keyword">return</span>

        print(<span class="hljs-string">f"Calling function: <span class="hljs-subst">{function_name}</span>"</span>)
        function_to_call = available_functions[function_name]

        <span class="hljs-keyword">try</span>:
            arguments = json.loads(tool_call.function.arguments)
            tool_result = function_to_call(**arguments)

            print(<span class="hljs-string">"\n--- Tool Result ---"</span>)
            print(json.dumps(tool_result, indent=<span class="hljs-number">2</span>))

            print(<span class="hljs-string">"\n--- Sending tool result back to LLM for final answer ---"</span>)
            messages.append(response_message)
            messages.append(
                {
                    <span class="hljs-string">"tool_call_id"</span>: tool_call.id,
                    <span class="hljs-string">"role"</span>: <span class="hljs-string">"tool"</span>,
                    <span class="hljs-string">"name"</span>: function_name,
                    <span class="hljs-string">"content"</span>: json.dumps(tool_result),
                }
            )

            final_response = client.chat.completions.create(
                model=<span class="hljs-string">"qwen-3-32b"</span>,
                messages=messages,
            )

            final_answer = final_response.choices[<span class="hljs-number">0</span>].message.content
            print(<span class="hljs-string">"\n--- Final AI Analyst Answer ---"</span>)
            print(final_answer)

        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            print(<span class="hljs-string">f"An error occurred during tool execution: <span class="hljs-subst">{e}</span>"</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">"\n--- Direct LLM Response ---"</span>)
        print(response_message.content)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<p>Here’s a sample response:</p>
<pre><code class="lang-plaintext">
Data loaded successfully.

--- Sending request to LLM ---
LLM decided to use a tool.
Calling function: compare_moving_averages

--- Tool Result ---
{
  "company_a": {
    "latest_date": "2025-06-20",
    "moving_average": 199.41
  },
  "company_b": {
    "latest_date": "2025-06-20",
    "moving_average": 176.11
  },
  "summary": "As of their latest data points, Company A's 10-day moving average is 199.41 (2025-06-20), while Company B's is 176.11 (2025-06-20)."
}

--- Sending tool result back to LLM for final answer ---

--- Final AI Analyst Answer ---
&lt;think&gt;
Okay, let me start by understanding the user's question. They want to compare the 10-day moving averages of company A and company B over the last 50 days. The tool response provided data for both companies as of June 20, 2025.

First, I need to verify if the data matches the query. The user asked for a 50-day comparison, but the tool's summary only gives the moving average as of the latest date. The answer should confirm whether the data spans the last 50 days correctly. Since the latest date is 2025-06-20 and the moving average is based on the current data up to that point, it's likely that the moving average reflects the closing prices over a period that includes the last 50 days. However, the exact comparison over the 50-day window isn't detailed in the tool's response beyond the current snapshot.

The key points here are the latest moving averages. The user asked for a comparison over the last 50 days, but the tool's output only shows the most recent numbers. This might mean that the user is interested in the current state of the moving averages, perhaps to assess recent performance trends. 

Next, I should calculate the difference between the two moving averages. Company A's is 199.41 and Company B's is 176.11. Subtracting these gives 23.3 points. To make it more meaningful, I can express this difference as a percentage of the lower moving average (Company B's). The percentage difference is (23.3 / 176.11) * 100 ≈ 13.23%. That shows how much higher Company A's moving average is in relative terms. 

I should also consider if there's any missing information. The user might expect a time series comparison or a trend analysis over the 50 days, but since the tool's response only provides the latest values, I need to clarify that my answer is based on a single point in time. It's possible that the 10-day moving averages fluctuated over the past 50 days, but without historical data, I can only state the current difference. 

Therefore, the response should highlight the latest moving averages, the absolute difference, and the percentage difference. I should also mention that the data is as of June 20, 2025, and note that a more detailed 50-day trend would require additional historical data, which isn't provided here.
&lt;/think&gt;

**Comparison of 10-day Moving Averages:**

- **Company A**: 199.41 (as of 2025-06-20)  
- **Company B**: 176.11 (as of 2025-06-20)  

**Conclusion**:  
Company A's 10-day moving average is **$23.30 higher** than Company B's, representing a **13.23% increase** relative to Company B's value. This comparison reflects the latest data point for both companies, but historical trends over the 50-day period would require a time-series analysis (not included in the provided data).

Process finished with exit code 0
</code></pre>
<h1 id="heading-why-use-tools-with-ai">Why use tools with AI?</h1>
<ul>
<li><p>Access to Real-Time Data: LLMs can query live APIs for up-to-the-minute information.</p>
</li>
<li><p>Interacting with External Systems: Control databases, send emails, or interact with any system that has an API.</p>
</li>
<li><p>Performing Complex Calculations: Offload specialized computations to dedicated functions, like our SMA calculator.</p>
</li>
<li><p>Improved Accuracy and Reliability: Ground LLM responses in factual data retrieved by tools, rather than relying solely on its training.</p>
</li>
</ul>
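<p>To make the SMA example concrete, the "tool" the model calls can be an ordinary function. The sketch below is hypothetical (not the exact tool schema used with the Cerebras SDK), but it reproduces the numbers in the run above:</p>

```python
def simple_moving_average(prices, window=10):
    """Return the simple moving average over the last `window` closing prices."""
    if len(prices) < window:
        raise ValueError(f"need at least {window} data points")
    return sum(prices[-window:]) / window


def compare_sma(sma_a, sma_b):
    """Absolute and relative (percentage) difference between two moving averages."""
    diff = sma_a - sma_b
    return round(diff, 2), round(diff / sma_b * 100, 2)


# Plugging in the values from the run above:
# compare_sma(199.41, 176.11) -> (23.3, 13.23)
```

<p>The LLM decides <em>when</em> to call such a function; the function itself does the arithmetic the model would otherwise have to guess at.</p>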
<p>By integrating tools with the Cerebras SDK, you can build more dynamic, capable, and reliable AI applications. This example is just the tip of the iceberg – the possibilities are vast!</p>
]]></content:encoded></item><item><title><![CDATA[Paint Your I/O Picture with Fio and Plots]]></title><description><![CDATA[Fio: The chef’s knife
fio (Flexible I/O Tester) is a versatile I/O workload generator that is used to benchmark and test storage systems. Developed by Jens Axboe, fio is capable of simulating various I/O workloads, making it a valuable tool for evalu...]]></description><link>https://code.manas.me/paint-your-io-picture-with-fio-and-plots</link><guid isPermaLink="true">https://code.manas.me/paint-your-io-picture-with-fio-and-plots</guid><category><![CDATA[storage]]></category><category><![CDATA[plotly]]></category><category><![CDATA[Dash]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 08 Mar 2025 14:15:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/-WXQm_NTK0U/upload/5429362008aa8b69d94cc561534a6c13.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-fio-the-chefs-knife">Fio: The chef’s knife</h1>
<p><a target="_blank" href="https://fio.readthedocs.io/en/latest/index.html">fio</a> (Flexible I/O Tester) is a versatile I/O workload generator that is used to benchmark and test storage systems. Developed by Jens Axboe, fio is capable of simulating various I/O workloads, making it a valuable tool for evaluating the performance of hard drives, SSDs, and other storage solutions. fio can be configured to perform read, write, and mixed operations with different block sizes, I/O depths, and access patterns. It supports a wide range of I/O engines, including synchronous, asynchronous, and memory-mapped I/O.</p>
<p>Now, think of fio as a meticulously sharpened chef's knife. In the hands of a master, it can dice, slice, and julienne storage performance with unparalleled precision. It's capable of revealing the most granular details of your disks, SSDs, and network drives. But, just like a professional knife set, fio comes with a dizzying array of blades – options, in its case. This sheer versatility, while powerful, can be utterly overwhelming to the uninitiated. You're presented with a vast toolkit, and knowing which tool to use, let alone how to wield it effectively, is a challenge in itself. Prepare to navigate a sea of parameters, from block sizes and I/O engines to queue depths and latency targets, or risk simply blunting the edge of this incredibly potent instrument.</p>
<p>Based on the <a target="_blank" href="https://fio.readthedocs.io/en/latest/fio_doc.html">FIO documentation</a>, there are at least <strong>152</strong> command-line options (as counted by Gemini).</p>
<h1 id="heading-plot-using-the-packaged-script">Plot using the packaged script</h1>
<p>The <a target="_blank" href="https://github.com/axboe/fio/blob/master/tools/fio_generate_plots"><code>fio_generate_plots</code></a> script is a utility that processes the log files generated by fio and creates graphical representations of the data using GNUPLOT. The script generates plots in the SVG (Scalable Vector Graphics) format, which is supported by most modern browsers and allows for resolution-independent graphs. This makes it easier to visualise and analyse the performance data collected during FIO benchmarks.</p>
<p>However, to create the log files, certain options must be set: <code>write_lat_log</code>, <code>write_bw_log</code>, and <code>write_iops_log</code>. Here's a sample fio workload configuration file:</p>
<pre><code class="lang-ini"><span class="hljs-section">[global]</span>
<span class="hljs-attr">ioengine</span>=libaio
<span class="hljs-attr">direct</span>=<span class="hljs-number">1</span>
<span class="hljs-attr">bs</span>=<span class="hljs-number">4</span>k
<span class="hljs-attr">size</span>=<span class="hljs-number">1</span>G
<span class="hljs-attr">runtime</span>=<span class="hljs-number">60</span>
time_based
<span class="hljs-attr">ramp_time</span>=<span class="hljs-number">10</span>s
group_reporting
<span class="hljs-attr">numjobs</span>=<span class="hljs-number">4</span>
<span class="hljs-attr">log_avg_msec</span>=<span class="hljs-number">1000</span>
<span class="hljs-attr">write_bw_log</span>=bw
<span class="hljs-attr">write_iops_log</span>=iops
<span class="hljs-attr">write_lat_log</span>=lat

<span class="hljs-section">[write-test]</span>
<span class="hljs-attr">rw</span>=write
<span class="hljs-attr">filename</span>=write_test_file

<span class="hljs-section">[read-test]</span>
<span class="hljs-attr">rw</span>=read
<span class="hljs-attr">filename</span>=read_test_file
</code></pre>
<p>To run this workload, save the above configuration to a file named <code>fio_workload_example.ini</code>, and then execute the following command:</p>
<pre><code class="lang-sh">fio fio_workload_example.ini
</code></pre>
<p>This will generate the following log files:</p>
<ul>
<li><p><code>bw_log</code></p>
</li>
<li><p><code>iops_log</code></p>
</li>
<li><p><code>lat_log</code></p>
</li>
</ul>
<p>These log files can then be used by the <code>fio_generate_plots</code> script to create the plots.</p>
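<p>For reference, each line in these logs is a plain CSV record. The exact column count varies across fio versions (newer versions append extra columns such as offset), but the first two fields (the timestamp in milliseconds and the measured value) are all the plotting below needs. A minimal parse, with made-up sample values:</p>

```python
# A typical fio log line: time (msec), value, data direction, block size, offset.
# Direction is 0 for reads and 1 for writes; the sample values here are made up.
line = "1000, 204800, 0, 4096, 0"
fields = [int(f.strip()) for f in line.split(",")]
time_ms, value = fields[0], fields[1]
print(time_ms / 1000, value)  # time converted to seconds, plus the raw sample value
```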
<p>Now, to use the <code>fio_generate_plots</code> script to generate plots, follow these steps:</p>
<ol>
<li><p><strong>Ensure</strong> <code>gnuplot</code> is installed: The script requires <code>gnuplot</code> to generate graphs. You can install it using your package manager. For example, on Debian-based systems, you can use:</p>
<pre><code class="lang-plaintext"> sudo apt-get install gnuplot
</code></pre>
</li>
<li><p><strong>Run the script</strong>: Execute the script with the required parameters:</p>
<ul>
<li><p><code>subtitle</code>: The main title for the plots.</p>
</li>
<li><p><code>xres</code> (optional): The horizontal resolution of the plots.</p>
</li>
<li><p><code>yres</code> (optional): The vertical resolution of the plots.</p>
</li>
</ul>
</li>
</ol>
<p>    Example usage:</p>
<pre><code class="lang-plaintext">    ./fio_generate_plots "Benchmark Results" 1920 1080
</code></pre>
<p>    This will generate SVG plots with the specified resolution.</p>
<ol start="3">
<li><strong>Check the output</strong>: The script will generate SVG files in the current directory, named after the provided subtitle and the type of data (e.g., <code>Benchmark Results-lat.svg</code>, <code>Benchmark Results-iops.svg</code>).</li>
</ol>
<p>Make sure the script has executable permissions. If not, you can set them using:</p>
<pre><code class="lang-plaintext">chmod +x fio_generate_plots
</code></pre>
<p>On my machine, it gives the following output:</p>
<pre><code class="lang-bash">&gt; ls
Benchmark-bw.svg   Benchmark-clat.svg Benchmark-iops.svg Benchmark-lat.svg  Benchmark-slat.svg
</code></pre>
<h1 id="heading-generate-plots-using-pandas">Generate plots using Pandas</h1>
<p>Now that we have the plots as SVGs, the next logical step is to automate the process and create friendlier charts. Given that there are excellent Python packages for this purpose, let us move away from gnuplot to pandas.</p>
<p>Here are the steps:</p>
<ol>
<li><p>We read the file into a pandas data frame. Note that we are only using the first and second columns.</p>
</li>
<li><p>Collect all the data frames in a <code>list</code></p>
</li>
<li><p>Plot them using matplotlib and save as SVGs</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> glob
<span class="hljs-keyword">import</span> sys

<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">usage</span>():</span>
    print(<span class="hljs-string">"Usage: fio_generate_plots.py subtitle [xres yres]"</span>)
    sys.exit(<span class="hljs-number">1</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_log_files</span>(<span class="hljs-params">filetype</span>):</span>
    <span class="hljs-string">"""
    Read fio log files into a pandas dataframe.
    :param filetype: string to search for filenames
    :return: list of pandas dataframes
    """</span>
    logs = []
    files = glob.glob(<span class="hljs-string">f"*_<span class="hljs-subst">{filetype}</span>.log"</span>) + glob.glob(<span class="hljs-string">f"*_<span class="hljs-subst">{filetype}</span>.*.log"</span>)
    print(<span class="hljs-string">f"Found <span class="hljs-subst">{len(files)}</span> <span class="hljs-subst">{filetype}</span> files"</span>)
    <span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files:
        df = pd.read_csv(file, header=<span class="hljs-literal">None</span>, names=[<span class="hljs-string">"time"</span>, <span class="hljs-string">"value"</span>, <span class="hljs-string">"x"</span>, <span class="hljs-string">"y"</span>, <span class="hljs-string">"z"</span>])
        df[<span class="hljs-string">"time"</span>] = (
            pd.to_numeric(df[<span class="hljs-string">"time"</span>], errors=<span class="hljs-string">"coerce"</span>) / <span class="hljs-number">1000</span>
        )  <span class="hljs-comment"># Convert time to seconds</span>
        logs.append((file, df))
    <span class="hljs-keyword">return</span> logs


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plot</span>(<span class="hljs-params">title, filetype, ylabel, scale, xres=<span class="hljs-number">1280</span>, yres=<span class="hljs-number">768</span></span>):</span>
    <span class="hljs-string">"""
    Generate a plot for a given dataframe and store it as an SVG file.
    :param title: Title of the plot
    :param filetype: log filetype used
    :param ylabel: Y axis label
    :param scale: scaling factor
    :param xres: X axis resolution
    :param yres: Y axis resolution
    :return: None 
    """</span>
    logs = read_log_files(filetype)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> logs:
        print(<span class="hljs-string">"No log files found"</span>)
        sys.exit(<span class="hljs-number">1</span>)

    plt.figure(figsize=(xres / <span class="hljs-number">100</span>, yres / <span class="hljs-number">100</span>))
    plt.title(<span class="hljs-string">f"<span class="hljs-subst">{title}</span>\n\n<span class="hljs-subst">{ylabel}</span>"</span>)
    plt.xlabel(<span class="hljs-string">"Time (sec)"</span>)
    plt.ylabel(ylabel)

    <span class="hljs-keyword">for</span> i, (filename, df) <span class="hljs-keyword">in</span> enumerate(logs):
        depth = filename.split(<span class="hljs-string">"."</span>)[<span class="hljs-number">1</span>]
        plt.plot(
            df[<span class="hljs-string">"time"</span>],
            df[<span class="hljs-string">"value"</span>] / scale,
            label=<span class="hljs-string">f"Queue depth <span class="hljs-subst">{depth}</span>"</span>,
            linestyle=<span class="hljs-string">"-"</span>,
            marker=<span class="hljs-string">""</span>,
        )

    plt.legend(loc=<span class="hljs-string">"best"</span>)
    plt.grid(<span class="hljs-literal">True</span>)
    plt.savefig(<span class="hljs-string">f"<span class="hljs-subst">{title.replace(<span class="hljs-string">' '</span>, <span class="hljs-string">'_'</span>)}</span>_<span class="hljs-subst">{filetype}</span>.svg"</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-keyword">if</span> len(sys.argv) &lt; <span class="hljs-number">2</span>:
        usage()

    title = sys.argv[<span class="hljs-number">1</span>]
    xres = int(sys.argv[<span class="hljs-number">2</span>]) <span class="hljs-keyword">if</span> len(sys.argv) &gt; <span class="hljs-number">2</span> <span class="hljs-keyword">else</span> <span class="hljs-number">1280</span>
    yres = int(sys.argv[<span class="hljs-number">3</span>]) <span class="hljs-keyword">if</span> len(sys.argv) &gt; <span class="hljs-number">3</span> <span class="hljs-keyword">else</span> <span class="hljs-number">768</span>

    <span class="hljs-comment"># One plot for each log type </span>
    plot(title, <span class="hljs-string">"lat"</span>, <span class="hljs-string">"Time (msec)"</span>, <span class="hljs-number">1000000</span>, xres, yres)
    plot(title, <span class="hljs-string">"iops"</span>, <span class="hljs-string">"IOPS"</span>, <span class="hljs-number">1</span>, xres, yres)
    plot(title, <span class="hljs-string">"slat"</span>, <span class="hljs-string">"Time (μsec)"</span>, <span class="hljs-number">1000</span>, xres, yres)
    plot(title, <span class="hljs-string">"clat"</span>, <span class="hljs-string">"Time (msec)"</span>, <span class="hljs-number">1000000</span>, xres, yres)
    plot(title, <span class="hljs-string">"bw"</span>, <span class="hljs-string">"Throughput (KB/s)"</span>, <span class="hljs-number">1</span>, xres, yres)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    main()
</code></pre>
<h1 id="heading-add-a-dash-of-plot">Add a dash of plot</h1>
<p>Now that we have figured out how to generate SVGs, let us take it a notch higher by creating a web app using <code>Dash</code>. This gives us the ability to create a dashboard and load data on demand without having to re-run scripts.</p>
<p>Let us re-use the <code>read_log_files</code> function and replace <code>matplotlib</code> with <code>plotly</code>.</p>
<p>Finally, wrap it in a <code>dash</code> app.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> glob

<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> plotly.graph_objects <span class="hljs-keyword">as</span> go
<span class="hljs-keyword">from</span> dash <span class="hljs-keyword">import</span> Dash, html, dcc, callback, Output, Input


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_log_files</span>(<span class="hljs-params">filetype</span>):</span>
    logs = []
    files = glob.glob(<span class="hljs-string">f"*_<span class="hljs-subst">{filetype}</span>.log"</span>) + glob.glob(<span class="hljs-string">f"*_<span class="hljs-subst">{filetype}</span>.*.log"</span>)
    print(<span class="hljs-string">f"Found <span class="hljs-subst">{len(files)}</span> <span class="hljs-subst">{filetype}</span> files"</span>)
    <span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files:
        df = pd.read_csv(file, header=<span class="hljs-literal">None</span>, names=[<span class="hljs-string">"time"</span>, <span class="hljs-string">"value"</span>, <span class="hljs-string">"x"</span>, <span class="hljs-string">"y"</span>, <span class="hljs-string">"z"</span>])
        df[<span class="hljs-string">"time"</span>] = (
            pd.to_numeric(df[<span class="hljs-string">"time"</span>], errors=<span class="hljs-string">"raise"</span>) / <span class="hljs-number">1000</span>
        )  <span class="hljs-comment"># Convert time to seconds</span>
        df.attrs[<span class="hljs-string">"name"</span>] = <span class="hljs-string">"Queue Depth "</span> + file.split(<span class="hljs-string">"."</span>)[<span class="hljs-number">1</span>]
        logs.append(df)
    <span class="hljs-keyword">return</span> logs


app = Dash()

options = [<span class="hljs-string">"lat"</span>, <span class="hljs-string">"bw"</span>, <span class="hljs-string">"iops"</span>, <span class="hljs-string">"clat"</span>]
app.layout = [
    html.H1(children=<span class="hljs-string">"Benchmark"</span>, style={<span class="hljs-string">"textAlign"</span>: <span class="hljs-string">"center"</span>}),
    dcc.Graph(id=<span class="hljs-string">"graph-content"</span>),
    <span class="hljs-comment"># Select option from the dropdown</span>
    dcc.Dropdown(options, <span class="hljs-string">"iops"</span>, id=<span class="hljs-string">"dropdown-selection"</span>),
]


<span class="hljs-meta">@callback(Output("graph-content", "figure"), Input("dropdown-selection", "value"))</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">update_graph</span>(<span class="hljs-params">value</span>):</span>
    list_of_dfs = read_log_files(value)
    fig_combined = go.Figure()
    <span class="hljs-keyword">for</span> df <span class="hljs-keyword">in</span> list_of_dfs:
        <span class="hljs-keyword">for</span> col <span class="hljs-keyword">in</span> df.columns:
            <span class="hljs-keyword">if</span> col == <span class="hljs-string">"value"</span>:  <span class="hljs-comment"># Value</span>
                fig_combined.add_trace(
                    go.Scatter(
                        x=df[<span class="hljs-string">"time"</span>], y=df[col], mode=<span class="hljs-string">"lines"</span>, name=df.attrs[<span class="hljs-string">"name"</span>]
                    )
                )
    fig_combined.update_layout(title_text=<span class="hljs-string">"Benchmark"</span>)

    <span class="hljs-keyword">return</span> fig_combined


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    app.run(debug=<span class="hljs-literal">True</span>)
</code></pre>
<p>Here’s the app with a basic dropdown:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1741443044175/16032079-df19-4cf0-8130-a1d1a92780b7.png" alt class="image--center mx-auto" /></p>
<p>As you select an option, the app generates the plot and displays it automatically.</p>
<p>That’s it for our fio plot app powered by Python!</p>
]]></content:encoded></item><item><title><![CDATA[How to Use Deepseek with a Private Ollama Server]]></title><description><![CDATA[Introduction
With the power of Ollama and the ease of Tailscale VPN, you can host a private AI model server that is both accessible and secure, allowing for local chat, code assistance, and remote access from mobile devices.
Ollama
Let's start by ins...]]></description><link>https://code.manas.me/run-deepseek-from-a-private-ollama-server</link><guid isPermaLink="true">https://code.manas.me/run-deepseek-from-a-private-ollama-server</guid><category><![CDATA[Deepseek]]></category><category><![CDATA[tailscale]]></category><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Wed, 29 Jan 2025 15:25:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ZiQkhI7417A/upload/95d35dc7a8e7d77a66d778cf87caa08a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction"><strong>Introduction</strong></h1>
<p>With the power of Ollama and the ease of Tailscale VPN, you can host a private AI model server that is both accessible and secure, allowing for local chat, code assistance, and remote access from mobile devices.</p>
<h1 id="heading-ollama"><strong>Ollama</strong></h1>
<p>Let's start by installing Ollama, which helps serve LLM models. Install <a target="_blank" href="https://github.com/ollama/ollama/blob/main/README.md#quickstart">Ollama</a> on the server machine using the official installer or system package managers. Then, download the models that are compatible with the machine. The latest <a target="_blank" href="https://ollama.com/library/deepseek-r1">Deepseek-R1</a> models, 7B and 8B, require 16 GB of memory and can run on a MacBook Pro. After installation, make sure Ollama is working:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Choose the model that can run on your hardware</span>
ollama run deepseek-r1:7b
</code></pre>
<h1 id="heading-local-ai-code-assitant">Local AI code assistant</h1>
<p>With the Ollama server running, you can use it for code assistance. You can use the open-source <a target="_blank" href="https://www.continue.dev/">Continue</a> plugin for VSCode or IntelliJ IDEs. To use the local LLM, you need to configure the <code>config.json</code>.</p>
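<p>As an illustrative sketch, a model entry for a local Ollama server looks roughly like this (field names follow an older version of Continue's <code>config.json</code> schema and may differ in current releases):</p>

```json
{
  "models": [
    {
      "title": "DeepSeek R1 (local)",
      "provider": "ollama",
      "model": "deepseek-r1:7b"
    }
  ]
}
```

<p>If the Ollama server runs on another machine, an <code>apiBase</code> field pointing at its host and port can be added to the entry.</p>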
<p>Interestingly, you can use Deepseek to modify the config JSON itself! Set the context with the config.json, the Deepseek model, and then prompt:</p>
<p><code>update file to use ollama with deepseek</code></p>
<p>Here’s a screenshot of the chat</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738176420474/7a361f5b-a101-4fb4-8c5e-dccc6fff8d54.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-tailscale"><strong>Tailscale</strong></h1>
<p>Tailscale allows you to create software-defined networks and is a secure alternative to something like ngrok. Install Tailscale on the server and clients using your preferred method (e.g., Homebrew for macOS, package managers for Linux/Windows, or the <a target="_blank" href="https://apps.apple.com/us/app/tailscale/id1470499037">App Store</a>). Log in with your preferred auth provider; Google, for example, is supported.</p>
<h2 id="heading-connect-devices"><strong>Connect Devices</strong></h2>
<p>If you do not want to use CLI, just visit the <a target="_blank" href="https://login.tailscale.com/admin/machines">admin console</a> to add devices. There are apps available for iOS, Android and other OS.</p>
<p>Remember to note down the IP or DNS hostname of the server machine and mobile devices.</p>
<h2 id="heading-configure-ollama-to-use-tailscale-server"><strong>Configure Ollama to Use Tailscale Server</strong></h2>
<pre><code class="lang-bash"><span class="hljs-comment"># MacOS example </span>
launchctl setenv OLLAMA_HOST <span class="hljs-string">"&lt;tailscale hostname&gt;"</span>
<span class="hljs-comment"># Adjust port numbers as necessary based on your setup.</span>
</code></pre>
<h2 id="heading-access-models-across-devices"><strong>Access Models Across Devices</strong></h2>
<p>Ensure Ollama is running:</p>
<pre><code class="lang-bash">ollama serve
</code></pre>
<p>Now, install an Ollama-compatible app on the mobile device. <a target="_blank" href="https://github.com/gluonfield/enchanted">Enchanted</a> works well for iOS. In its settings, set the Ollama server to the Tailscale IP/hostname with port 11434.</p>
<p>Now, you can use the ollama model from the private server on the mobile device.</p>
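<p>Under the hood, apps like Enchanted simply talk to Ollama's HTTP API on port 11434. Here is a minimal standard-library sketch (the hostname in the comment is a placeholder for your Tailscale machine name):</p>

```python
import json
import urllib.request


def build_generate_request(host, model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# With the server reachable over Tailscale, send the request like this:
# req = build_generate_request("your-machine.your-tailnet.ts.net",
#                              "deepseek-r1:7b", "Explain VPNs in one line")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```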
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738176472741/3fa9c454-5740-4cb9-8964-6c55adaa42ce.png" alt class="image--right mx-auto mr-0" /></p>
]]></content:encoded></item><item><title><![CDATA[Managing Python Versions and Packages]]></title><description><![CDATA[Do not rely on system Python
Getting started with Python should be easy, right? After all, most OS ship with a version of Python. However, most OS ship with a specific version on Python which they rely on to run services and scripts. The installed ve...]]></description><link>https://code.manas.me/managing-python-versions-and-packages</link><guid isPermaLink="true">https://code.manas.me/managing-python-versions-and-packages</guid><category><![CDATA[Python]]></category><category><![CDATA[Poetry]]></category><category><![CDATA[dependency management]]></category><category><![CDATA[Beginner Developers]]></category><category><![CDATA[best practices]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 09 Dec 2023 09:45:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/QUwM2LDVs3A/upload/2a7030b9128acf68f367a2838e201d32.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-do-not-rely-on-system-python">Do not rely on system Python</h1>
<p>Getting started with Python should be easy, right? After all, most operating systems ship with a version of Python. However, they ship with a <em>specific</em> version of Python, which they rely on to run services and scripts, and the installed version may change after upgrades. It is recommended <em>not</em> to install packages system-wide: system Python is best managed by the OS.</p>
<h1 id="heading-package-managers-are-cool">Package Managers are cool</h1>
<p>How should you install the Python version needed for development? Most OS ship with a package manager. Linux variants have <code>yum</code>, <code>apt</code>, <code>dnf</code> etc. MacOS users can install <code>brew</code> . Package managers fetch info from a repository and install the right version based on the OS variant, version and system architecture. This is a good way to start but it gets complicated to manage <em>multiple</em> Python versions. You may need different version to test compatibility or try new language features.</p>
<p>The download section for Python 3.12 lists 9 files. Two are compressed source archives and the rest are installers covering the major operating systems and architectures:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>File</strong></td><td><strong>Operating System</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Gzipped source tarball</td><td>Source release</td></tr>
<tr>
<td>XZ compressed source tarball</td><td>Source release</td></tr>
<tr>
<td>macOS 64-bit universal2 installer</td><td>macOS</td></tr>
<tr>
<td>Windows embeddable package (32-bit)</td><td>Windows</td></tr>
<tr>
<td>Windows embeddable package (64-bit)</td><td>Windows</td></tr>
<tr>
<td>Windows embeddable package (ARM64)</td><td>Windows</td></tr>
<tr>
<td>Windows installer (32-bit)</td><td>Windows</td></tr>
<tr>
<td>Windows installer (64-bit)</td><td>Windows</td></tr>
<tr>
<td>Windows installer (ARM64)</td><td>Windows</td></tr>
</tbody>
</table>
</div><p>Notice that there is no <strong>installer</strong> for Linux. Perhaps creating a universal installer for Windows and macOS is easier. Linux distributions maintain their own Python packages, built from the source releases. Package managers can only install the versions available in the repository, which ensures that users install only stable versions.</p>
<h1 id="heading-there-must-be-a-better-way">There must be a better way</h1>
<p>To install a dev or beta version, you either compile it from source or use an unofficial package repo. A single tool that installs both released and development versions would be more convenient. One such tool is <code>pyenv</code>.</p>
<h2 id="heading-pyenv">Pyenv</h2>
<p>Pyenv helps install multiple Python versions (including release candidates, dev) and different implementations like pypy, stackless, pyston, etc. Start with the installation instructions <a target="_blank" href="https://github.com/pyenv/pyenv">here</a> which depends on OS.</p>
<h3 id="heading-usage">Usage</h3>
<p>Once installed, you can search the available versions like this:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Using 3.12 as an example as it has the least options right now</span>
pyenv install -l | grep 3.12
  3.12.0rc2
  3.12-dev
  pypy2.7-7.3.12-src
  pypy2.7-7.3.12
  pypy3.9-7.3.12-src
  pypy3.9-7.3.12
  pypy3.10-7.3.12-src
  pypy3.10-7.3.12
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">To install Python for maximum <a target="_blank" href="https://github.com/pyenv/pyenv/blob/master/plugins/python-build/README.md#building-for-maximum-performance">performance</a>, use the following environment variables: <code>env PYTHON_CONFIGURE_OPTS='--enable-optimizations --with-lto' PYTHON_CFLAGS='-march=native -mtune=native' pyenv install &lt;version&gt;</code></div>
</div>

<p>Other common commands are:</p>
<ul>
<li><p><code>pyenv versions</code> which lists installed versions</p>
</li>
<li><p><code>pyenv global</code> lets you manage the global Python version</p>
</li>
<li><p><code>pyenv update</code> works on Linux. On MacOS, this can be done with <code>brew update</code></p>
</li>
</ul>
<p>Removal is as easy as <code>rm -rf ~/.pyenv/versions/"X.Y.Z"</code> (use <code>-rf</code> with caution). This is less complicated than using the system package manager to find and remove older Python versions.</p>
<h2 id="heading-troubleshooting">Troubleshooting</h2>
<p>Pyenv downloads the sources and compiles them on your machine. Compilation may fail if the required OS packages are not installed, which is the opposite of the convenience promised. This is why we discussed package managers and their challenges at the start. If you run into an issue, refer to <a target="_blank" href="https://github.com/pyenv/pyenv/wiki/Common-build-problems#prerequisites">common build problems</a>.</p>
<h1 id="heading-dependency-management">Dependency Management</h1>
<h2 id="heading-pip-problems">Pip Problems</h2>
<p>Once you have a Python version installed, you need to manage projects. There are two problems:</p>
<ol>
<li><p>Install a project specific Python version.</p>
</li>
<li><p>Manage the package dependencies.</p>
</li>
</ol>
<p>There are many ways to create virtual environments, the simplest being <code>python -m venv &lt;name&gt;</code> . Dependencies can be managed by <code>pip</code> using a <code>requirements.txt</code> file. These are part of the standard Python library.</p>
<p>Things get complicated as the size of a project and its dependencies increases. Most packages depend on other packages, which are installed with them. Say you install PackageA, which in turn installs PackageB. <code>pip freeze</code> will list all the installed packages, with no easy way to separate the top-level ones, so you end up managing the <code>requirements.txt</code> file manually. Similarly, when packages need to be updated, there may be conflicts. As with all things Python, there are multiple solutions to <a target="_blank" href="https://packaging.python.org/en/latest/tutorials/managing-dependencies/">managing dependencies</a>.</p>
<h2 id="heading-poetic-solutions">Poetic Solutions</h2>
<p>Poetry solves these issues and also provides a way to quickly start new projects.</p>
<pre><code class="lang-bash">poetry new demo
Created package demo <span class="hljs-keyword">in</span> demo
➜  demo <span class="hljs-built_in">cd</span> demo 
➜  demo ls -lh
total 8
-rw-r--r--  1 manas  staff     0B Dec  9 14:14 README.md
drwxr-xr-x  3 manas  staff    96B Dec  9 14:14 demo
-rw-r--r--  1 manas  staff   257B Dec  9 14:14 pyproject.toml
drwxr-xr-x  3 manas  staff    96B Dec  9 14:14 tests
➜  demo cat pyproject.toml 
[tool.poetry]
name = <span class="hljs-string">"demo"</span>
version = <span class="hljs-string">"0.1.0"</span>
description = <span class="hljs-string">""</span>
authors = [<span class="hljs-string">"Your Name &lt;you@example.com&gt;"</span>]
readme = <span class="hljs-string">"README.md"</span>

[tool.poetry.dependencies]
python = <span class="hljs-string">"^3.12"</span>


[build-system]
requires = [<span class="hljs-string">"poetry-core"</span>]
build-backend = <span class="hljs-string">"poetry.core.masonry.api"</span>
</code></pre>
<p>The <a target="_blank" href="https://python-poetry.org/docs/pyproject/">pyproject.toml file</a> is used to manage the project and its dependencies:</p>
<ul>
<li><p><code>poetry env</code> manages virtualenvs.</p>
</li>
<li><p><code>poetry add &lt;package&gt;</code> installs packages.</p>
</li>
<li><p><code>poetry update</code> updates dependencies.</p>
</li>
<li><p>You can organise packages in separate <a target="_blank" href="https://python-poetry.org/docs/managing-dependencies/#dependency-groups">dependency groups</a> like <code>dev</code>, <code>docs</code>, and <code>test</code>. For example, IPython, ruff, rich, or black can go in a <code>dev</code> group, while <code>pytest</code> goes in a <code>test</code> group. This helps you install only the required packages when building containers or deploying code.</p>
</li>
</ul>
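<p>As a sketch, dependency groups live in <code>pyproject.toml</code> (the package names and version ranges below are illustrative):</p>

```toml
# Illustrative dependency groups (Poetry 1.2+ syntax)
[tool.poetry.group.dev.dependencies]
ruff = "^0.1"
ipython = "^8.0"

[tool.poetry.group.test.dependencies]
pytest = "^7.0"
```

<p>Install everything with <code>poetry install</code>, or skip a group with <code>poetry install --without dev</code>.</p>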
<p>Read the docs, and explore all the features.</p>
<p>Note that both pyenv and Poetry are installed at the system level, while virtual environments and dependencies are managed per project.</p>
<h1 id="heading-code-quality">Code Quality</h1>
<p>Lastly, if you code and care about code quality, use the following:</p>
<ul>
<li><p><a target="_blank" href="https://docs.astral.sh/ruff/">Ruff</a>, an extremely fast code linter and formatter, or <a target="_blank" href="https://black.readthedocs.io/en/stable/">Black</a></p>
</li>
<li><p>Open source code analyser <a target="_blank" href="https://www.sonarsource.com/knowledge/languages/python/">SonarQube Python</a> for large projects</p>
</li>
</ul>
<h1 id="heading-ps-get-work-done">P.S. Get work done</h1>
<p>Remember that the time spent in figuring out the best Linux distro, programming language (or paradigm), and package manager may not count as work.</p>
<p><img src="https://imgs.xkcd.com/comics/compiling.png" alt class="image--center mx-auto" /></p>
<p>Python has no compile step but you may continue to look for the best editor, typeface, and theme while the code is running.</p>
]]></content:encoded></item><item><title><![CDATA[Compare Forex Rates using Python]]></title><description><![CDATA[Currency Exchange Rates
When we transfer foreign currency, we look for the best exchange rates. Now, this rate changes daily and varies across banks. It is hard to predict the currency exchange rate but you can select the bank based on that day's rat...]]></description><link>https://code.manas.me/compare-forex-rates-using-python</link><guid isPermaLink="true">https://code.manas.me/compare-forex-rates-using-python</guid><category><![CDATA[Python]]></category><category><![CDATA[pandas]]></category><category><![CDATA[duckDB]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Sat, 02 Dec 2023 13:03:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/SAYzxuS1O3M/upload/5acbc9d8230b52acd3972265c9954e3b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-currency-exchange-rates">Currency Exchange Rates</h1>
<p>When we transfer foreign currency, we look for the best exchange rates. Now, this rate changes daily and varies across banks. It is hard to predict the currency exchange rate but you can select the bank based on that day's rate.</p>
<h2 id="heading-comparing-daily-rates">Comparing daily rates</h2>
<p>While some services provide clear information about the rate and associated conversion fees, Indian banks like SBI and HDFC publish their daily forex rates in a PDF, which makes comparison cumbersome. As of today, Google will only show you a range of rates.</p>
<p>This is an experiment to extract data from these PDFs and compare rates. To keep things simple, we will use only 2 banks. Also, we will assume the currency is USD. That means we will only look for <a target="_blank" href="https://wise.com/au/blog/telegraphic-transfer-buying-rates">TT Buy</a> rates.</p>
<h2 id="heading-environment-setup">Environment Setup</h2>
<blockquote>
<p>This section assumes you have Python 3.11 or newer installed and are working in a virtualenv.</p>
</blockquote>
<p>From your preferred package manager, install the following. I am using apt (Ubuntu) and pip (Python) here:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># System wide install</span>
apt install ghostscript python3-tk

<span class="hljs-comment"># Python packages</span>
pip install camelot-py
pip install requests
pip install ghostscript
</code></pre>
<h2 id="heading-check-package-installation">Check Package Installation</h2>
<p>Let us import all these packages to ensure they work.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Check Ghostscript is installed</span>
<span class="hljs-keyword">from</span> ctypes.util <span class="hljs-keyword">import</span> find_library
find_library(<span class="hljs-string">"gs"</span>)

<span class="hljs-keyword">import</span> camelot
<span class="hljs-keyword">import</span> requests
</code></pre>
<h2 id="heading-download-data">Download Data</h2>
<p>You can download the PDFs (search for bank forex rates) from the browser, or copy the PDF URL from the search results and use the code below. This can be cumbersome because some banks have a fixed URL while others change it daily, but it is the only manual step.</p>
<pre><code class="lang-python">urls = {
    <span class="hljs-string">"sbi"</span>: <span class="hljs-string">"&lt;insert PDF link here&gt;"</span>,
    <span class="hljs-string">"hdfc"</span>:<span class="hljs-string">"&lt;insert PDF link here&gt;"</span>,
    <span class="hljs-comment"># add other banks here</span>
    }

<span class="hljs-comment"># Download files as bank.pdf</span>
<span class="hljs-keyword">for</span> bank, file_url <span class="hljs-keyword">in</span> urls.items():
    file_data = requests.get(file_url).content
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">f"<span class="hljs-subst">{bank}</span>.pdf"</span>, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> file:
        file.write(file_data)
</code></pre>
<p>Now, we can extract the data we require. Camelot supports many formats. We will use pandas DataFrame throughout this experiment.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Extracting a table from PDF is that easy</span>
<span class="hljs-comment"># Luckily, these PDFs are tabular</span>
tables = camelot.read_pdf(<span class="hljs-string">f'sbi.pdf'</span>)

<span class="hljs-comment"># There are many options to export this PDF</span>
<span class="hljs-comment"># We will use the Pandas DataFrame  </span>
df = tables[<span class="hljs-number">0</span>].df
</code></pre>
<p>We use pandas here, but you can export to any other format Camelot supports. The pandas <a target="_blank" href="https://pandas.pydata.org/docs/user_guide/10min.html#selection">documentation</a> is a great place to learn about the various selection methods.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Now, we need to find the location of USD TT Buy</span>
<span class="hljs-comment"># Here's what I got from a bit of experimenting</span>
<span class="hljs-comment"># Skipping the verbose output of the entire PDF</span>
print(df.iloc[[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">0</span>, <span class="hljs-number">2</span>]]) 
                      <span class="hljs-number">0</span>       <span class="hljs-number">2</span>
<span class="hljs-number">1</span>              CURRENCY  TT BUY
<span class="hljs-number">2</span>  UNITED STATES DOLLAR   <span class="hljs-number">82.57</span>
</code></pre>
<p>Let us store these locations in a <code>dict</code> and extract the values. Ideally, we should be able to use row labels instead of integers.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Bank -&gt; Location</span>
banks = {
    <span class="hljs-string">"sbi"</span>: [[<span class="hljs-number">1</span>, <span class="hljs-number">2</span>], [<span class="hljs-number">0</span>, <span class="hljs-number">2</span>]],
    <span class="hljs-string">"hdfc"</span>: [[<span class="hljs-number">1</span>, <span class="hljs-number">21</span>], [<span class="hljs-number">0</span>, <span class="hljs-number">6</span>]]
}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_rate</span>(<span class="hljs-params">bank: str</span>):</span>
    <span class="hljs-comment"># Assumes files are placed in a same dir </span>
    tables = camelot.read_pdf(<span class="hljs-string">f'./<span class="hljs-subst">{bank}</span>.pdf'</span>)
    df = tables[<span class="hljs-number">0</span>].df
    location = banks[bank]
    <span class="hljs-comment"># Doesn't work with Python 3.10 or lower</span>
    <span class="hljs-comment"># Check Colab Notes</span>
    usd_tt_buy = df.iloc[*location]
    <span class="hljs-keyword">return</span> usd_tt_buy

<span class="hljs-comment"># Display the rates</span>
<span class="hljs-keyword">for</span> bank <span class="hljs-keyword">in</span> banks:
    rate = get_rate(bank)
    <span class="hljs-comment"># Get a value from df</span>
    usd = float(rate.iloc[<span class="hljs-number">1</span>, <span class="hljs-number">1</span>])
    print(<span class="hljs-string">f"<span class="hljs-subst">{bank.upper()}</span> USD: <span class="hljs-subst">{usd}</span>"</span>)
</code></pre>
<p>That's it. Now this will display the rates for each bank.</p>
<h2 id="heading-store-data-in-a-database">Store data in a database</h2>
<p>These rates are refreshed daily, so we should store them somewhere to analyse later. A good start would be a simple <code>sqlite</code> db. However, as we already have a DataFrame, we will use <code>duckdb</code>, which provides options to ingest and export data in several formats. Let us see what it can do:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Ensure duckdb is installed</span>
<span class="hljs-keyword">import</span> duckdb 

<span class="hljs-comment"># Create a table in the database</span>
<span class="hljs-keyword">with</span> duckdb.connect(<span class="hljs-string">"rates.db"</span>) <span class="hljs-keyword">as</span> conn:
    conn.execute(<span class="hljs-string">'''CREATE TABLE forex_rates_table (
        date DATE,
        bank VARCHAR,
        currency varchar,
        tt_buy DOUBLE)'''</span>)
</code></pre>
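<p>For comparison, a sketch of the same table using the stdlib <code>sqlite3</code> module (in-memory here; SQLite has no DATE type, so dates would be stored as ISO-8601 strings):</p>

```python
import sqlite3

# Same schema with the stdlib sqlite3 module. Connect to ":memory:"
# for a throwaway db; pass a path like "rates.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE forex_rates_table (
    date TEXT,
    bank TEXT,
    currency TEXT,
    tt_buy DOUBLE)""")
# Parameterized insert, mirroring the duckdb example
conn.execute(
    "INSERT INTO forex_rates_table VALUES (date('now'), ?, ?, ?)",
    ["sbi", "usd", 82.57],
)
rows = conn.execute(
    "SELECT bank, tt_buy FROM forex_rates_table").fetchall()
print(rows)  # [('sbi', 82.57)]
conn.close()
```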
<p>As we are limited to 2 banks and 1 currency, we can hard-code them. This makes the examples much more readable. Now, the only unknown is the rate.</p>
<p>To get values from the different DataFrames we have to figure out their location. After inspecting <code>rates[0].columns</code> where <code>rates</code> is a list of <code>DataFrame</code> , I found that the value is at <code>[2][2]</code> and <code>[6][21]</code> respectively. The integer indices are not ideal and could break the code in future. For example, SBI publishes different rates based on the amount to be transferred. This code is only using the rate from Page 1.</p>
<p>Let us insert these rows in the db to complete our experiment. We keep this data in a <code>dict</code> :</p>
<pre><code class="lang-python"><span class="hljs-comment"># Insert data</span>

values = {
   <span class="hljs-string">"sbi"</span>: rates[<span class="hljs-number">0</span>][<span class="hljs-number">2</span>][<span class="hljs-number">2</span>],
   <span class="hljs-string">"hdfc"</span>: rates[<span class="hljs-number">1</span>][<span class="hljs-number">6</span>][<span class="hljs-number">21</span>]
}

<span class="hljs-keyword">with</span> duckdb.connect(<span class="hljs-string">"rates.db"</span>) <span class="hljs-keyword">as</span> conn:
    currency = <span class="hljs-string">'usd'</span>
    <span class="hljs-keyword">for</span> bank, rate <span class="hljs-keyword">in</span> values.items():
        conn.execute(<span class="hljs-string">'''INSERT INTO forex_rates_table VALUES (current_date, ?, ?, ?) '''</span>,
                    [bank, currency, rate ])
</code></pre>
<p>Querying is simple too.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Query Database Rows as DataFrame</span>
<span class="hljs-keyword">with</span> duckdb.connect(<span class="hljs-string">"rates.db"</span>) <span class="hljs-keyword">as</span> conn:
    result = conn.execute(<span class="hljs-string">'SELECT * FROM forex_rates_table'</span>).fetch_df()
</code></pre>
<p>Note that the result is converted to DataFrame. This is helpful in a Jupyter/Colab Notebook as they support viewing data.</p>
<h2 id="heading-notes-on-google-colab">Notes on Google Colab</h2>
<p>Google Colab is a great way to run code and analyse data. With all the extra batteries included, it comes with a few quirks too:</p>
<ol>
<li><p>Some versions of the <a target="_blank" href="https://camelot-py.readthedocs.io/en/master/"><code>camelot-py</code></a> package do not work with the latest <code>pypdf</code>, so we need to explicitly install <code>PyPDF2&lt;3.0</code> using <code>!pip install 'PyPDF2&lt;3.0'</code></p>
</li>
<li><p>Python 3.10 does not allow <code>usd = df.iloc[*loc]</code>, so we need to be explicit. I tried to find a cleaner way but this works for now.</p>
</li>
<li><p>You have to install ghostscript using apt.</p>
</li>
</ol>
<p>Also, you can use the following formatter to display <code>DataFrame</code> as a table:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> google.colab <span class="hljs-keyword">import</span> data_table
data_table.enable_dataframe_formatter()
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>That completes our experiment. We have created a way to collect forex rates and store them in a database. Here we learned:</p>
<ul>
<li><p>PDF parsing with camelot-py</p>
</li>
<li><p>Using pandas DataFrame and data selection</p>
</li>
<li><p>Creating a simple DuckDB database</p>
</li>
<li><p>Experimenting with Google Colab.</p>
</li>
</ul>
<p>Next, we can deploy this code to run daily and expose it through an API. You can try a working example in this <a target="_blank" href="https://colab.research.google.com/drive/14EF2nGlshVN_PxxSav3wcUC5vFx4Yztr?usp=sharing">Colab Notebook</a> or see output at <a target="_blank" href="https://github.com/rainzoo/notebook/blob/main/forex.ipynb">GitHub</a></p>
]]></content:encoded></item><item><title><![CDATA[Chasing simple ideas: Everything is an object]]></title><description><![CDATA[Simple but not easy: Abstraction
Object-oriented programming cannot be explained without abstraction. Beyond that also It is one of the key principles when building software. While it is important to understand and utilize abstraction, it can get in ...]]></description><link>https://code.manas.me/chasing-simple-ideas-everything-is-an-object</link><guid isPermaLink="true">https://code.manas.me/chasing-simple-ideas-everything-is-an-object</guid><category><![CDATA[Python]]></category><category><![CDATA[abstraction]]></category><category><![CDATA[programming languages]]></category><category><![CDATA[python beginner]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Fri, 07 Jul 2023 14:57:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/mdADGzyXCVE/upload/56e734cbdd8a2d619a7c439c64aa1747.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-simple-but-not-easy-abstraction">Simple but not easy: Abstraction</h1>
<p>Object-oriented programming cannot be explained without abstraction. Beyond OOP, it is one of the key principles of building software. While it is important to understand and utilize abstraction, it can get in the way of understanding internals. After all, abstractions are put in place to hide details.</p>
<p>In Python, objects are the abstraction for data. To understand them better, we will have to break through this abstraction layer. Observing the behaviour of Python objects is a good start but we need to dive deeper.</p>
<p>Commonly, these concepts are explained by creating classes for cars, animals, or shapes. Here, we won't do that. Instead, we will use the built-in Python types like <code>int</code> and <code>list</code> only.</p>
<h1 id="heading-code">Code</h1>
<h2 id="heading-cpython-implementation">CPython implementation</h2>
<p>We start by taking a look at how CPython implements objects:</p>
<p><code>typedef struct _object PyObject;</code></p>
<p>Since AI gets to do the boring work these days, let us ask <a target="_blank" href="https://sourcegraph.com/cody">Cody</a> to explain <code>_object</code>:</p>
<blockquote>
<p>The <code>_object</code> struct defines the basic structure of all Python objects. It contains:</p>
<ul>
<li><p>_PyObject_HEAD_EXTRA: Extra space for additional fields in debug builds.</p>
</li>
<li><p>ob_refcnt: The reference count field. This is used to track when the object can be deallocated.</p>
</li>
<li><p>ob_type: A pointer to the object's type object. This defines the object's type and its methods.</p>
</li>
</ul>
<p>So in summary, <code>_object</code> defines the basic fields that are common to all Python objects - their reference count and type information.</p>
<p>More specific object types (like lists, dicts, etc.) build upon this basic _object struct by extending it with additional fields. But they always keep these core fields to maintain memory management and type information.</p>
</blockquote>
<h2 id="heading-adventure-with-objects">Adventure with objects</h2>
<p>Now, that you have seen the basic structure, let us take help from <a target="_blank" href="https://rich.readthedocs.io/en/stable/introduction.html"><code>rich</code></a> to see this in action. You can install and import rich as below:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Install rich with your favourite package manager</span>
<span class="hljs-comment"># Replace the default print and inspect</span>
<span class="hljs-keyword">from</span> rich <span class="hljs-keyword">import</span> <span class="hljs-keyword">print</span>
<span class="hljs-keyword">from</span> rich <span class="hljs-keyword">import</span> inspect
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688733402885/7153c951-d737-4ad8-b9fd-181aaef33786.png" alt class="image--center mx-auto" /></p>
<p>Notice the 23 attributes that are not shown; use <code>inspect(object, all=True)</code> to see them all. This was still abstract, but you get the idea. Now, let us work with actual objects.</p>
<h3 id="heading-python-object-has-an-identity-a-type-and-a-value">Python object has an <strong>identity</strong>, a <strong>type</strong> and a <strong>value</strong></h3>
<p>We can inspect the id, class and value of objects. Here's an example of <code>int</code>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688734253030/0881f3e7-daed-4988-b6ee-7a53128f0acc.png" alt class="image--center mx-auto" /></p>
<p>Objects have many interesting properties, most importantly:</p>
<ul>
<li><div data-node-type="callout">
  <div data-node-type="callout-emoji">💡</div>
  <div data-node-type="callout-text">An object’s <strong>identity</strong> never changes once it has been created</div>
  </div>


</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688735217076/232e9005-d3f6-4e60-b89c-8bbe334d74b5.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688735322530/6198afe9-1d5a-4eff-8619-740b23f5106c.png" alt class="image--center mx-auto" /></p>
<p>Question for beginners: which one is the object here, <code>x</code> or <code>1</code>? The answer lies in <code>id(x)</code>, which in CPython is the object's memory address.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Generally, an object's <strong>type</strong> is unchangeable*</div>
</div>

<ul>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688736538694/9a408a3c-d0fb-423c-a7b9-2ca6398e40c0.png" alt class="image--center mx-auto" /></p>
<p>  * There are ways to do this assuming you know what you are doing. However, it isn't easy when there are so many ways to <a target="_blank" href="https://www.python.org/doc/humor/#shooting-yourself-in-the-foot">shoot yourself in the foot</a>.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">The <strong>value</strong> of <em>some</em> objects can change</div>
</div>

<ul>
<li><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688737622039/86b9cb83-468d-4295-9168-7f2244406220.png" alt class="image--center mx-auto" /></p>
<p>  Notice that <code>y</code> is a reference to a <code>list</code> (a mutable <code>type</code>). Objects whose value cannot be changed are called <strong>immutable</strong>. Understanding this distinction can save you from a lot of trouble. No wonder, mutable arguments made it to the top of common <a target="_blank" href="https://docs.python-guide.org/writing/gotchas/">gotchas</a>.</p>
</li>
<li><p>Python wouldn't be fun if we couldn't change values. Without mutability, this would be an adventure in functional programming with Lisp or Haskell.</p>
</li>
</ul>
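<p>The behaviour shown in the screenshots can be reproduced in any interpreter without <code>rich</code> (a minimal sketch):</p>

```python
# Every object has an identity, a type and a value
x = 1
print(id(x), type(x), x)

# Mutating a list changes its value, never its identity
y = [1, 2]
before = id(y)
y.append(3)
assert id(y) == before      # same object...
assert y == [1, 2, 3]       # ...new value

# ints are immutable: "changing" one rebinds the name to another object
z = 10
z = z + 1                   # z now refers to a different int object
assert z == 11
```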
<p>That's a wrap on Python objects. Now you know how to inspect id, type and values. All the code is here as a notebook:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/rainzoo/notebook/blob/main/Python_Objects.ipynb">https://github.com/rainzoo/notebook/blob/main/Python_Objects.ipynb</a></div>
]]></content:encoded></item><item><title><![CDATA[Build a platform for internal tools with Windmill]]></title><description><![CDATA[Goldilocks effect
Over time, we all end up with a collection of scripts tucked away in a scripts directory or git repo. Often these are written for a specific purpose and do not belong to a single category. In my case, I have Python scripts that conf...]]></description><link>https://code.manas.me/build-a-platform-for-internal-tools-with-windmill</link><guid isPermaLink="true">https://code.manas.me/build-a-platform-for-internal-tools-with-windmill</guid><category><![CDATA[Low Code]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Python]]></category><category><![CDATA[platforms]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Tue, 06 Jun 2023 14:11:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/qz6NnG0Bp3Y/upload/683d5ff108c7bf53465a7f8371492adc.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-goldilocks-effect">Goldilocks effect</h1>
<p>Over time, we all end up with a collection of scripts tucked away in a <code>scripts</code> directory or git repo. Often these are written for a specific purpose and do not belong to a single category. In my case, I have Python scripts that configure infra, check for pre-requisite in clusters, generate data, query internal APIs and more. My teammates have their collection and other teams must have theirs too.</p>
<p>Scripts can become cumbersome to run as their arguments multiply, yet remain too trivial to justify turning into full-fledged apps.</p>
<p>Fundamentally, they take <code>input</code> as args, perform actions, and return an <code>output</code>. This is where <code>Windmill</code> shines: it generates a UI from the code alone, which can be deployed as an app. If you are concerned about running your code on the cloud offering, you can self-host on a Kubernetes cluster.</p>
<p>Here's my experience deploying Windmill on Kubernetes.</p>
<h1 id="heading-windmill">Windmill</h1>
<p><a target="_blank" href="https://docs.windmill.dev/docs/advanced/self_host">Windmill docs</a> are pretty good and straightforward. <a target="_blank" href="https://github.com/windmill-labs/windmill-helm-charts">Helm chart</a> makes it easy to deploy. You can start with minikube and then try on Kubernetes.</p>
<h2 id="heading-preparation">Preparation</h2>
<h3 id="heading-configure-external-database">Configure External database</h3>
<p>It is better to have a hosted database instance ready in advance. You can disable the default <code>postgresql</code> in values.yaml; <code>minio</code> is also optional. However, ensure the database is configured correctly before proceeding, otherwise the app pods will end up in a <code>crashloop</code> (as the docs mention). The database credentials are not treated as a secret, so at least use <code>sslmode=require</code>.</p>
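<p>As an illustration only, the overrides might look roughly like this. The key names here are assumptions based on the settings mentioned above; verify them against the chart's own <code>values.yaml</code> before use:</p>

```yaml
# Sketch of values.yaml overrides for an external database
postgresql:
  enabled: false   # disable the bundled database

windmill:
  # Hosted Postgres, with TLS required
  databaseUrl: postgres://windmill:<password>@db.example.com:5432/windmill?sslmode=require
  baseDomain: windmill.example.com
  baseProtocol: https
```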
<h3 id="heading-configure-ingress">Configure Ingress</h3>
<p>The default ingress works well on minikube, but you need to configure Ingress to match your k8s setup. Remember to set <code>baseDomain</code> and <code>baseProtocol</code> (http or https) accordingly.</p>
<h3 id="heading-workaround-for-private-registry">Workaround for private registry</h3>
<p>If your k8s cannot access the <code>ghcr.io/windmill-labs</code> registry, you can copy the images to a private registry. Luckily, they use only 2 images (as of version 1.109):</p>
<pre><code class="lang-bash"><span class="hljs-comment"># &lt;tag&gt; should be same as release.</span>
docker pull ghcr.io/windmill-labs/windmill:&lt;tag&gt;
docker pull ghcr.io/windmill-labs/windmill-lsp:&lt;tag&gt;
<span class="hljs-comment"># Now tag and push them to the private registry</span>
</code></pre>
<p>This is not ideal but a workaround if you are out of options. Next, clone the helm chart and replace these <code>image:</code> values. You can install this chart:</p>
<pre><code class="lang-bash">❯ <span class="hljs-built_in">cd</span> windmill-helm-charts
❯ helm install mywindmill ./charts/windmill
</code></pre>
<p>You can also use a private <code>pipIndexUrl</code>. This speeds up Python code execution, as required packages are installed when the script runs.</p>
<h2 id="heading-deployment">Deployment</h2>
<p>Deployment is easy as the docs say:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># add the Windmill helm repo</span>
helm repo add windmill https://windmill-labs.github.io/windmill-helm-charts/
<span class="hljs-comment"># install chart with default values</span>
helm install windmill-chart windmill/windmill  \
      --namespace=windmill \
      --create-namespace
</code></pre>
<p>Navigate to URL and login with <code>admin@windmill.dev</code> / <code>changeme</code></p>
<h1 id="heading-generate-ui-from-code">Generate UI from code</h1>
<p>Start with a sample script which has a main function like this:</p>
<pre><code class="lang-python">def main(
    no_default: str,
    #db: postgresql,
    name="Nicolas Bourbaki",
    age=42,
    obj: dict = {"even": "dicts"},
    l: list = ["or", "lists!"],
    file_: bytes = bytes(0),
):
    ...
</code></pre>
<p>Arguments to <code>main</code> are used to generate UI elements based on the type. On execution, imported packages are installed. All <code>print</code> or log statements are displayed in a log section and <code>return</code> values are shown in output.</p>
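<p>The idea of deriving inputs from a function signature can be sketched with the stdlib <code>inspect</code> module. This is a conceptual illustration, not Windmill's actual implementation:</p>

```python
import inspect

# A toy script entry point, mirroring the example above
def main(name: str = "Nicolas Bourbaki", age: int = 42):
    return f"{name} is {age}"

# One form field per parameter: name, type hint, default value.
# A UI generator would map each type to a widget (text box, number input...)
fields = [(p.name, p.annotation, p.default)
          for p in inspect.signature(main).parameters.values()]
print(fields)
```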
<p>Each script gets a unique path and you can see execution history, re-use previous inputs, and even schedule runs.</p>
<p>The Web IDE provides a decent editing experience. For Python, support for assistants (Pyright, Black, Ruff) helps keep the code clean.</p>
<h3 id="heading-advanced-features">Advanced Features</h3>
<p>Auto-generated UI is only one of Windmill's features. It can create complex workflows and comes with a decent drag-and-drop app builder. However, code remains the basic building block.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Windmill is an awesome open-source option to turn code into apps with reasonable deployment and configuration options. However, I could not find an easy way to enable authentication without using <code>oauth</code> integrations (which may be plenty for most people). Many <a target="_blank" href="https://docs.windmill.dev/docs/integrations/integrations_on_windmill">integrations</a> might work better out of the box in the cloud app.</p>
]]></content:encoded></item><item><title><![CDATA[Write Once Run Anywhere]]></title><description><![CDATA[The Promise
The promise of writing code that will run on multiple platforms is not new. There have been many runtimes, languages, frameworks and platforms built to solve this problem. The abstraction level varies. Here, we look above the hypervisor a...]]></description><link>https://code.manas.me/write-once-run-anywhere</link><guid isPermaLink="true">https://code.manas.me/write-once-run-anywhere</guid><category><![CDATA[wasm]]></category><category><![CDATA[Python]]></category><category><![CDATA[runtime]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Fri, 12 May 2023 14:39:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/osSryggkso4/upload/ce2e23dea203f4fa5b416ad1e9216a77.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-the-promise">The Promise</h1>
<p>The promise of writing code that will run on multiple platforms is not new. There have been many runtimes, languages, frameworks and platforms built to solve this problem. The abstraction level varies. Here, we look above the hypervisor and OS layers.</p>
<p>Web browsers were built to render HTML, which is a standard. Yet, different implementations caused much pain and anguish to developers. JVM (Java Virtual Machine) runs apps written in Java (and other languages), on several devices. Though it is still widely used, it could not conquer the web.</p>
<p>On the web, there was a time when games and other rich content were written in Flash. It was a runtime which delivered the same experience on supported browsers. Alas, it was a source of several security bugs and was eventually discontinued.</p>
<p>In mobile development, there are many cross-platform frameworks. The idea remains the same: code that runs on multiple OS. Today, many frameworks target web and native devices both. There are many such examples.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1686061201139/80885bc0-d803-438e-b0dd-e9113b0d3743.png" alt class="image--center mx-auto" /></p>
<p>Most recently, <a target="_blank" href="https://developer.mozilla.org/en-US/docs/WebAssembly">WebAssembly</a> (Wasm) provides a way to run code written in multiple languages on the web at near-native speed, enabling client apps on the web that previously couldn't run there.</p>
<h1 id="heading-web-assembly">Web Assembly</h1>
<p><a target="_blank" href="https://webassembly.org/">Wasm</a> is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications. Apart from portability, it has other goals like efficiency, safety, high performance, etc.</p>
<p>Let us take a look at portability. Wasm can run in web and non-web environments. We will not be compiling to wasm or dealing with any wasm code. Instead, we will use it to run a <strong>complete development environment</strong> in the browser.</p>
<h2 id="heading-starters-web">Starters: Web</h2>
<p><a target="_blank" href="https://pyodide.org/">Pyodide</a> runs Python in the browser, powered by Wasm. It can be used as a kernel to run Jupyter notebooks via JupyterLite. Here's how you can run it locally.</p>
<p>Assuming you have Python and a virtualenv, create a project dir, say <code>python-wasm</code>. Create a <code>requirements.txt</code> file:</p>
<pre><code class="lang-plaintext">jupyterlite-core
jupyterlite-pyodide-kernel
ipywidgets
</code></pre>
<p>Install the requirements, then build and serve the site:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install requirements</span>
python -m pip install -r requirements.txt
<span class="hljs-comment"># Create a Jupyterlite site</span>
jupyter lite build --output-dir dist
<span class="hljs-comment"># Run</span>
jupyter lite serve
</code></pre>
<p>Open the link displayed in the output:</p>
<pre><code class="lang-bash"> Serving JupyterLite Debug Server from:
            /Users/python-wasm/workers/public/_output
        on:
            http://127.0.0.1:8000/index.html
</code></pre>
<p>And you can run (almost) any Python code! You can interact with the notebook as long as the content is cached in the <a target="_blank" href="https://jupyterlite.readthedocs.io/en/latest/howto/configure/storage.html">browser storage</a>, but changes do not persist across sessions.</p>
<p>Another interesting project is <a target="_blank" href="https://github.com/python/cpython/blob/main/Tools/wasm/README.md">CPython Wasm</a>. We can run <a target="_blank" href="https://sqlite.org/wasm/doc/trunk/index.md">sqlite</a>, <a target="_blank" href="https://supabase.com/blog/postgres-wasm">Postgres</a>, and even <a target="_blank" href="https://wordpress.wasmlabs.dev/">Wordpress</a>!</p>
<h2 id="heading-entree-non-web">Entrée: Non-Web</h2>
<p>Wasm can be run beyond the web: from <a target="_blank" href="https://docs.docker.com/desktop/wasm/">docker containers</a>, directly via the various standalone runtimes, at the edge, and more.</p>
<p>Let us try an interesting project <a target="_blank" href="https://workers.wasmlabs.dev">Wasm Workers Server</a> which can be installed easily:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install runtime</span>
curl -fsSL https://workers.wasmlabs.dev/install | bash
</code></pre>
<p>Now, we can run workers in multiple languages using this runtime. But wait ...</p>
<h2 id="heading-dessert-best-of-both">Dessert: Best of both</h2>
<p>We already have a Jupyterlite site (which runs on wasm) and now, a wasm runtime. <strong>Why not integrate these two?</strong></p>
<p>Wasm Workers Server (<code>wws</code>) is capable of serving static content. So, let us serve the JupyterLite site with <code>wws</code>.</p>
<p>In the <code>python-wasm</code> dir, create another dir <code>public</code> and move the <code>dist</code> dir there.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Project structure will look like</span>
 └── python-wasm
    └── public
        └── dist

<span class="hljs-comment"># Install the wws Python runtime</span>
<span class="hljs-built_in">cd</span> python-wasm
wws runtimes install python latest

<span class="hljs-comment"># Run the server</span>
wws
⚙️  Loading routes from: .
🗺  Detected routes:
    - http://127.0.0.1:8080/
      =&gt; ./index.py
🚀 Start serving requests at http://127.0.0.1:8080
</code></pre>
<p>Now your Jupyterlite site is accessible at <a target="_blank" href="http://127.0.0.1:8080/dist/">http://127.0.0.1:8080/dist/</a></p>
<p>This setup is entirely powered by WebAssembly!</p>
<p>Here's the code on Replit:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://replit.com/@smanas/JupyterLite?v=1">https://replit.com/@smanas/JupyterLite?v=1</a></div>
]]></content:encoded></item><item><title><![CDATA[Learn Python asyncio and socket by checking ports]]></title><description><![CDATA[Introduction
There are several network scanners and tools that let you find out open ports. This is a demonstration of using socket, ipaddress and asyncio to check ports concurrently.
Basic Check
A basic check can be written using socket module:
with...]]></description><link>https://code.manas.me/learn-python-asyncio-and-socket-by-checking-ports</link><guid isPermaLink="true">https://code.manas.me/learn-python-asyncio-and-socket-by-checking-ports</guid><category><![CDATA[Python]]></category><category><![CDATA[socket]]></category><category><![CDATA[asynchronous]]></category><dc:creator><![CDATA[Manas Singh]]></dc:creator><pubDate>Tue, 13 Dec 2022 11:22:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/VieM9BdZKFo/upload/775767035117c5bb0ff623851632e693.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>There are several network scanners and tools that let you find out open ports. This is a demonstration of using <code>socket</code>, <code>ipaddress</code> and <code>asyncio</code> to check ports concurrently.</p>
<h1 id="heading-basic-check">Basic Check</h1>
<p>A basic check can be written using <code>socket</code> module:</p>
<pre><code class="lang-python"><span class="hljs-keyword">with</span> socket.socket(family, sock_type) <span class="hljs-keyword">as</span> s:
        result = s.connect_ex(address)
</code></pre>
<p>For our example, we will only support the following:</p>
<ul>
<li><p><code>family</code>: <code>AF_INET</code> for IPv4, <code>AF_INET6</code> for IPv6</p>
</li>
<li><p><code>sock_type</code> : <code>SOCK_STREAM</code> for TCP, <code>SOCK_DGRAM</code> for UDP</p>
</li>
</ul>
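<p>The address family need not be passed in separately; it can be derived from the address itself with the <code>ipaddress</code> module, exactly as <code>check_port</code> does below. A minimal sketch (the helper name <code>family_for</code> is ours, for illustration):</p>

```python
import ipaddress
import socket

def family_for(server: str) -> socket.AddressFamily:
    # ip_address raises ValueError for invalid input, which doubles as validation
    if ipaddress.ip_address(server).version == 4:
        return socket.AF_INET
    return socket.AF_INET6

print(family_for("8.8.8.8") is socket.AF_INET)                # True
print(family_for("2001:4860:4860::8888") is socket.AF_INET6)  # True
```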
<p>Notice that we use <a target="_blank" href="https://docs.python.org/3/library/socket.html#socket.socket.connect_ex">connect_ex</a>, as it returns an error indicator instead of raising exceptions. Now, let us create a function <code>check_port</code> that accepts a server IP address, a port and a port type, and returns <code>0</code> on success.</p>
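<p>To see <code>connect_ex</code> in action before wrapping it, here is a quick sketch that probes a throwaway local listener, so the port is known to be open:</p>

```python
import socket

# Start a throwaway listener so we have a known-open TCP port to probe
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2)
    # 0 means the connection succeeded; any other value is an errno
    result = s.connect_ex(("127.0.0.1", open_port))

listener.close()
print(result)  # 0
```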
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_port</span>(<span class="hljs-params">server: str, port: int, port_type: str</span>) -&gt; int:</span>
    <span class="hljs-string">"""Check whether a port is open or not
    :param server: IP Address 
    :param port: Port number
    :param port_type: Port Type
    :return: 0 if successful, otherwise an error number
    """</span>
    <span class="hljs-keyword">if</span> port_type == <span class="hljs-string">"TCP"</span>:
        sock_type = socket.SOCK_STREAM
    <span class="hljs-keyword">elif</span> port_type == <span class="hljs-string">"UDP"</span>:
        sock_type = socket.SOCK_DGRAM
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Port type should be TCP or UDP"</span>)
    <span class="hljs-comment"># validate the address</span>
    ip_address = ipaddress.ip_address(server)
    <span class="hljs-keyword">if</span> ip_address.version == <span class="hljs-number">4</span>:
        family = socket.AF_INET
    <span class="hljs-keyword">else</span>:
        family = socket.AF_INET6
    location = (server, port)
    <span class="hljs-comment"># record the start time to measure how long the check takes</span>
    start = time.perf_counter_ns()
    <span class="hljs-keyword">with</span> socket.socket(family, sock_type) <span class="hljs-keyword">as</span> a_socket:
        <span class="hljs-comment"># a timeout saves us from blocking forever</span>
        a_socket.settimeout(TIMEOUT)
        result = a_socket.connect_ex(location)
    end = time.perf_counter_ns() - start
    <span class="hljs-keyword">return</span> result
</code></pre>
<h1 id="heading-run-concurrently-with-asyncio">Run concurrently with asyncio</h1>
<p>We should be able to run this against multiple addresses and ports. The obvious way is to pass a <code>list</code> of ports and addresses and execute the logic in a loop. This would be a serial approach: the total time would be the sum of all the individual checks, which is far from optimal.</p>
<p>The better option is to run these checks concurrently. We have already set a timeout to ensure that no single check takes longer than <code>TIMEOUT</code> seconds.</p>
<p>So, let us use <code>asyncio</code> to run this function concurrently.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> ipaddress
<span class="hljs-keyword">import</span> socket
<span class="hljs-keyword">import</span> time

<span class="hljs-keyword">from</span> rich.progress <span class="hljs-keyword">import</span> track

SERVERS = []
<span class="hljs-comment"># List of ports to check</span>
TCP_PORTS = [<span class="hljs-number">53</span>, <span class="hljs-number">389</span>, <span class="hljs-number">445</span>]
UDP_PORTS = [<span class="hljs-number">3268</span>, <span class="hljs-number">636</span>, <span class="hljs-number">185</span>]
TIMEOUT = <span class="hljs-number">2</span>

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-keyword">for</span> server <span class="hljs-keyword">in</span> track(SERVERS, description=<span class="hljs-string">"Checking..."</span>):
        rows = <span class="hljs-keyword">await</span> asyncio.gather(
            *[asyncio.to_thread(check_port, server, port, <span class="hljs-string">"UDP"</span>) <span class="hljs-keyword">for</span> port <span class="hljs-keyword">in</span> UDP_PORTS],
            *[asyncio.to_thread(check_port, server, port, <span class="hljs-string">"TCP"</span>) <span class="hljs-keyword">for</span> port <span class="hljs-keyword">in</span> TCP_PORTS],
        )

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(<span class="hljs-string">f"Executed in <span class="hljs-subst">{elapsed:<span class="hljs-number">0.2</span>f}</span> seconds."</span>)
</code></pre>
<p>Now, let us break down the code to understand it better:</p>
<p><code>asyncio.to_thread</code> wraps the blocking function so that it runs in a separate thread, returning an awaitable; we create one per port. <code>asyncio.gather</code> then runs these awaitables concurrently. To illustrate that multiple lists of awaitables can be unpacked into <code>gather</code>, we have separated UDP and TCP.</p>
<p>We wrap all this logic in <code>async def main()</code> which itself is run by <code>asyncio.run(main())</code></p>
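<p>The pattern of fanning blocking calls out to threads and gathering the results can be seen in isolation with a toy blocking function standing in for the socket call:</p>

```python
import asyncio
import time

def blocking_task(n: int) -> int:
    time.sleep(0.5)  # stands in for a blocking socket call
    return n * n

async def main() -> list:
    # each to_thread call runs blocking_task in its own thread;
    # gather awaits all of them concurrently
    return await asyncio.gather(*[asyncio.to_thread(blocking_task, n) for n in range(4)])

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # [0, 1, 4, 9]
print(f"{elapsed:0.2f}s")  # close to 0.5s, not 4 x 0.5s, since the sleeps overlap
```

<p>Note that <code>gather</code> returns the results in the order the awaitables were passed in, regardless of which thread finishes first.</p>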
<h1 id="heading-rich-table-output">Rich Table Output</h1>
<p>Next, we print the output as a formatted table using the amazing <code>rich</code> package.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> ipaddress
<span class="hljs-keyword">import</span> socket
<span class="hljs-keyword">import</span> time

<span class="hljs-keyword">from</span> rich.console <span class="hljs-keyword">import</span> Console
<span class="hljs-keyword">from</span> rich.progress <span class="hljs-keyword">import</span> track
<span class="hljs-keyword">from</span> rich.table <span class="hljs-keyword">import</span> Table

SERVERS = []

TCP_PORTS = [<span class="hljs-number">53</span>, <span class="hljs-number">389</span>, <span class="hljs-number">445</span>, <span class="hljs-number">3268</span>, <span class="hljs-number">636</span>]
UDP_PORTS = [<span class="hljs-number">3268</span>, <span class="hljs-number">636</span>, <span class="hljs-number">185</span>]
TIMEOUT = <span class="hljs-number">2</span>

<span class="hljs-comment"># Rich Table</span>
console = Console()
table = Table(title=<span class="hljs-string">"Results"</span>)
table.add_column(<span class="hljs-string">"Server"</span>, justify=<span class="hljs-string">"left"</span>)
table.add_column(<span class="hljs-string">"Type"</span>, justify=<span class="hljs-string">"left"</span>)
table.add_column(<span class="hljs-string">"Port"</span>, justify=<span class="hljs-string">"left"</span>)
table.add_column(<span class="hljs-string">"Status"</span>, justify=<span class="hljs-string">"left"</span>)
table.add_column(<span class="hljs-string">"Time (ns)"</span>, justify=<span class="hljs-string">"right"</span>)


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_port</span>(<span class="hljs-params">server: str, port: int, port_type: str</span>) -&gt; dict:</span>
    <span class="hljs-string">"""Check whether a port is open or not
    :param server: IP address
    :param port: Port number
    :param port_type: Port Type
    :return: rich table row as dict
    """</span>
    <span class="hljs-keyword">if</span> port_type == <span class="hljs-string">"TCP"</span>:
        sock_type = socket.SOCK_STREAM
    <span class="hljs-keyword">elif</span> port_type == <span class="hljs-string">"UDP"</span>:
        sock_type = socket.SOCK_DGRAM
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Port type should be TCP or UDP"</span>)
    ip_address = ipaddress.ip_address(server)
    <span class="hljs-keyword">if</span> ip_address.version == <span class="hljs-number">4</span>:
        family = socket.AF_INET
    <span class="hljs-keyword">else</span>:
        family = socket.AF_INET6

    location = (server, port)
    start = time.perf_counter_ns()
    <span class="hljs-keyword">with</span> socket.socket(family, sock_type) <span class="hljs-keyword">as</span> a_socket:
        a_socket.settimeout(TIMEOUT)
        result_of_check = a_socket.connect_ex(location)
    end = time.perf_counter_ns() - start
    <span class="hljs-comment"># Result to rich format</span>
    <span class="hljs-keyword">if</span> result_of_check == <span class="hljs-number">0</span>:
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"row"</span>: [server, port_type, str(port), <span class="hljs-string">"OPEN"</span>, <span class="hljs-string">f"<span class="hljs-subst">{end:<span class="hljs-number">0.2</span>f}</span>"</span>],
            <span class="hljs-string">"style"</span>: <span class="hljs-string">"green"</span>,
        }
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"row"</span>: [server, port_type, str(port), <span class="hljs-string">"CLOSED"</span>, <span class="hljs-string">f"<span class="hljs-subst">{end:<span class="hljs-number">0.2</span>f}</span>"</span>],
            <span class="hljs-string">"style"</span>: <span class="hljs-string">"red"</span>,
        }

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-keyword">for</span> server <span class="hljs-keyword">in</span> track(SERVERS, description=<span class="hljs-string">"Checking..."</span>):
        rows = <span class="hljs-keyword">await</span> asyncio.gather(
            *[asyncio.to_thread(check_port, server, port, <span class="hljs-string">"UDP"</span>) <span class="hljs-keyword">for</span> port <span class="hljs-keyword">in</span> UDP_PORTS],
            *[asyncio.to_thread(check_port, server, port, <span class="hljs-string">"TCP"</span>) <span class="hljs-keyword">for</span> port <span class="hljs-keyword">in</span> TCP_PORTS],
        )
        <span class="hljs-keyword">for</span> r <span class="hljs-keyword">in</span> rows:
            table.add_row(*r[<span class="hljs-string">"row"</span>], style=r[<span class="hljs-string">"style"</span>])


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    s = time.perf_counter()
    asyncio.run(main())
    console.print(table)
    elapsed = time.perf_counter() - s
    print(<span class="hljs-string">f"Executed in <span class="hljs-subst">{elapsed:<span class="hljs-number">0.2</span>f}</span> seconds."</span>)
</code></pre>
<p>Now, let us understand what's happening here.</p>
<p>The rows are appended to the <code>rich</code> table for each server. Once the <code>asyncio</code> loop is finished, we display the table using <code>console.print(table)</code></p>
<p>Here's the output of the program run against Google's public DNS IPs.</p>
<p>The progress bar is displayed by the <code>track</code> method in <code>rich.progress</code></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1670929896080/CU3aoFp03.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-fork-it-on-replit">Fork it on Replit</h1>
<p>The entire program can be run from the Replit. Feel free to fork it.</p>
<p><strong>WARNING: Please run this program responsibly, it is meant to be a demo only. Port scanning can be considered malicious activity.</strong></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://replit.com/@smanas/portcheck?v=1">https://replit.com/@smanas/portcheck?v=1</a></div>
]]></content:encoded></item></channel></rss>