LLM API Performance Evaluation Tool Guide
This article provides a quick start guide for Linux and Windows platforms, including command examples for downloading, configuring, and running the tool.
This article compares the performance of vLLM when using xformers versus FlashAttention 2 as the attention backend.
This post documents how to run LLMs on a 1-core, 1 GB (1C1G) VPS using Ollama.
This article compares the throughput of three large language model inference engines, vLLM, SGLang, and LMDeploy, in a short-input, long-output scenario, measured in output tokens per second.
This article explains how to configure the UFW firewall using a one-click script to restrict network access for Docker container services, enhancing website security.