The rapid advancement of large language models (LLMs) has opened new possibilities for automating complex analytical workflows in computational biology. However, the absence of standardized evaluation ...