This is a complete troubleshooting and resolution record for a GPU/torch detection problem that appeared in a Windows development environment when running the following code. It covers the steps I executed, the key commands and their output, the root-cause analysis, and follow-up recommendations.

from datasets import load_dataset
dataset = load_dataset("json", data_files="dataset.jsonl")

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2-7B-Instruct",
    max_seq_length = 2048,
    dtype = torch.float16,
    load_in_4bit = True,
)
1. Problem overview

When running python main.py, the program threw an error while importing unsloth:

NotImplementedError: Unsloth cannot find any torch accelerator? You need a GPU.

However, nvidia-smi showed that the machine does have an NVIDIA GPU (a Tesla P40), so the question became why PyTorch could not see any CUDA device. (A minimal approximation of the failing check follows the nvidia-smi output below.)
PS C:\Windows\system32> nvidia-smi
Wed Nov 5 11:34:30 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61 Driver Version: 551.61 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P40 TCC | 00000000:04:00.0 Off | Off |
| N/A 36C P8 11W / 250W | 8MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
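For context, the check that raises this error lives in unsloth_zoo.device_type.get_device_type (see the traceback in section 3 below). Roughly, it asks torch whether any accelerator backend is usable. A minimal approximation (my sketch, not unsloth's exact code):

import torch

def get_device_type_approx() -> str:
    # Approximation of unsloth_zoo.device_type.get_device_type: probe the
    # accelerator backends torch knows about and fail if none is usable.
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    raise NotImplementedError("Unsloth cannot find any torch accelerator? You need a GPU.")

So the error means torch itself reports no accelerator, regardless of what the driver sees.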
2. Environment (key facts)

- Working directory: c:\Users\admin\workspace\qwen-fintune
- Virtual environment: .venv (activated for testing and installs)
- GPU (nvidia-smi): Tesla P40
- Driver (from the nvidia-smi summary): Driver Version: 551.61, CUDA Version: 12.4
3. Troubleshooting steps and execution log

The commands and key output are listed below in chronological order (excerpted, with brief notes):

- Running main.py fails:
(.venv) PS C:\Users\admin\workspace\qwen-fintune> python .\main.py
Generating train split: 2 examples [00:00, 9.21 examples/s]
Traceback (most recent call last):
  File "C:\Users\admin\workspace\qwen-fintune\main.py", line 5, in <module>
    from unsloth import FastLanguageModel
  File "C:\Users\admin\workspace\qwen-fintune\.venv\Lib\site-packages\unsloth\__init__.py", line 77, in <module>
    import unsloth_zoo
  File "C:\Users\admin\workspace\qwen-fintune\.venv\Lib\site-packages\unsloth_zoo\__init__.py", line 140, in <module>
    from .device_type import (
  File "C:\Users\admin\workspace\qwen-fintune\.venv\Lib\site-packages\unsloth_zoo\device_type.py", line 56, in <module>
    DEVICE_TYPE : str = get_device_type()
                        ^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\workspace\qwen-fintune\.venv\Lib\site-packages\unsloth_zoo\device_type.py", line 46, in get_device_type
    raise NotImplementedError("Unsloth cannot find any torch accelerator? You need a GPU.")
NotImplementedError: Unsloth cannot find any torch accelerator? You need a GPU.
- Checking the system GPU (nvidia-smi):
PS> nvidia-smi
Wed Nov 5 10:43:39 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61 Driver Version: 551.61 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
|=========================================+========================+======================|
| 0 Tesla P40 TCC | 00000000:04:00.0 Off | Off |
| N/A 59C P0 55W / 250W | 20130MiB / 24576MiB | 0% Default |
+-----------------------------------------------------------------------------------------+
At this point a process (ollama.exe) was holding a large amount of VRAM (~20 GB); we later terminated it to avoid running out of memory.
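Related: rather than reading nvidia-smi's process table by eye, pynvml can enumerate the VRAM holders directly. A small sketch (compute contexts only; graphics contexts need nvmlDeviceGetGraphicsRunningProcesses):

import pynvml

# Enumerate compute processes holding memory on GPU 0.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    # usedGpuMemory can be None when the driver does not expose per-process usage.
    print("pid:", proc.pid, "VRAM bytes:", proc.usedGpuMemory)
pynvml.nvmlShutdown()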
- Checking PyTorch and CUDA availability inside the virtual environment (first pass):

Command:

.venv\Scripts\python.exe -c "import sys, torch; print('PY:', sys.executable); print('TORCH:', getattr(torch,'__version__','<no torch>')); print('CUDA available:', torch.cuda.is_available() if hasattr(torch,'cuda') else 'no cuda module'); print('torch.version.cuda:', getattr(torch.version,'cuda',None)); print('device count:', torch.cuda.device_count() if hasattr(torch,'cuda') else 0); print('names:', [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())] if torch.cuda.device_count() else [])"
Output summary (at the time):

PY: C:\Users\admin\workspace\qwen-fintune\.venv\Scripts\python.exe
TORCH: 2.8.0+cpu
CUDA available: False
torch.version.cuda: None
device count: 0
names: []
Conclusion: the PyTorch installed in the virtual environment was CPU-only (the version carries a +cpu suffix), which is why torch.cuda.is_available() returned False.
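A quick way to tell a CPU-only build apart from a CUDA build, without relying on the version suffix alone (my sketch):

import torch

# torch.version.cuda is the CUDA runtime the wheel was compiled against;
# it is None on a CPU-only build regardless of what the driver supports.
print("wheel version:", torch.__version__)          # e.g. 2.8.0+cpu vs 2.8.0+cu121
print("compiled CUDA runtime:", torch.version.cuda) # None => CPU-only wheel
print("runtime sees a GPU:", torch.cuda.is_available())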
- Attempting to install a CUDA-enabled PyTorch (target: cu121)

Commands (run inside the venv):

.venv\Scripts\python.exe -m pip install --upgrade pip
.venv\Scripts\python.exe -m pip install --force-reinstall torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121

Summary of the install output and the important findings:

- pip ended up installing torch-2.9.0 and related packages, but reported a dependency conflict during resolution (xformers 0.0.32.post2 requires torch==2.8.0).
- More importantly, after verifying the install, torch.__version__ was 2.9.0+cpu and torch.cuda.is_available() was still False.

Explanation: even though PyTorch's cu121 index was supplied, pip still installed a CPU-only wheel. The likely mechanism: --extra-index-url makes pip resolve across both PyPI and the extra index and pick the highest matching version overall, and the newest torch on PyPI (CPU-only on Windows) can outrank the newest build on the cu121 index, so the CPU wheel wins. Restricting resolution with --index-url, or pinning an exact +cu121 version, avoids this (see the solutions in section 6).
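A short post-install check (a sketch; the helper name is mine) that makes the wheel variant explicit, so a silent fall-back to a CPU wheel is caught immediately:

import torch

def assert_cuda_wheel() -> None:
    # Fail loudly if the installed wheel has no CUDA runtime compiled in,
    # or if the runtime cannot reach the driver.
    if torch.version.cuda is None:
        raise RuntimeError(f"CPU-only wheel installed: {torch.__version__}")
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA wheel installed, but no usable CUDA device found")

assert_cuda_wheel()
print("OK:", torch.__version__, "on", torch.cuda.get_device_name(0))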
- For a finer-grained view, I created and ran check-cuda.py (added to the repository). The script collects device information through three independent paths: torch, pynvml, and nvidia-smi. It lives at check-cuda.py in the project root.

JSON output of the script run (formatted; key information):
{
"python_executable": "C:\Users\admin\workspace\qwen-fintune\.venv\Scripts\python.exe",
"torch": {
"torch_version": "2.9.0+cpu",
"cuda_available": false,
"torch_cuda_version": null,
"device_count": 0,
"devices": []
},
"pynvml": {
"gpus": [
{
"id": 0,
"name": "Tesla P40",
"memory_total_bytes": 25769803776,
"memory_free_bytes": 25654001664,
"memory_used_bytes": 115802112,
"gpu_util_percent": 0,
"memory_util_percent": 0
}
]
},
"nvidia_smi": {
"gpus": [
{
"index": 0,
"name": "Tesla P40",
"memory_total_MiB": 24576,
"memory_free_MiB": 24465,
"memory_used_MiB": 8,
"util_gpu_percent": 0,
"util_mem_percent": 0
}
]
}
}
Note: during the script run a warning appeared (from torch's import) suggesting nvidia-ml-py as a replacement for pynvml. This is only an API deprecation notice and does not affect device visibility.
4. Root cause

Summary:

- The system driver, nvidia-smi, and pynvml all recognize the Tesla P40, so the driver layer is healthy.
- However, the PyTorch installed in the virtual environment is CPU-only (the wheel name/internal tag carries +cpu), so torch cannot reach the CUDA runtime, and unsloth raises its "You need a GPU" error.
- The attempt to install a CUDA wheel via pip hit two problems: a version-dependency conflict with xformers, and pip ultimately installing a CPU wheel anyway (an index/version-matching issue, as explained above).
5. Actions taken (brief log)

- Manually terminated the process holding the VRAM (ollama.exe) to free it (done by the user during the session).
- Tried installing torch torchvision torchaudio (target cu121) in the venv; pip reported torch 2.9.0 installed, but it was still CPU-only.
- Added check-cuda.py for consistent diagnostics and recorded its output.
6. Recommended solutions and commands

Pick one of the two common strategies below.

Option A (recommended if you need xformers: keep it and choose a CUDA torch version compatible with it):

- Check the current Python version (it must match the wheel ABI):

.venv\Scripts\python.exe --version

- Look for a CUDA wheel compatible with the torch==2.8.0 that xformers requires (for example torch==2.8.0+cu121) and, if one exists, install it:

.venv\Scripts\python.exe -m pip uninstall -y torch torchvision torchaudio
.venv\Scripts\python.exe -m pip install "torch==2.8.0+cu121" torchvision==0.23.0+cu121 torchaudio==2.8.0+cu121 --extra-index-url https://download.pytorch.org/whl/cu121

Note: the version numbers are examples only. First pick, on the official PyTorch page, the exact wheel that matches your Python version and CUDA runtime (cu121). The sketch below prints the interpreter facts that drive wheel selection.
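To know which wheels can match at all, these are the facts pip compares against wheel tags (a small sketch):

import platform
import sys

# Wheel filenames encode these: a cp312 wheel only installs on Python 3.12,
# and a win_amd64 wheel only on 64-bit Windows.
print("python:", sys.version.split()[0])
print("abi tag: cp%d%d" % sys.version_info[:2])
print("platform:", platform.system(), platform.machine())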
Option B (simpler, if you do not need xformers: uninstall it first, then install the latest CUDA-enabled torch):

.venv\Scripts\Activate.ps1
# Optional: back up requirements, or record the current packages
pip uninstall -y xformers torch torchvision torchaudio
pip install --upgrade pip
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
# Verify
.venv\Scripts\python.exe -c "import torch; print(torch.__version__); print('cuda available:', torch.cuda.is_available()); print('devices:', torch.cuda.device_count())"

If the install succeeds, torch.cuda.is_available() should return True and torch.cuda.device_count() should be greater than 0; you can then run python .\main.py to confirm that unsloth finds an accelerator.
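Optionally, main.py can fail fast with a clearer message than unsloth's NotImplementedError. A minimal guard (my sketch):

import torch

# Check for a usable CUDA device before unsloth's import-time probe runs,
# so the failure message points straight at the torch install.
if not torch.cuda.is_available():
    raise SystemExit(
        f"torch {torch.__version__} sees no CUDA device "
        f"(compiled CUDA runtime: {torch.version.cuda}); "
        "reinstall a CUDA-enabled wheel before importing unsloth."
    )

from unsloth import FastLanguageModel  # safe once CUDA is visible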
Verification results

check-cuda.py run again after the reinstall:

C:\Users\admin\workspace\qwen-fintune\.venv\Lib\site-packages\torch\cuda\__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
{
  "python_executable": "C:\\Users\\admin\\workspace\\qwen-fintune\\.venv\\Scripts\\python.exe",
  "torch": {
    "torch_version": "2.5.1+cu121",
    "cuda_available": true,
    "torch_cuda_version": "12.1",
    "device_count": 1,
    "devices": [
      {
        "id": 0,
        "name": "Tesla P40",
        "total_memory_bytes": 25662586880,
        "multi_processor_count": 30
      }
    ]
  },
  "pynvml": {
    "gpus": [
      {
        "id": 0,
        "name": "Tesla P40",
        "memory_total_bytes": 25769803776,
        "memory_free_bytes": 25654001664,
        "memory_used_bytes": 115802112,
        "gpu_util_percent": 0,
        "memory_util_percent": 0
      }
    ]
  },
  "nvidia_smi": {
    "gpus": [
      {
        "index": 0,
        "name": "Tesla P40",
        "memory_total_MiB": 24576,
        "memory_free_MiB": 24465,
        "memory_used_MiB": 8,
        "util_gpu_percent": 0,
        "util_mem_percent": 0
      }
    ]
  }
}
7. Caveats and follow-up recommendations

- Keep the Python version matched to the ABI the PyTorch wheel supports (on Windows, some torch+CUDA wheels are only published for specific Python versions).
- Before a major upgrade of a deep-learning library (such as torch), back up or record the venv's dependencies first (pip freeze > requirements.txt).
- If your work depends on xformers or other libraries with strict torch version pins, pick the CUDA wheel around those constraints first, or decide within the team whether the pinned packages can be upgraded or replaced.
8. Appendix: files created/modified

check-cuda.py — new diagnostic script that collects torch, pynvml, and nvidia-smi information inside the venv (project root).
#!/usr/bin/env python3
"""check-cuda.py
Prints GPU/CUDA information using multiple fallbacks:
- torch (if installed)
- pynvml (if installed)
- nvidia-smi subprocess query (if available on PATH)
Output is JSON printed to stdout for easy parsing.
"""
import sys
import json
import subprocess
def gather_torch_info():
info = {}
try:
import torch
info['torch_version'] = torch.__version__
info['cuda_available'] = torch.cuda.is_available()
info['torch_cuda_version'] = getattr(torch.version, 'cuda', None)
device_count = torch.cuda.device_count() if hasattr(torch, 'cuda') else 0
info['device_count'] = device_count
devices = []
for i in range(device_count):
try:
prop = torch.cuda.get_device_properties(i)
name = prop.name
total_mem = getattr(prop, 'total_memory', None)
devices.append({
'id': i,
'name': name,
'total_memory_bytes': total_mem,
'multi_processor_count': getattr(prop, 'multi_processor_count', None),
})
except Exception as e:
devices.append({'id': i, 'error': str(e)})
info['devices'] = devices
except Exception as e:
info['error'] = str(e)
return info
def gather_pynvml_info():
info = {}
try:
import pynvml
try:
pynvml.nvmlInit()
count = pynvml.nvmlDeviceGetCount()
gpus = []
for i in range(count):
handle = pynvml.nvmlDeviceGetHandleByIndex(i)
try:
name = pynvml.nvmlDeviceGetName(handle)
# name may be bytes on some systems
if isinstance(name, bytes):
name = name.decode(errors='ignore')
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
gpus.append({
'id': i,
'name': name,
'memory_total_bytes': int(mem.total),
'memory_free_bytes': int(mem.free),
'memory_used_bytes': int(mem.used),
'gpu_util_percent': int(util.gpu),
'memory_util_percent': int(util.memory),
})
except Exception as e:
gpus.append({'id': i, 'error': str(e)})
info['gpus'] = gpus
pynvml.nvmlShutdown()
except Exception as e:
info['nvml_error'] = str(e)
except Exception as e:
info['import_error'] = str(e)
return info
def gather_nvidia_smi():
info = {}
try:
cmd = [
'nvidia-smi',
'--query-gpu=index,name,memory.total,memory.free,memory.used,utilization.gpu,utilization.memory',
'--format=csv,nounits,noheader'
]
proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
rows = []
for line in proc.stdout.strip().splitlines():
parts = [p.strip() for p in line.split(',')]
if len(parts) >= 7:
try:
rows.append({
'index': int(parts[0]),
'name': parts[1],
'memory_total_MiB': int(parts[2]),
'memory_free_MiB': int(parts[3]),
'memory_used_MiB': int(parts[4]),
'util_gpu_percent': int(parts[5]),
'util_mem_percent': int(parts[6]),
})
except Exception:
rows.append({'raw': parts})
else:
rows.append({'raw': parts})
info['gpus'] = rows
except FileNotFoundError:
info['error'] = 'nvidia-smi not found on PATH'
except subprocess.CalledProcessError as e:
info['error'] = f'nvidia-smi failed: {e}'
except Exception as e:
info['error'] = str(e)
return info
def main():
result = {
'python_executable': sys.executable,
}
result['torch'] = gather_torch_info()
result['pynvml'] = gather_pynvml_info()
result['nvidia_smi'] = gather_nvidia_smi()
print(json.dumps(result, indent=2, ensure_ascii=False))
if __name__ == '__main__':
main()
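Typical usage (the report filename is my example): run the script with the venv interpreter and redirect the JSON, which goes to stdout while the pynvml deprecation warning goes to stderr:

.venv\Scripts\python.exe check-cuda.py > cuda-report.json

The saved report can then be consumed programmatically, e.g. to extract the one flag that gates unsloth (a sketch):

import json

# Read the saved report and print whether torch can see a CUDA device.
with open("cuda-report.json", encoding="utf-8") as f:
    report = json.load(f)
print("torch sees CUDA:", report["torch"].get("cuda_available", False))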