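LLM Application Series (2): Installing and Using Hugging Face

This post covers how to download models from Hugging Face, how to call Hugging Face's hosted models through the API, and how to run model inference locally.

1. Calling the Hugging Face online service through the API

Send a POST request to Hugging Face. The code is as follows: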

import requests

# Call a model hosted on Hugging Face with an HTTP POST request
API_URL = "https://api-inference.huggingface.co/models/uer/gpt2-distil-chinese-cluecorpussmall"
API_TOKEN = "hf_xxxxxxxxxxx"  # request an API token on the official website

headers = {"Authorization": f"Bearer {API_TOKEN}"}
# To try anonymous access without a token, omit the headers argument
response = requests.post(API_URL, headers=headers, json={"inputs": "你好,huggingface"})
print(response.json())

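The API_TOKEN has to be requested on the official website.

Click Access Tokens, then create a new token. Selecting all of the permissions is sufficient.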
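The JSON printed by the request above is not always a generation result. As a minimal sketch, assuming the service returns a list of objects with a "generated_text" field on success and an object with an "error" field on failure (for example while the model is still loading), a small helper can separate the two cases. The helper name extract_generated_text and the sample payload are hypothetical, and the response shapes are assumptions rather than something this post confirms.

```python
def extract_generated_text(payload):
    """Return the generated strings from a decoded JSON response.

    Assumed shapes (not confirmed by this post):
      success: [{"generated_text": "..."}, ...]
      failure: {"error": "...", ...}
    """
    if isinstance(payload, dict) and "error" in payload:
        raise RuntimeError(f"Inference request failed: {payload['error']}")
    if not isinstance(payload, list):
        raise ValueError(f"Unexpected response shape: {type(payload).__name__}")
    # Keep only well-formed entries that actually carry generated text
    return [
        item["generated_text"]
        for item in payload
        if isinstance(item, dict) and "generated_text" in item
    ]


# Hypothetical payload standing in for response.json()
sample = [{"generated_text": "你好,huggingface 社区"}]
print(extract_generated_text(sample))
```

In the request code, this would be called as extract_generated_text(response.json()), so a failed request raises an exception instead of silently printing an error object.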

2. Pulling models from Hugging Face

First create a Python virtual environment. If Anaconda is not installed, see
Detailed Anaconda Installation Tutorial for Windows (CSDN blog)

Install huggingface_hub and transformers with pip:

pip install huggingface_hub
pip install -U transformers

Look up the model you want on https://huggingface.co/ and download it with the command below. For example, to download the model Qwen/Qwen2.5-0.5B-Instruct into ./Qwen2.5-0.5B-Instruct under the current directory:

huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir Qwen2.5-0.5B-Instruct

To download a dataset, use the following command:

huggingface-cli download --repo-type dataset lavita/medical-qa-shared-task-v1-toy --local-dir medical-qa-shared-task-v1-toy

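Note: downloading from Hugging Face requires a proxy (VPN) to get around network restrictions.

If you do not have one, you can download from a Hugging Face mirror hosted in China instead (the author uses it regularly and recommends it):

HF-Mirror (https://hf-mirror.com)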

Set the environment variable before downloading.

Windows (PowerShell)

$env:HF_ENDPOINT = "https://hf-mirror.com"

Linux

export HF_ENDPOINT=https://hf-mirror.com

Download a model

huggingface-cli download --resume-download gpt2 --local-dir gpt2

Download a dataset

huggingface-cli download --repo-type dataset --resume-download wikitext --local-dir wikitext
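The same mirror setup can also be done from Python instead of the shell. The following is a minimal sketch, assuming huggingface_hub is installed and that it reads HF_ENDPOINT when it is first imported, which is why the variable is set before the import. The helper names use_mirror and download are hypothetical; snapshot_download is the huggingface_hub function that fetches a whole repository.

```python
import os

MIRROR = "https://hf-mirror.com"  # same endpoint as in the shell examples


def use_mirror(endpoint=MIRROR):
    """Point huggingface_hub at a mirror. Call this before importing huggingface_hub."""
    os.environ["HF_ENDPOINT"] = endpoint
    return os.environ["HF_ENDPOINT"]


def download(repo_id, local_dir, repo_type="model"):
    """Download a full repository snapshot into local_dir and return its path."""
    # Imported here so that HF_ENDPOINT is already set when the library loads
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)


use_mirror()

# Example calls (they access the network, so they are left commented out):
# download("gpt2", "gpt2")
# download("wikitext", "wikitext", repo_type="dataset")
```

Setting the variable inside the script keeps the mirror choice next to the download code, which is convenient when the same script runs on both Windows and Linux.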

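3. Running models locally

Once the model is downloaded, run it with transformers. Today's large models can be roughly divided into two families: BERT-style and GPT-style. BERT-style models are commonly used for word embeddings, classification, sentiment analysis, and similar tasks, while GPT-style models are used for generation.

3.1 BERT-style models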

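from transformers import BertTokenizer, BertForSequenceClassification, pipeline

# Text classification with BERT
model_path = "bert-base-chinese"  # directory containing the model files; absolute or relative path

# Note: a base checkpoint has no trained classification head, so the head is
# randomly initialized and the predictions are not meaningful until fine-tuned
model = BertForSequenceClassification.from_pretrained(model_path)
tokenizer = BertTokenizer.from_pretrained(model_path)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, device="cpu")

result = classifier("你好, 我是一款语言模型")
print(result)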
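The classifier above prints a list of dictionaries with a label and a score. As a rough illustration of where those two fields come from, the sketch below turns raw logits into the same shape of result with a softmax. It is a simplified stand-in written in plain Python, not the pipeline's actual implementation; the logits are made-up numbers, and the label names LABEL_0 and LABEL_1 are assumed to be the defaults of a two-class model without custom label names.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits, id2label):
    """Pick the most probable class and report it as a label and a score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


logits = [0.2, 1.3]  # hypothetical model output for one sentence
id2label = {0: "LABEL_0", 1: "LABEL_1"}  # assumed default label names
print(to_prediction(logits, id2label))
```

The score is therefore a probability over the available classes, so it only becomes informative once the classification head has been trained.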

3.2 GPT-style models

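from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Run GPT-2 locally to generate text
model_name = "uer/gpt2-distil-chinese-cluecorpussmall"  # Hub repo ID; downloaded and cached on first use
model_path = "./gpt2-chinese-cluecorpussmall"  # local model directory; pass it to from_pretrained instead to load a downloaded copy

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a text-generation pipeline with GPT-2
generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device="cpu")

# Generate text
output = generator("讲一个猫和老鼠的故事,",
                   max_length=50,  # maximum total length in tokens, prompt included
                   num_return_sequences=1,  # number of independent sequences to return
                   truncation=True,  # truncate the input if it exceeds the maximum length
                   do_sample=True,  # enable sampling so that temperature, top_k and top_p take effect
                   temperature=0.1,  # higher values give more creative (more random) output
                   top_k=50,  # at each step, sample only from the top_k most probable tokens
                   top_p=0.9,  # nucleus sampling: keep the smallest set of tokens whose cumulative probability reaches top_p
                   clean_up_tokenization_spaces=True  # whether to remove extra spaces from the decoded text
                   )
print(output)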
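To make the interaction between temperature, top_k, and top_p more concrete, here is a minimal sketch of how one next-token distribution could be filtered. It is a simplified illustration in plain Python, not the code transformers runs internally; the function names and the four example logits are hypothetical, and the order of operations (temperature, then top-k, then top-p) is an assumption of this sketch.

```python
import math
import random


def next_token_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # 1) Temperature: values below 1 sharpen the distribution, values above 1 flatten it
    scaled = [x / temperature for x in logits]  # temperature must be > 0
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2) Top-k: keep only the k most probable tokens (0 disables the filter)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)

    # 3) Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Renormalize so that the surviving probabilities sum to 1
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


def sample_token(dist, rng=random):
    """Draw one token id according to the filtered distribution."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(next_token_distribution(logits, temperature=0.1, top_k=50, top_p=0.9))
print(next_token_distribution(logits, temperature=1.0, top_k=2))
print(sample_token(next_token_distribution(logits, temperature=1.0, top_p=0.8)))
```

With a temperature as low as 0.1, the most probable token ends up carrying almost all of the probability mass, so the top_p cutoff keeps only that token and generation becomes close to deterministic. Raising the temperature toward 1.0 lets top_k and top_p play a visible role.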