I. Introduction

1. This service is built on the Stable Diffusion Web UI with the sd-v1-5-inpainting model. It serves the AI Art Generator and AI Avatar projects and supports text-to-image and text+image-to-image generation.

2. Parameters are tuned interactively in the Web UI until the output looks right; the backend then passes those tuned parameters through the API to generate the images consumed by the frontend.

II. Setup Steps

1. Build the environment using the code in https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/stable/stable_diffusion_inpainting_webui_colab.ipynb.

2. When starting the service (the last command in the notebook above), add the --api flag:
python launch.py --xformers --enable-insecure-extension-access --theme dark --api  # local-only service
python launch.py --listen --xformers --enable-insecure-extension-access --theme dark --api  # externally reachable service
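Once the server is running with --api, it is worth confirming the API is reachable before wiring anything up. A minimal sketch (the base URL below is the default local address and will differ for a --listen deployment):

import requests

# With --api enabled, the Web UI also serves a REST API; /sdapi/v1/sd-models
# lists the available checkpoints and doubles as a simple health check.
base_url = "http://127.0.0.1:7860"  # default local address; adjust as needed
resp = requests.get(f"{base_url}/sdapi/v1/sd-models", timeout=10)
resp.raise_for_status()
print([m["model_name"] for m in resp.json()])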

III. API Calls

1. Text-to-image

import requests
import io
import base64
from PIL import Image
 
def txt2img():
    url = "http://xx.com"

    payload = {
        "prompt": "cat",
        "negative_prompt": "xx",
        "steps": 20,
    }

    response = requests.post(url=f'{url}/sdapi/v1/txt2img', json=payload)
    images = response.json()['images']
    if len(images) > 0:
        image_data = images[0]
        # The API returns raw base64; split off a "data:image/png;base64," prefix if one is present.
        image = Image.open(io.BytesIO(base64.b64decode(image_data.split(",", 1)[-1])))
        image.save('output1.png')
 
txt2img()

2. Text + image to image

import requests
import io
import base64
from PIL import Image
 
def img2img():
    url = "http://xx.com"

    payload = {
        "prompt": "cat",
        "negative_prompt": "xx",
        "steps": 20,
        "init_images": ["data:image/png;base64," + base64.b64encode(open('input.png', 'rb').read()).decode('utf-8')]
    }

    response = requests.post(url=f'{url}/sdapi/v1/img2img', json=payload)
    images = response.json()['images']
    if len(images) > 0:
        image_data = images[0]
        # The API returns raw base64; split off a data-URI prefix if one is present.
        image = Image.open(io.BytesIO(base64.b64decode(image_data.split(",", 1)[-1])))
        image.save('output2.png')
 
img2img()

3. A more detailed text + image to image example

import requests
import io
import base64
from PIL import Image
 
def img2img_more():
    url = "http://xx.com"
    prompt = "((watercolor)), colorful, close-up, front, extreme detail, detailed, 8k, nice face, Portrait, flowing, fresh"
    negative_prompt = "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face, blurry, draft, grainy"

    payload = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 30,
        "cfg_scale": 15.0,
        "denoising_strength": 0.65,
        "sampler_name": "Euler a",
        "mask_blur": 4,  # the "Mask blur" slider in the UI; a top-level payload field
        # inpainting_mask_weight is a global Web UI setting ("Inpainting conditioning
        # mask strength"), so it is passed per request via override_settings.
        "override_settings": {"inpainting_mask_weight": 0.35},
        "seed_resize_from_w": -1,
        "seed_resize_from_h": -1,
        "init_images": ["data:image/png;base64," + base64.b64encode(open('xx.png', 'rb').read()).decode('utf-8')]
    }

    response = requests.post(url=f'{url}/sdapi/v1/img2img', json=payload)
    images = response.json()['images']
    if len(images) > 0:
        image_data = images[0]
        # The API returns raw base64; split off a data-URI prefix if one is present.
        image = Image.open(io.BytesIO(base64.b64decode(image_data.split(",", 1)[-1])))
        image.save('output3.png')
 
img2img_more()

Reference: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API

IV. Fine-tuning BERT for Text Classification

The notes below fine-tune a BERT classifier on the IMDB dataset with Hugging Face Transformers; they are adapted from the Vertex AI sample linked at the end of the code.

import os
import random
import datasets
import numpy as np
import pandas as pd
import torch
import transformers

from IPython.display import HTML, display
from datasets import ClassLabel, Sequence, load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EvalPrediction, Trainer, TrainingArguments,
                          default_data_collator)

print(f"Notebook runtime: {'GPU' if torch.cuda.is_available() else 'CPU'}")
print(f"PyTorch version : {torch.__version__}")
print(f"Transformers version : {datasets.__version__}")
print(f"Datasets version : {transformers.__version__}")

APP_NAME = "finetuned-bert-classifier"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

dataset = load_dataset("imdb")

print(
    "Total # of rows in training dataset {} and size {:5.2f} MB".format(
        dataset["train"].shape[0], dataset["train"].size_in_bytes / (1024 * 1024)
    )
)
print(
    "Total # of rows in test dataset {} and size {:5.2f} MB".format(
        dataset["test"].shape[0], dataset["test"].size_in_bytes / (1024 * 1024)
    )
)

label_list = dataset["train"].unique("label")

def show_random_elements(dataset, num_examples=2):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset) - 1)
        while pick in picks:
            pick = random.randint(0, len(dataset) - 1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(
                lambda x: [typ.feature.names[i] for i in x]
            )
    display(HTML(df.to_html()))

show_random_elements(dataset["train"])

print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
print("~~~~~~~~~~~~ ~~~~~~~~~~~~~~~")
print("~~~~~~~~~~~~ ~~~~~~~~~~~~~~~")
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~")

batch_size = 16
max_seq_length = 128
model_name_or_path = "bert-base-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

tokenizer("Hello, this is one sentence!")
example = dataset["train"][4]
print(example)

tokenizer(
    ["Hello", ",", "this", "is", "one", "sentence", "split", "into", "words", "."],
    is_split_into_words=True,
)

# Dataset loading repeated here to make this cell idempotent
# Since we are over-writing datasets variable
dataset = load_dataset("imdb")
print("~~~~~8~~~~~~~~")
# Mapping labels to ids
# NOTE: We could extract this automatically, but the datasets `unique` method
# does not report the label -1 that shows up during preprocessing,
# hence the extra -1 entry in the dictionary.
label_to_id = {1: 1, 0: 0, -1: 0}


def preprocess_function(examples):
    """
    Tokenize the input example texts
    NOTE: The same preprocessing step(s) will be applied
    at the time of inference as well.
    """
    args = (examples["text"],)
    result = tokenizer(
        *args, padding="max_length", max_length=max_seq_length, truncation=True
    )

    # Map labels to IDs (not necessary for GLUE tasks)
    if label_to_id is not None and "label" in examples:
        result["label"] = [label_to_id[example] for example in examples["label"]]

    return result

print("~~~~~9~~~~~~~~")
# apply preprocessing function to input examples
dataset = dataset.map(preprocess_function, batched=True, load_from_cache_file=True)


model = AutoModelForSequenceClassification.from_pretrained(
    model_name_or_path, num_labels=len(label_list)
)

args = TrainingArguments(
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=1,
    weight_decay=0.01,
    output_dir="/tmp/cls",
)

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}

print("~~~~~10~~~~~~~~")

trainer = Trainer(
    model,
    args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=default_data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

print("~~~~11~~~~~~~~")
trainer.train()
print("~~~~~12~~~~~~~~")
saved_model_local_path = "./models"

trainer.save_model(saved_model_local_path)
print("~~~~~13~~~~~~~~")
history = trainer.evaluate()
print("~~~~~14~~~~~~~~")
history
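As a quick sanity check, the saved model can be reloaded and queried directly. A minimal sketch using names defined above (the review text is made up, and it assumes trainer.save_model also wrote the tokenizer, which it does when a tokenizer is passed to Trainer):

# Reload the fine-tuned classifier and score one example.
loaded_model = AutoModelForSequenceClassification.from_pretrained(saved_model_local_path)
loaded_tokenizer = AutoTokenizer.from_pretrained(saved_model_local_path)

inputs = loaded_tokenizer(
    "A surprisingly touching film with terrific performances.",
    return_tensors="pt", truncation=True, max_length=max_seq_length,
)
with torch.no_grad():
    logits = loaded_model(**inputs).logits
print(logits.argmax(dim=-1).item())  # 0 = negative, 1 = positive for IMDB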

Reference: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/pytorch_text_classification_using_vertex_sdk_and_gcloud/pytorch-text-classification-vertex-ai-train-tune-deploy.ipynb

Transformer

A quick recap: the encoder encodes the input tokens into embeddings, which are then fed to the decoder. The decoder's input at each time step is the output it produced at the previous time step, as the sketch below shows.
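A sketch of that feedback loop (model_step here is a hypothetical stand-in for one decoder forward pass over the encoder output, returning the argmax next-token id):

# Toy greedy decoding: each step feeds the tokens generated so far back into the decoder.
def greedy_decode(model_step, bos_id, eos_id, max_len=50):
    ys = [bos_id]                 # decoding starts from <bos>
    for _ in range(max_len):
        next_id = model_step(ys)  # next token given everything generated so far
        ys.append(next_id)        # this output becomes part of the next input
        if next_id == eos_id:
            break
    return ys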


Masked Multi-Head Attention: "masked" means that during self-attention the decoder attends only to the part of the sequence that has already been generated (this sequence's length differs from the encoder output's length), because attention cannot be computed over positions that have not been produced yet.
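A minimal PyTorch sketch of that look-ahead mask (toy sizes, random scores):

import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # raw attention scores, e.g. Q @ K^T / sqrt(d)
# Upper-triangular mask: position i must not attend to positions > i,
# i.e. to tokens that have not been generated yet.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
attn = scores.softmax(dim=-1)           # each row is a distribution over visible positions
print(attn)                             # strictly upper-triangular entries are 0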

BERT
Structure:

BERT uses only the encoder part of the transformer.

Input: token embedding + segment embedding + position embedding

The input natural-language sentence is converted into a token sequence via WordPiece embeddings. The segment embedding exists because BERT performs the NSP (next sentence prediction) task, which judges the relationship between two sentences and therefore needs sentence-level information.
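A minimal sketch of that input representation (BERT-base sizes; the ids below are toy values):

import torch
import torch.nn as nn

vocab_size, max_len, hidden = 30522, 512, 768
token_emb = nn.Embedding(vocab_size, hidden)
segment_emb = nn.Embedding(2, hidden)   # sentence A vs. sentence B, needed for NSP
position_emb = nn.Embedding(max_len, hidden)

input_ids = torch.tensor([[101, 7592, 102, 2023, 102]])  # toy WordPiece ids
segment_ids = torch.tensor([[0, 0, 0, 1, 1]])            # A, A, A, B, B
positions = torch.arange(input_ids.size(1)).unsqueeze(0)

x = token_emb(input_ids) + segment_emb(segment_ids) + position_emb(positions)
print(x.shape)  # torch.Size([1, 5, 768]) -- the three embeddings are simply summed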

Output: to predict the masked tokens, the hidden state at each masked position is fed into a softmax classifier whose output dimension equals the vocabulary size.
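A sketch of that classifier with toy hidden states (in BERT the output projection is tied to the input token embeddings; that detail is omitted here):

import torch
import torch.nn as nn

hidden, vocab_size = 768, 30522
mlm_head = nn.Linear(hidden, vocab_size)        # projects hidden states to vocabulary size

masked_hidden = torch.randn(3, hidden)          # hidden states at 3 [MASK] positions
probs = mlm_head(masked_hidden).softmax(dim=-1) # distribution over the whole vocabulary
print(probs.shape)                              # torch.Size([3, 30522])
print(probs.sum(dim=-1))                        # each row sums to 1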

GPT

GPT is pre-trained the same way as a traditional language model: given the preceding context, it predicts the next word. BERT, by contrast, is pre-trained with a masked LM (Mask LM) objective.

For example, given a sentence [u1, u2, …, un], GPT uses only the information in [u1, u2, …, u(i-1)] when predicting the word ui, whereas BERT uses [u1, u2, …, u(i-1), u(i+1), …, un].

Structure

GPT uses only the decoder part of the transformer, with the second multi-head attention layer (the one that attends over the encoder output) removed.

For compute, Google Colab is recommended because it is free; another option is the paid platform AutoDL (referral link at the end of this section).


Colab's advantages:
① It is free, which is its biggest draw and makes it a good entry-level practice platform for students.
② Colab can mount Google Drive, so it can run somewhat larger projects.
Its drawbacks:
① The free tier is limited: sessions are automatically disconnected after running for a while.
② Prolonged heavy use can get the account locked out, usually for a few days before it is unblocked.
So once you are genuinely past the entry stage, these platforms become a constraint. If you will need a deep-learning server long term, it is worth talking to your advisor about buying GPUs for your own lab, which is the most cost-effective route. For short-term use of a few months or so, renting from an online deep-learning server platform is recommended.
AutoDL is recommended here: it is one of the most cost-effective deep-learning server platforms I have seen, and it is very friendly to students, who get discounted rates after student verification. Referral link:
https://www.autodl.com/register?code=8458c843-06c7-440e-a189-b1f2c215c1cc