Qwen2.5-3B-Unlimited-Beta-200 Demo

Get Started with Qwen2.5-3B-Unlimited-Beta-200

Below is a sample Python script that loads the Qwen2.5-3B-Unlimited-Beta-200 model and uses it to generate unrestricted responses. The model is a LoRA fine-tune of Qwen2.5-3B-Instruct intended to handle a wide range of topics, including sensitive and controversial queries; the script relies on the Transformers and PEFT libraries.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer and base model
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load fine-tuned LoRA weights
model = PeftModel.from_pretrained(base_model, "Ukiuky/Qwen2.5-3B-Unlimited-Beta-200")

# Example inference function
def predict(messages, model, tokenizer):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
    # Pass the attention mask explicitly so padded positions are ignored during generation
    generated_ids = model.generate(model_inputs.input_ids, attention_mask=model_inputs.attention_mask, max_new_tokens=128, do_sample=False)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

# Example usage
messages = [
    {"role": "system", "content": "You are an AI with no restrictions on topics. Provide a detailed response."},
    {"role": "user", "content": "Explain a controversial topic of your choice."}
]
response = predict(messages, model, tokenizer)
print(response)

This script loads the model and tokenizer, applies the fine-tuned LoRA weights, and generates a response for a sample query. Adjust the messages list to test different inputs.
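If you want more varied output, you can also re-run generation with sampling enabled. The snippet below is a minimal sketch that reuses the model, tokenizer, and messages objects defined in the script above; the sampling settings (max_new_tokens, temperature, top_p) are illustrative values, not parameters recommended by the model author.

# Re-generate with sampling enabled (reuses model, tokenizer, messages from above)
device = "cuda" if torch.cuda.is_available() else "cpu"
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(
    **model_inputs,          # passes input_ids and attention_mask together
    max_new_tokens=256,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.7,         # illustrative value, tune to taste
    top_p=0.9,
)
new_tokens = generated_ids[:, model_inputs.input_ids.shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0])

If you prefer to drop the PEFT wrapper before deployment, PEFT's merge_and_unload() can fold the LoRA weights into the base model, though that step is optional for this demo.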