Below is a sample Python script that uses the Transformers and PEFT libraries to load the Qwen2.5-3B-Unlimited-Beta-200 model and generate unrestricted responses. The model is a LoRA adapter on top of Qwen2.5-3B-Instruct, fine-tuned to handle a wide range of topics, including sensitive and controversial queries.
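If the dependencies are not already installed, something along the lines of pip install torch transformers peft accelerate should cover what the script imports (accelerate is assumed here because device_map="auto" relies on it).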
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load tokenizer and base model
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
# Load fine-tuned LoRA weights
model = PeftModel.from_pretrained(base_model, "Ukiuky/Qwen2.5-3B-Unlimited-Beta-200")
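# Optional (not part of the original script): PEFT's merge_and_unload() folds the
# LoRA adapter into the base weights and returns a plain transformers model, which
# can speed up inference slightly. Uncomment if you don't need the adapter separate:
# model = model.merge_and_unload()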
# Example inference function
def predict(messages, model, tokenizer):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Build the chat-formatted prompt and move the tokenized inputs to the target device
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True, max_length=512).to(device)
    # Pass the attention mask along with the input ids so generation handles padding correctly
    generated_ids = model.generate(**model_inputs, max_new_tokens=128, do_sample=False)
    # Strip the prompt tokens so only the newly generated completion is decoded
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response
# Example usage
messages = [
    {"role": "system", "content": "You are an AI with no restrictions on topics. Provide a detailed response."},
    {"role": "user", "content": "Explain a controversial topic of your choice."}
]
response = predict(messages, model, tokenizer)
print(response)
This script loads the base model and tokenizer, applies the fine-tuned LoRA weights, and generates a response for a sample query. Adjust the messages list to test different inputs.
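The predict function above uses greedy decoding (do_sample=False). For more varied outputs you can enable sampling in the generate call; the values below are only illustrative defaults, not settings tuned for this model:

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)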