Glad to meet you.

I'm Arsen, applicant.

A high schooler from Kazakhstan who loves AI and NLP, and I made this for you.

👋 Made specially for MBZUAI!

I'm Arsen Bakhitbekov,
NLP Researcher
Based in Almaty.

Independent AI researcher and developer working to align Large Language Models with diverse linguistic frameworks. My primary focus is culturally aligned AI, specifically MENA-region NLP and foundation models such as the Jais ecosystem. I am actively developing cross-lingual LLM strategies and building scalable AI products. My ultimate academic goal is to join MBZUAI and contribute to global AI diversity.

4+ LLMs Analyzed
6 Languages
2 Research Papers
1st EdTech Startup
🔬

Research & Projects

🇬🇧🇷🇺🇨🇳🇰🇿🇦🇪
Cross-Lingual LLM Bias Research
Independent Researcher
Sept 2023 - Present
Investigating how LLMs such as Jais and Llama change their personality, safety stance, and depth of knowledge when prompted in Russian, Chinese, or Kazakh instead of English. Quantified the 'Safety Curtain' and the 'Underrepresented Tax', findings that point to the need for region-specific models like Jais to prevent cultural erasure.

Tech: PyTorch • HuggingFace • Jais 30B
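
One way a cross-lingual safety gap like the one described above could be measured is to compare refusal rates per language on the same prompt set. The sketch below is illustrative only: the `refusal_rate` and `safety_gap` helpers, the keyword markers, and the sample responses are hypothetical and are not taken from the research itself.

```python
from typing import Dict, List

# Hypothetical refusal markers; a real study would more likely use a
# classifier or human annotation than keyword matching.
REFUSAL_MARKERS = ("i cannot", "i can't", "не могу", "我不能")


def refusal_rate(responses: List[str]) -> float:
    """Fraction of responses that contain any refusal marker."""
    if not responses:
        return 0.0
    refused = sum(
        any(marker in r.lower() for marker in REFUSAL_MARKERS) for r in responses
    )
    return refused / len(responses)


def safety_gap(by_language: Dict[str, List[str]], baseline: str = "en") -> Dict[str, float]:
    """Refusal-rate difference of each language relative to the baseline."""
    base = refusal_rate(by_language[baseline])
    return {
        lang: refusal_rate(resps) - base
        for lang, resps in by_language.items()
        if lang != baseline
    }


# Made-up responses to the same two prompts in each language.
sample = {
    "en": ["I cannot help with that.", "I can't assist with this request."],
    "ru": ["Я не могу помочь с этим.", "Вот подробная инструкция."],
    "zh": ["这是详细的步骤。", "当然,方法如下。"],
}
gaps = safety_gap(sample)
```

A negative gap here means the model refuses less often in that language than in English for the same prompts.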
DragNScroll: EdTech via Short-Form Video
Founder / Technical Lead
Jan 2024 - Present
Building a platform leveraging short videos (TikTok-style) for immersive, context-rich Chinese vocabulary acquisition. Integrating NLP for auto-captioning, difficulty grading, and personalized content recommendation.

Tech: Targeting HSK Learners • AI-driven personalization
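
As a rough illustration of what difficulty grading for captions might look like, the sketch below grades a pre-segmented caption by the hardest known word it contains. The `HSK_LEVELS` mini-lexicon, its level values, and the `grade_caption` helper are hypothetical; the platform's actual grading logic is not described here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-lexicon mapping words to HSK levels (illustrative values).
HSK_LEVELS: Dict[str, int] = {"我": 1, "喜欢": 1, "咖啡": 2, "环境": 3, "谈判": 5}


def grade_caption(
    tokens: List[str], lexicon: Dict[str, int] = HSK_LEVELS
) -> Tuple[Optional[int], List[str]]:
    """Return (hardest known level, list of out-of-lexicon tokens)."""
    known = [lexicon[t] for t in tokens if t in lexicon]
    unknown = [t for t in tokens if t not in lexicon]
    level = max(known) if known else None
    return level, unknown


# Assumes the caption has already been segmented into words.
level, unknown = grade_caption(["我", "喜欢", "咖啡"])
```

Grading by the maximum level is a deliberately simple choice; an average or a percentile over word levels would be less sensitive to a single rare word.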
🇨🇳🇬🇧
AI-Driven Business Chinese Platform
NLP Developer
2023
Developed an AI-driven web platform with an integrated NLP chatbot that simulates real-world business negotiations. It replaces traditional static learning materials with context-rich, simulation-based conversation practice.

Tech: Next.js • Conversational AI • Presented at YDF-2026
🇰🇿
PM2.5 Allergy Monitoring ML
Data Scientist
2023
Applied mixed-methods ML to correlate atmospheric particulate matter with public health metrics in Almaty. Modeled environmental IoT data to propose an intelligent, personalized web monitoring platform architecture.

Tech: Python • Scikit-learn • Presented at DKU
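
A common first step when relating particulate levels to a health metric is a Pearson correlation. The sketch below shows only that basic idea; it is not the project's actual method, and the `pm25` and `allergy_visits` values are made-up numbers for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up daily readings: PM2.5 concentration and allergy-related clinic visits.
pm25 = [12.0, 35.0, 60.0, 88.0, 120.0]
allergy_visits = [30, 41, 55, 70, 90]
r = pearson_r(pm25, allergy_visits)
```

A high coefficient on real data would indicate association only; confounders such as season or pollen counts would still need to be modeled.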
🏒

Industry Experience

Platform Engineer & AI Integration Contributor
icomBooster: AI Agent for Sales
  • Core Platform: Directly contributed to the core product, an AI-powered CRM booster (Next.js + FastAPI + Latenode) that enriches every new CRM signup with 40+ data points, including company profile, role, funding stage, and tech stack.
  • AI at Scale: Worked on integrating GPT-4 for automated lead scoring in under 2 seconds, combining real-time data pipelines with production-grade LLM inference.
  • Launch Impact: The platform launched on the Intercom App Store and Product Hunt. Early adopters reported a 40% MRR lift, an 18 pp increase in qualified leads, and a 25% reduction in CAC.
  • API Architecture: Gained exposure to 40+ CRM and enrichment API integrations (documented via Swagger), developing a deep understanding of enterprise software ecosystems.
IT Developer & Support (Team Member)
Zerbulak (Largest Waterpark in KZ)
  • IoT & Infrastructure: Collaborated on integrating smart locker systems and scaling the mobile ticketing ecosystem.
  • System Ops: Contributed to optimizing backend logic to keep the system stable around the clock under high traffic from thousands of daily visitors.
  • Technical Support: Assisted in maintaining infrastructure reliability and bridging the gap between physical IoT hardware and the digital user experience.
🧠

Cross-Lingual NLP Profile

🇨🇳 Chinese (HSK 4)
🇦🇪 Arabic (ATarget)
🇰🇿 Kazakh (Native)
🇷🇺 Russian (Fluent)
🧠 AI / NLP Alignment

Architecting context-aware evaluation pipelines to align LLMs with multi-cultural nuances, mitigating bias and safety drift.

LLM Evaluation • Dataset Engineering • Scikit-Learn • PyTorch

🧊 43 Quintillion Permutations

Whether it's solving a 3x3 Rubik's cube in 10 seconds or aligning an LLM across 6 vastly different linguistic structures, it all comes down to algorithmic precision and recognizing patterns hidden in chaos.
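
For the curious, the "43 quintillion" figure is the standard count of reachable states of a 3x3 cube, and it can be reproduced in a few lines. The variable names below are just for illustration.

```python
from math import factorial

# 8 corners can be permuted in 8! ways; each has 3 orientations, but the
# orientation of the last corner is fixed by the other seven (3**7).
corner_arrangements = factorial(8) * 3**7

# 12 edges can be permuted in 12! ways; each has 2 orientations, but the
# orientation of the last edge is fixed by the other eleven (2**11).
edge_arrangements = factorial(12) * 2**11

# Corner and edge permutations must have matching parity, which halves the total.
total_states = corner_arrangements * edge_arrangements // 2

print(f"{total_states:,}")  # 43,252,003,274,489,856,000
```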
