I develop AI-powered French learning applications, leveraging expertise in natural language processing and educational technology.
Kikagaku Inc. "AI & Data Science Professional Development Program for DX" - Completed
University of Tokyo - Matsuo Lab "Large Language Models (LLM) Course 2024" - Completed
University of Tokyo - Matsuo Lab "Global Consumer Intelligence Endowed Course 2024" - Attended
University of Tokyo - Matsuo Lab "AI Business Insights - AI Management Course 2025" - Completed
University of Tokyo - Matsuo Lab "AI and Semiconductor Course 2025" - Completed
Kikagaku Inc. - AI & Data Science Professional Development Program
Python Fundamentals (Syntax, Functions, Data Types, Modules)
Data Science (Multivariate Analysis, Data Analysis, Preprocessing, Feature Engineering)
Machine Learning (Classification, Regression, Evaluation with scikit-learn & XGBoost)
Computer Vision (OpenCV, Pillow, Image Preprocessing, CNN Fundamentals)
Deep Learning (PyTorch Introduction, Theory & Hands-on Practice)
Natural Language Processing (Fundamental Theory)
API Development (FastAPI Fundamentals & Practice)
University of Tokyo - Matsuo Lab "Large Language Models (LLM) Course 2024"
Overview of Language Models
Prompt Engineering & RAG (Retrieval-Augmented Generation)
Pre-training & Advanced Pre-training
Scaling Laws
Semiconductor Ecosystem & LLM Development Infrastructure
Supervised Fine-Tuning (SFT)
RLHF (Reinforcement Learning from Human Feedback) & AI Alignment
AI Safety & LLM Analysis and Theory
LLM Applications (Domain-Specific LLM, LLM for Control)
Final Project: LLM Competition Challenge
University of Tokyo - Matsuo Lab "Global Consumer Intelligence Endowed Course 2024"
Python Coding (NumPy, Pandas, Matplotlib)
Data Analysis (Feature Engineering, Unsupervised Learning, Time Series, Model Validation & Tuning)
Business Applications (SQL, Marketing Fundamentals & Applications, Guest Lectures)
Practical Exercises (Python Assignments, Data Analysis Competition, Final Project)
University of Tokyo - Matsuo Lab "AI Business Insights - AI Management Course 2025"
AI Trends, Business Use Cases & Industry-Specific Insights
Generative AI Technology Evolution & Future Prospects
AI Governance, Legal Regulations, Risk Management & Sustainability
Customer Experience, Marketing/Back-office Innovation & Robotics
Talent Development, AI Co-creation Strategy & AI-driven Management
Case Studies & Latest Industry Examples by Business Practitioners
Focus: AI implementation insights, industry-specific business trends, AI potential & risks, and comprehensive corporate strategy (both offensive & defensive approaches)
University of Tokyo - Matsuo Lab "AI and Semiconductor Course 2025"
Advanced AI Models such as LLMs, Machine Learning, Neural Network Basics, and Optimization
Image Recognition
Overview of the Semiconductor Ecosystem
CUDA and GPU Libraries
CPU and Computer Architecture
GPU Architecture and Design Principles
Combinational and Sequential Circuits, Hardware Design Theory, and Introduction to FPGA
[Workshop] FPGA Design Tutorial: Basics of FPGA, Design Methods, FPGA Design on AWS Cloud, and AI Processor Development
[Workshop] Practical FPGA Design Exercises (Hands-on and Cloud-based Sessions)
OpenAI API Integration (ChatGPT-powered app development)
Machine Learning Model Development & Evaluation (Classification & Regression with scikit-learn, XGBoost)
Data Preprocessing & Analysis (CSV integration, Feature Engineering, Exploratory Data Analysis)
Explainable AI (Model interpretation & visualization with SHAP - Beginner level)
LLM Fine-tuning (Lightweight experiments using Unsloth & LoRA)
spaCy (Natural Language Processing - Research & exploration stage)
A French vocabulary learning app focused on maximizing retention through smart review scheduling. Built with Streamlit, it tracks each user's memory state and presents words at the optimal moment, applying forgetting-curve theory to support steady vocabulary acquisition.
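The app's actual scheduling logic isn't reproduced here, but the forgetting-curve idea behind it can be sketched with an SM-2-style interval update (the constants and update rule below are illustrative assumptions, not the app's code):

```python
from dataclasses import dataclass

@dataclass
class Card:
    word: str
    interval_days: float = 1.0   # days until the next review
    ease: float = 2.5            # growth factor applied after each successful recall

def review(card: Card, recalled: bool) -> Card:
    """Update the review interval after one session (SM-2-style sketch).

    A successful recall stretches the interval by the ease factor, pushing
    the next review further out along the forgetting curve; a failed recall
    resets the interval so the word comes up again soon.
    """
    if recalled:
        card.interval_days *= card.ease
    else:
        card.interval_days = 1.0
        card.ease = max(1.3, card.ease - 0.2)  # show this word more often from now on
    return card

card = Card("bonjour")
review(card, recalled=True)   # interval: 1.0 -> 2.5
review(card, recalled=True)   # interval: 2.5 -> 6.25
review(card, recalled=False)  # interval resets to 1.0, ease drops to 2.3
```

The key design point is that spacing grows multiplicatively with each success, which is what spaced-repetition systems use to schedule reviews just before predicted forgetting.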
Technical Challenge: Integrated FastAPI, Streamlit, and the OpenAI API, designing prompts and tuning parameters to optimize ChatGPT's translation output for efficient, high-quality translations.
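The exact prompts and parameter values used in the app aren't shown here; a minimal sketch of how such a translation request could be assembled for a chat-style API follows (the system prompt wording and the suggested temperature are illustrative assumptions):

```python
def build_translation_messages(text: str,
                               source: str = "French",
                               target: str = "Japanese") -> list[dict]:
    """Assemble a chat-style prompt for a translation request.

    The system prompt pins the model to the translator role and asks for
    the translation only, which keeps the output easy to display as-is.
    """
    system = (
        f"You are a professional {source}-to-{target} translator. "
        "Translate the user's text faithfully and return only the translation."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

messages = build_translation_messages("Bonjour, comment allez-vous ?")
# These messages would then be sent through the OpenAI client, typically with
# a low temperature (e.g. 0.2) so translation output stays deterministic.
```

Separating prompt construction from the API call like this also makes the prompt logic unit-testable without any network access or API key.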
Future Vision: Planned features include audio playback, learning-progress visualization, and individually optimized review timing, aiming for an even more effective AI-powered language learning app.
Python, Streamlit, FastAPI, OpenAI API, Pandas, GitHub
🔗 View the code on GitHub
A web application that predicts the difficulty of French words using machine learning, built with scikit-learn classification models and an interactive Streamlit interface.
Technical Challenge: Implemented the complete ML pipeline, from data preprocessing through model building and evaluation, gaining practical skills through iterative accuracy improvements.
Future Outlook: The app currently supports only word-level difficulty prediction. The plan is to add sentence-level difficulty prediction and integrate the app with the French Flashcard Study App, so that materials and example sentences can be suggested automatically to match each learner's vocabulary level.
Python, Scikit-learn, Pandas, Streamlit
🔗 View the code on GitHub
Developed as the final project for the "Large Language Models Course" hosted by the Matsuo Laboratory at the University of Tokyo. The model was fine-tuned for stronger Japanese instruction-following using the ELYZA-tasks-100 and Ichikara-instruction datasets, and evaluated on ELYZA-tasks-100tv, a modified version of ELYZA-tasks-100. The fine-tuned model is publicly available on Hugging Face.
Systematically strengthened Japanese instruction-following by training on the Japanese-focused ELYZA-tasks-100 and Ichikara-instruction datasets and evaluating on ELYZA-tasks-100tv. Lightweight fine-tuning with Unsloth + LoRA was run on Google Colab, with 4-bit quantization (QLoRA) reducing GPU memory usage enough for stable inference on limited computational resources; the model also showed reasonable responses on unseen tasks. The inference code and JSONL output format were organized and published as a reproducible result.
Through this project, I gained practical experience with efficient fine-tuning via PEFT and with evaluation methods for Japanese LLMs.
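The Unsloth training code itself lives with the model on Hugging Face; the core idea behind LoRA can be illustrated in a few lines of NumPy (the shapes and scaling here mirror the standard LoRA formulation for illustration, not Unsloth internals):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                      # hidden size d, LoRA rank r << d
alpha = 16                       # LoRA scaling hyperparameter

W = rng.normal(size=(d, d))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d))      # trainable low-rank factor
B = np.zeros((d, r))             # B starts at zero, so the update is a no-op initially

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha / r) * B A x  -- only A and B are trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B = 0 the adapted model matches the frozen base model exactly.
assert np.allclose(lora_forward(x), W @ x)

# Parameter count: full fine-tuning would train d*d = 64 weights here,
# while LoRA trains only 2*d*r = 32 (proportionally far fewer at real model sizes).
```

This is why LoRA (and its 4-bit variant QLoRA) fits on a Colab GPU: the frozen base weights can be kept quantized while only the small A and B matrices receive gradients.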
Python, Transformers, Hugging Face, Google Colab, Unsloth, LoRA, JSONL
🔗 View the model on Hugging Face

MiraiMimi is a prototype voice-interaction AI combining Whisper and ChatGPT. The Gradio UI runs on Google Colab, so speech input, response review, and read-aloud playback all work directly in the browser.
Technical Challenge: Built a simple dialogue pipeline from speech-to-text conversion through ChatGPT response generation to speech synthesis, using prompt engineering for character design and leveraging speech-recognition variation for a natural conversation flow.
Future Vision: Exploring broader voice-AI applications such as language practice in education and audio guides for tourism, along with higher-accuracy speech recognition and more emotionally expressive speech synthesis.
Python, OpenAI Whisper, OpenAI GPT-3.5, Gradio, Google Colab, gTTS, pyttsx3
Participated in Kaggle's "Titanic: Machine Learning from Disaster" competition through the GCI program hosted by the University of Tokyo's Matsuo Lab, implementing the full data-science pipeline: preprocessing, EDA (exploratory data analysis), feature engineering, model building, and evaluation. Feature engineering included extracting and categorizing titles from passenger names, creating a family-size feature, flagging missing Cabin values, and imputing missing ages with the median. Built several models, including Random Forest, Logistic Regression, and MLP, and improved performance through hyperparameter tuning with GridSearchCV.
Python, Pandas, Scikit-learn, Matplotlib, Seaborn, Random Forest, Logistic Regression, MLPClassifier, GridSearchCV
Next, I plan to introduce XGBoost to push accuracy further, and to re-enter the Kaggle Titanic competition aiming for a better score.
AirREGI Account Acquisition Prediction Project (GCI Final Assignment)
Developed a model that predicts AirREGI account acquisitions from external factors, as the final assignment for the "GCI Global Consumer Intelligence" endowed course hosted by the University of Tokyo's Matsuo Lab.
Objectives:
Quantitative analysis of external factors affecting account acquisition count (acc_get_cnt)
Building a predictive model that accounts for call volume, TV commercials, holidays, day of week, and seasonality to forecast account acquisitions
Technologies & Methods:
Python (pandas, seaborn, matplotlib, scikit-learn, XGBoost, Optuna, SHAP)
Multi-CSV integration (calendar_data, cm_data, call_data, acc_get_data) merged by date with time series preprocessing
XGBoost with Optuna hyperparameter optimization and SHAP for interpretability analysis
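The course data itself isn't public, so the merge-by-date preprocessing is sketched below on made-up frames (the column names mirror the dataset names mentioned above but are assumptions about the schema):

```python
import pandas as pd

# Toy stand-ins for calendar_data, cm_data, call_data, and acc_get_data.
dates = pd.date_range("2024-09-01", periods=4)
calendar = pd.DataFrame({"date": dates, "is_holiday": [0, 0, 1, 0]})
cm       = pd.DataFrame({"date": dates, "cm_count":   [2, 0, 1, 3]})
calls    = pd.DataFrame({"date": dates, "call_cnt":   [120, 95, 30, 140]})
target   = pd.DataFrame({"date": dates, "acc_get_cnt": [14, 9, 3, 17]})

# Merge every source on the shared date key, then derive simple time features
# (day of week and month) for the seasonality and weekday effects.
df = (target
      .merge(calendar, on="date")
      .merge(cm, on="date")
      .merge(calls, on="date"))
df["dayofweek"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
```

After this step the frame can go straight into XGBoost, with Optuna searching hyperparameters and SHAP attributing the fitted model's predictions back to these features.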
Analysis Results:
The combination of sales-call volume and TV commercials was the most important factor (49.4% feature importance)
TV commercial effects were moderate (especially effective Tuesday–Thursday), while holidays had a negative impact
Strong seasonality observed: September–October showed sharp increases, while June–July remained low
Optimized XGBoost model achieved R²=0.9238, enabling high-accuracy predictions
Through this project, my goal was not just to build a highly accurate predictive model, but also to extract insights that could be useful for business. To achieve this, I combined XGBoost with Optuna to improve model performance, while using SHAP to make the often “black box” nature of the model more interpretable. Through this process, I discovered that the interaction between sales calls and TV commercials played a major role, and that demand tended to rise in certain seasons—findings that wouldn’t have been visible from the numbers alone. The most valuable outcome was learning to see predictive modeling not as the “end,” but as a starting point for business strategy.
Related Links:
GitHub Repository: Source Code
Presentation: View Slides
Creating video content from AI-generated images, combining Midjourney with Canva editing and handling the complete workflow from prompt design through video editing to YouTube publication.
Midjourney, Canva, ChatGPT (prompt engineering)