Building an Open Named Entity Recognition Model with a Cross-encoder

1. Introduction

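Named entity recognition (NER) is the task of identifying named entities such as people, locations, and units in text.
Most NER systems follow the closed-ner approach, which classifies entities into a fixed, predefined set of classes.
The drawback of closed-ner is that recognizing a new entity class requires building a new dataset.
To address this, recent work has proposed open-ner, which can recognize entities of previously unseen classes.
This post covers how to train an open-ner model using a cross-encoder architecture.
The code and models used are available at the URLs below.
- Code, English model, multilingual model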

2. UniversalNER

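Knowledge distillation methods that transfer ChatGPT's generalization ability to smaller models (Alpaca, Vicuna) are being actively explored.
UniversalNER showed that when distillation is restricted to a specific task (targeted distillation), even a small model can reach good performance.
The authors built a dataset from ChatGPT's outputs on the open-ner task and used it to train a LLaMA 7B model.
The data and models the authors used can be found here.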

UniversalNER demo example. The user enters a sentence and the entity class they want to find.

3. Cross-encoder for Open NER

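In practice, operating a 7B model just for NER is a heavy burden.
In addition, the possibility of hallucination in generative models is often cited as a limitation in the information extraction domain.
To address this, I trained an encoder-based LM on the UniversalNER dataset.
I used a cross-encoder architecture that takes two sentences as input, so that the model finds the entities in the sentence that belong to the entity class entered by the user.
Training used the same token classification approach as closed-ner, with each token classified as O, B-Entity, or I-Entity.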
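
The input layout described above can be sketched roughly as follows. This is a minimal illustration, not the training code from this post: it assumes whitespace tokens instead of a subword tokenizer, BERT-style `[CLS]`/`[SEP]` markers, and the common convention of using `-100` as a label that the loss ignores. The helper name `build_example` is hypothetical.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # assumption: label value skipped by the loss function
LABEL2ID = {"O": 0, "B-Entity": 1, "I-Entity": 2}


def build_example(
    entity_class: str,
    sentence_tokens: List[str],
    entity_spans: List[Tuple[int, int]],
) -> Tuple[List[str], List[int]]:
    """Build one cross-encoder input: [CLS] class [SEP] sentence [SEP].

    entity_spans holds (start, end) token indices, end exclusive, of the
    entities in the sentence that belong to entity_class.
    """
    # Sentence-side labels start as O, then each span is marked with B/I.
    sentence_labels = [LABEL2ID["O"]] * len(sentence_tokens)
    for start, end in entity_spans:
        sentence_labels[start] = LABEL2ID["B-Entity"]
        for i in range(start + 1, end):
            sentence_labels[i] = LABEL2ID["I-Entity"]

    query_tokens = entity_class.split()
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + sentence_tokens + ["[SEP]"]
    # The class query and the special tokens carry no entity label.
    labels = (
        [IGNORE_INDEX] * (len(query_tokens) + 2)
        + sentence_labels
        + [IGNORE_INDEX]
    )
    return tokens, labels


sentence = "Heat the olive oil in a frying pan".split()
tokens, labels = build_example("ingredient", sentence, [(2, 4)])
for token, label in zip(tokens, labels):
    print(f"{token:12s} {label}")
```

Because the entity class is part of the input rather than part of the label set, the label space stays fixed at three tags no matter how many classes users ask for. With a real subword tokenizer, the labels would additionally need to be aligned to word pieces.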

4. Results

I trained microsoft/deberta-v3-base for 10 epochs and selected the checkpoint with the highest validation entity f1 score as the final model.
Quantitative results
- entity f1: 0.560
- token f1: 0.747
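
The gap between the two scores is expected, because entity-level scoring only credits a prediction when the whole span matches. The post does not state the exact metric definitions, so the sketch below is only one common way to compute them: entity f1 over exact span matches, and token f1 over non-O tags. All function names here are hypothetical.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, from a BIO tag sequence."""
    spans = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != "I-Entity":
            spans.add((start, i))
            start = None
        if tag == "B-Entity":
            start = i
    return spans


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    # A predicted entity counts only if both boundaries match exactly.
    g, p = extract_spans(gold), extract_spans(pred)
    return f1(len(g & p), len(p), len(g))


def token_f1(gold: List[str], pred: List[str]) -> float:
    # A token counts if it is a non-O tag and equals the gold tag.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    return f1(tp, n_pred, n_gold)


gold = ["O", "B-Entity", "I-Entity", "I-Entity", "O", "B-Entity"]
pred = ["O", "B-Entity", "I-Entity", "O", "O", "B-Entity"]
print(f"entity f1: {entity_f1(gold, pred):.3f}")  # 0.500
print(f"token f1: {token_f1(gold, pred):.3f}")    # 0.857
```

In this toy case the prediction cuts the first entity one token short. That single boundary error costs a whole entity at the entity level, but only one token at the token level.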

Examples

Below are example sentences and the results for each entity class used as the query.

Heat the olive oil in a frying pan, add the onion and cook for 5 minutes until softened and starting to turn golden. Set aside.

ingredient: olive oil, onion
tool: frying pan
time: 5 minutes

Particularly, Alpaca (Taori et al., 2023) automates the generation of instructions (Wang et al., 2022a) and distills the knowledge from a teacher LLM.

person: Taori, Wang et al
year: 2023, 2022
technology: Alpaca

Ha-Seong Kim opens the scoring with a two-run single up the middle, giving the Padres a 2-0 lead in the 8th inning.

person: Ha-Seong Kim
team: Padres
score: 2-0
period: 8th inning
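
Output like the examples above can be produced by querying the sentence once per entity class and decoding the predicted tags into strings. The sketch below assumes whitespace tokens and uses hand-written tags as stand-ins for model predictions; `decode_entities` is a hypothetical helper, not a function from the released code.

```python
from typing import Dict, List


def decode_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Join B-Entity/I-Entity runs into entity strings."""
    entities: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-Entity":
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I-Entity" and current:
            current.append(token)
        else:
            # O tag (or a stray I-Entity with no open span) ends the entity.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


tokens = "Heat the olive oil in a frying pan".split()

# Hand-written tags standing in for one model pass per entity class.
tags_by_class: Dict[str, List[str]] = {
    "ingredient": ["O", "O", "B-Entity", "I-Entity", "O", "O", "O", "O"],
    "tool": ["O", "O", "O", "O", "O", "O", "B-Entity", "I-Entity"],
}

result = {cls: decode_entities(tokens, tags) for cls, tags in tags_by_class.items()}
for cls, entities in result.items():
    print(f"{cls}: {', '.join(entities)}")
```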

Reference

[1] Zhou, W., Zhang, S., Gu, Y., Chen, M., & Poon, H. (2023). UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. arXiv preprint arXiv:2308.03279.
