AI Research Portfolio Post

Topic: 6.5-6 Feature Store (Training-Serving Skew and Mitigation Strategies)

Category: rag_agent

Type: concept

Difficulty: Intermediate

Prerequisites: Yes (ML Pipeline, Model Serving)

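Problem Setup

In an ML system, the training and serving stages must use the same features.

In practice, the two environments differ, which leads to the following problems.

Mismatched feature computation logic
Differences in data latency
Differences in data pipelines

This problem is called Training-Serving Skew.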

Model prediction:

y = f(x; w)

Symbols

x : feature vector
w : model parameters
y : prediction

The problem is that the features used in training and in serving can differ.

x_{train} \neq x_{serving}

Why It Matters

Even with an identical model, prediction quality can degrade sharply when the serving features differ from the training features.
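As a small illustration of the point above, the sketch below evaluates one fixed model y = f(x; w) on two versions of the same feature vector. The linear form of f, the weights, and the feature values are all assumptions chosen for the example, not taken from a real system.

```python
def predict(x, w):
    """Hypothetical linear model: y = f(x; w) = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Same parameters in both environments (assumed values).
w = [0.5, 2.0]

# The first feature stands for a 7-day average purchase amount.
x_train = [10.0, 1.0]    # value as computed by the training pipeline
x_serving = [14.0, 1.0]  # same feature name, computed differently at serving

y_train = predict(x_train, w)
y_serving = predict(x_serving, w)
gap = y_serving - y_train  # non-zero although w never changed
```

Nothing about the model changed between the two calls; the difference in output comes entirely from the feature values.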

1. Training-Serving Skew

Training-Serving Skew can be expressed as follows.

P_{train}(x) \neq P_{serving}(x)

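Symbols

P_train(x) : feature distribution in the training data
P_serving(x) : feature distribution at serving time

Common causes

Differences in feature computation code
Data timestamp mismatches
Differences in aggregation windows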
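One way to check whether P_train(x) and P_serving(x) have drifted apart for a single feature is to compare binned histograms. The sketch below uses the Population Stability Index (PSI), a technique brought in here for illustration rather than one named in this post. The function name, bin edges, and sample values are assumptions.

```python
import math

def psi(train_values, serving_values, edges):
    """Population Stability Index over fixed, ascending bin edges.

    Returns 0.0 for identical histograms and grows as they diverge.
    """
    def bin_fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # index of the bin v falls into
            counts[idx] += 1
        eps = 1e-6  # floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    p = bin_fractions(train_values)
    q = bin_fractions(serving_values)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

train_sample = [1, 2, 3, 4, 5, 6, 7, 8]
serving_sample = [5, 6, 7, 8, 9, 10, 11, 12]  # shifted upward
edges = [4, 8]

same = psi(train_sample, train_sample, edges)
shifted = psi(train_sample, serving_sample, edges)
```

What counts as "too much" drift depends on the feature and the model, so any alert threshold would have to be chosen per use case.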

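2. The Feature Engineering Problem

Consider the following feature as an example.


user_avg_purchase_last_7_days

In training, it is computed by batch processing.


SQL batch job

In serving, it must be computed in real time.


real-time feature service

Because the same feature is implemented twice, in two different systems, the computation logic can diverge.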
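To make the divergence concrete, here is a minimal sketch of two plausible implementations of user_avg_purchase_last_7_days. Both window conventions (calendar-day aligned for batch, rolling hours for real time), the function names, and the purchase data are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical purchase history for one user: (timestamp, amount).
purchases = [
    (datetime(2024, 1, 1, 9, 0), 10.0),
    (datetime(2024, 1, 5, 12, 0), 30.0),
    (datetime(2024, 1, 8, 8, 0), 50.0),
]

def avg(amounts):
    return sum(amounts) / len(amounts) if amounts else 0.0

def batch_avg_last_7_days(events, run_date):
    """Batch style: the 7 full calendar days before midnight of run_date."""
    end = datetime(run_date.year, run_date.month, run_date.day)
    start = end - timedelta(days=7)
    return avg([a for ts, a in events if start <= ts < end])

def realtime_avg_last_7_days(events, now):
    """Serving style: a rolling 7 * 24 hours ending at the request time."""
    start = now - timedelta(days=7)
    return avg([a for ts, a in events if start < ts <= now])

now = datetime(2024, 1, 8, 10, 0)
batch_value = batch_avg_last_7_days(purchases, now)        # 20.0
realtime_value = realtime_avg_last_7_days(purchases, now)  # 40.0
```

Both functions are reasonable readings of "last 7 days", yet they disagree on which purchases fall inside the window. A model trained on the first and served with the second sees a skewed input.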

3. The Feature Store Concept

A Feature Store is a system that manages features centrally.


Raw Data
   ↓
Feature Engineering
   ↓
Feature Store
   ↓
Training / Serving

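Feature definition:

f = φ(x)

Symbols

x : raw data (not the feature vector x of the model equation)
φ : feature transformation
f : feature vector (the model's input, not the model function f)

Why It Is Needed

It ensures that training and serving use the same features, because both read values produced by the same definition φ.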
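The idea of a single φ shared by both paths can be sketched as follows. The raw record layout and the function names are hypothetical. The only point is that training and serving call the same transformation instead of reimplementing it.

```python
def phi(raw):
    """Single feature transformation f = phi(x), shared by both paths."""
    amounts = raw["purchase_amounts"]
    return {
        "purchase_count": len(amounts),
        "avg_purchase": sum(amounts) / len(amounts) if amounts else 0.0,
    }

def build_training_rows(raw_records):
    # Training path: apply phi to every historical record.
    return [phi(r) for r in raw_records]

def build_serving_features(raw_record):
    # Serving path: apply the very same phi to the incoming record.
    return phi(raw_record)

raw = {"user_id": 1, "purchase_amounts": [10.0, 30.0]}
training_row = build_training_rows([raw])[0]
serving_row = build_serving_features(raw)
```

With one definition there is no second implementation that could drift out of sync.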

4. Offline vs Online Feature Store

Component	Role
Offline Store	Stores historical feature data for training
Online Store	Serves features for real-time inference

Structure:


Data Lake
   ↓
Offline Feature Store
   ↓
Training

Online Feature Store
   ↓
Model Serving
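The split between the two stores can be sketched with plain in-memory structures. This is a toy model under stated assumptions, not how any particular feature store is implemented. The class names, the materialize helper, and the sample data are hypothetical.

```python
from datetime import datetime

class OfflineStore:
    """Append-only history of feature values, used to build training sets."""
    def __init__(self):
        self.rows = []  # (entity_id, event_time, features)

    def write(self, entity_id, event_time, features):
        self.rows.append((entity_id, event_time, features))

class OnlineStore:
    """Key-value view holding only the latest features per entity."""
    def __init__(self):
        self.latest = {}  # entity_id -> (event_time, features)

    def get(self, entity_id):
        entry = self.latest.get(entity_id)
        return entry[1] if entry else None

def materialize(offline, online, end_time):
    """Copy the newest value per entity (up to end_time) into the online store."""
    for entity_id, event_time, features in offline.rows:
        if event_time > end_time:
            continue
        current = online.latest.get(entity_id)
        if current is None or event_time >= current[0]:
            online.latest[entity_id] = (event_time, features)

offline, online = OfflineStore(), OnlineStore()
offline.write("user_1", datetime(2024, 1, 1), {"avg_purchase": 10.0})
offline.write("user_1", datetime(2024, 1, 5), {"avg_purchase": 25.0})
materialize(offline, online, end_time=datetime(2024, 1, 6))
latest = online.get("user_1")
```

The offline store keeps every historical row so training sets can be rebuilt for any past time. The online store keeps one row per entity so a lookup at inference time is a single key access.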

5. Point-in-Time Correctness

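An important concept in a feature store is Point-in-Time Correctness.

That is, each training feature must be computed as of its past event time, using only data that was available up to that time.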

feature(t) = φ(data_{\leq t})

Symbols

t : event time

Why It Is Needed

It prevents future data leakage.
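A point-in-time lookup can be sketched with a sorted history and a binary search. For each label event at time t, the code picks the latest feature value whose timestamp is <= t. The integer timestamps, values, and names below are hypothetical.

```python
from bisect import bisect_right

# Feature history for one entity, sorted by the time each value became known.
feature_times = [1, 5, 9]            # hypothetical integer timestamps
feature_values = [10.0, 25.0, 40.0]

def feature_as_of(t):
    """Return the latest feature value with timestamp <= t, or None."""
    i = bisect_right(feature_times, t)
    return feature_values[i - 1] if i > 0 else None

# Hypothetical (event_time, label) pairs to turn into training rows.
label_events = [(0, 0), (5, 1), (7, 0)]
training_rows = [(t, feature_as_of(t), y) for t, y in label_events]
```

The event at t = 7 receives the value written at time 5, not the one written at time 9. Joining on the latest value instead would leak information the model could not have had at prediction time.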

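6. Feature Store Capabilities

Capability	Description
feature reuse	Reuse of existing features across models
versioning	Version management for features
consistency	Guarantees training/serving consistency
lineage	Tracks where feature data comes from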
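Versioning and lineage from the table above can be sketched as a registry keyed by (name, version) that also records source tables. This is an illustrative toy, not the registry of any real feature store. The class, method names, and the "orders" source are assumptions.

```python
class FeatureRegistry:
    """Toy registry: feature definitions are keyed by (name, version)."""
    def __init__(self):
        self._defs = {}

    def register(self, name, version, transform, sources):
        key = (name, version)
        if key in self._defs:
            # Published versions are immutable; changes need a new version.
            raise ValueError(f"{name} v{version} is already registered")
        self._defs[key] = {"transform": transform, "sources": list(sources)}

    def get(self, name, version):
        return self._defs[(name, version)]["transform"]

    def lineage(self, name, version):
        return self._defs[(name, version)]["sources"]

registry = FeatureRegistry()
registry.register("avg_purchase", 1, lambda xs: sum(xs) / len(xs), ["orders"])
# Version 2 changes the empty-history behavior without touching version 1.
registry.register("avg_purchase", 2, lambda xs: sum(xs) / max(len(xs), 1), ["orders"])

v1 = registry.get("avg_purchase", 1)
v2 = registry.get("avg_purchase", 2)
```

A model trained against version 1 can keep requesting version 1 at serving time even after version 2 is published, which is what keeps the two sides consistent.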

7. Feast

Feast is an open-source feature store.

Components:

offline store
online store
feature registry

Example:


feast apply
feast materialize <start_ts> <end_ts>

8. Tecton

Tecton is a feature platform built for production use.

Features:

real-time feature pipeline
feature monitoring
ML pipeline integration

Code-to-Formula Mapping

Concept	Code	Description
feature definition	`@feature_view`	Defines the feature pipeline
feature retrieval	`store.get_online_features()`	Retrieves online features
feature materialization	`feast materialize`	offline → online sync

5 Common Misconceptions

Feature engineering is only needed for training
Training data and serving data are always identical
A feature store is the same thing as a database
Training-serving skew rarely occurs
Feature versioning is unnecessary

Checklist (Questions You Should Be Able to Answer Yourself)

Why does Training-Serving Skew occur?
What problem does a Feature Store solve?
What are the roles of the Offline Store and the Online Store?
Why is Point-in-Time Correctness important?
What roles do Feast and Tecton play?