Benchmark Tracker

Closing the distance to SOTA.

Subnet 122 vs. the published recommendation literature on NDCG@10 and Recall@10.

Eval Set 2 · NDCG@10

COMING · NEXT EVAL SET

SOTA ceiling · ~0.156

Published SOTA range · 0.03–0.07

Bitrecs · next eval set

Tracking ndcg_at10_bm25_appliances_1000_medium: BM25-prefiltered re-ranking over a 1000-item candidate set, not full-universe retrieval. Published SOTA clusters around 0.03–0.07 on full-universe Amazon Beauty (MT4SR+SDA, BSARec, eSASRec). The ceiling shown at 0.156 reflects the documented 3× protocol gap (Petrov & Macdonald replicability study), so numbers obtained under different protocols are not directly comparable.
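For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for one evaluation case. It assumes a single held-out target item per case, which is the common leave-one-out setup in sequential recommendation but is not confirmed as the Bitrecs protocol. The function name and arguments are illustrative, not part of any Bitrecs API.

```python
import math

def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k for one case with a single relevant item (assumed setup).

    With exactly one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1) when the target is in the top k, and 0 otherwise.
    """
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Hypothetical re-ranked candidate list; the target sits at rank 3.
reranked = ["sku_7", "sku_2", "sku_9", "sku_4"]
score = ndcg_at_k(reranked, "sku_9")
```

The reported benchmark value would be the mean of this per-case score over all evaluation cases. A hit at rank 1 scores 1.0, and the score decays logarithmically with rank.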

Eval Set 2 · Recall@10

COMING · NEXT EVAL SET

SOTA ceiling · ~0.200

Published SOTA range · 0.05–0.09

Bitrecs · next eval set

Tracking recall_at10_bm25_electronics_500_medium (a.k.a. HR@10): BM25-prefiltered re-ranking over a 500-item candidate set, not full-universe retrieval. Published SOTA clusters around 0.05–0.09 on full-universe protocols. The ceiling shown at 0.20 accounts for the well-documented protocol variance: recall rises as the candidate set shrinks, so prefiltered scores are naturally higher.
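The candidate-set effect can be made concrete with a small sketch. It assumes one held-out target per case and that the target always survives the BM25 prefilter, neither of which is confirmed by the tracker. Under those assumptions a uniformly random ranker hits the top k with probability k / N, so the chance-level floor alone moves with the candidate-set size N. The function names are illustrative.

```python
def hit_rate_at_k(ranked_items, target, k=10):
    """HR@k (Recall@k with a single relevant item): 1 if the target is in the top k."""
    return 1.0 if target in ranked_items[:k] else 0.0

def random_baseline_hit_rate(num_candidates, k=10):
    """Expected HR@k of a uniformly random ranking over num_candidates items.

    Assumes the target is guaranteed to be among the candidates.
    """
    return min(k, num_candidates) / num_candidates

# Chance-level HR@10 for a 500-item prefiltered set and a hypothetical 50,000-item catalogue.
prefiltered_floor = random_baseline_hit_rate(500)
full_universe_floor = random_baseline_hit_rate(50_000)
```

Under these assumptions the chance-level floor for a 500-item set is 0.02, against 0.0002 for the hypothetical 50,000-item catalogue. That is one reason scores from prefiltered and full-universe protocols should not be read on the same scale.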

References: baselines and methodology

Classic baselines

SASRec · ICDM '18

Kang & McAuley. Self-Attentive Sequential Recommendation · Amazon Beauty NDCG@10 ~0.0126–0.0416

BERT4Rec · CIKM '19

Sun et al. Sequential Recommendation with Bidirectional Encoder Representations from Transformer · Amazon Beauty NDCG@10 ~0.0187–0.0396

Strong recent baselines

SASRec+ · RecSys '23

Klenitskiy & Vasilev. Turning Dross Into Gold Loss: Is BERT4Rec really better than SASRec? · NDCG@10 0.0327

DuoRec · WSDM '22

Qiu et al. Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation · NDCG@10 ~0.0546

CARCA · RecSys '22