BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20241120T082409Z
LOCATION:HG E 1.1
DTSTART;TZID=Europe/Stockholm:20240604T120000
DTEND;TZID=Europe/Stockholm:20240604T123000
UID:submissions.pasc-conference.org_PASC24_sess146_msa231@linklings.com
SUMMARY:Efficient Training of GNN-based Material Science Applications at S
 cale: An Orchestration of Data Movement Approach
DESCRIPTION:Minisymposium\n\nJonghyun Bae (Lawrence Berkeley National Labo
 ratory); Jong Youl Choi, Massimiliano Lupo Pasini, and Kshitij Mehta (Oak 
 Ridge National Laboratory); Khaled Ibrahim (Lawrence Berkeley National Lab
 oratory); and Pei Zhang (Oak Ridge National Laboratory)\n\nScalable data m
 anagement techniques are crucial to effectively processing large volumes o
 f scientific data on HPC platforms for distributed deep learning (DL) mode
 l training. Because of the need to access data randomly and frequently in 
 stochastic optimizers, in-memory distributed storage that keeps the datase
 t in the local memory of each computing node is widely adopted over file-b
 ased I/O for its rapid speed. \nIn this presentation, we discuss the trade
 off of various data exchange mechanisms. We present a hybrid in-memory dat
 a loader with multiple communication backends for distributed graph neural
  network training. We introduce a model-driven performance estimator to sw
 itch between communication mechanisms automatically at runtime. The perfor
 mance estimator uses the Tree-structured Parzen Estimator (TPE), a Bayesia
 n optimization method, to optimize model parameters and dynamically 
 select the most efficient communication method for data loading. We 
 present our evaluation 
 on two US DOE supercomputers, NERSC Perlmutter and OLCF Summit, on a wide 
 set of runtime configurations. Our optimized implementation outperforms a 
 baseline using single-backend loaders by up to 2.83x and can accurately 
 predict the most suitable communication method with an average success ra
 te of 96
 .3% (Perlmutter) and 94.3% (Summit).\n\nDomain: Chemistry and Materials\n\
 nSession Chair: John Gounley (Oak Ridge National Laboratory)
END:VEVENT
END:VCALENDAR
