
BraggNN: Training Dataset

Ravi, Nikil; Liu, Zhengchun; Sharma, Hemant; Chaturvedi, Pranshu; Huerta, E.A.; Scourtas, Aristana; KJ, Schmidt; Chard, Ryan; Blaiszik, Ben

Organizations

MDF Open

Year

2022

Source Name

ravi_braggnn_training

License

CC-BY 4.0

Contacts

Eliu Huerta <elihu@anl.gov>; Zhengchun Liu <zhengchun.liu@anl.gov>

DOI

10.18126/iftp-twz1

BraggNN Training Dataset

Data

There are two HDF5 files in the dataset:
  • The file frames-exp4train.hdf5 contains the diffraction frames, stored as a 3D array (the dataset name must be "frames"). The first dimension is the frame ID, starting at 0, i.e., the series of frames taken at different scanning angles. The second and third dimensions are the height and width of the area detector.
  • The file peaks-exp4train-psz11.hdf5 contains the peak position information, generated using conventional methods (e.g., using MIDAS: https://github.com/marinerhemant/MIDAS). In our work, we used the peak positions obtained from 2D pseudo-Voigt fitting. This file stores three 1D arrays, in which each index corresponds to one peak and each array holds a different piece of information about that peak. The first array, which must be named peak_fidx, gives the index of the frame (in the frames file) on which the peak sits. The second array, peak_row, is the vertical distance in pixels (possibly a floating-point number) from the peak center to the top edge of the frame. Similarly, peak_col is the horizontal distance in pixels (possibly a floating-point number) from the peak center to the left edge of the frame.
  • By default, this implementation uses 80% of the samples for training and the remaining 20% for online model validation.
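
The layout described above can be read with a few lines of Python. The following is a minimal sketch, not the BraggNN code itself: it assumes the h5py and NumPy packages are installed, and the helper names (load_braggnn_arrays, crop_patch, split_indices) are hypothetical. The 11-pixel patch size is inferred from the "psz11" part of the peaks file name, and the random shuffle used for the 80/20 split is an assumption, since the description does not say how samples are assigned.

```python
import h5py
import numpy as np


def load_braggnn_arrays(frames_path, peaks_path):
    """Read the frame stack and the three per-peak arrays into memory."""
    with h5py.File(frames_path, "r") as f:
        frames = f["frames"][:]                    # (n_frames, height, width)
    with h5py.File(peaks_path, "r") as f:
        peak_fidx = f["peak_fidx"][:].astype(int)  # frame index of each peak
        peak_row = f["peak_row"][:]                # pixels from the top edge
        peak_col = f["peak_col"][:]                # pixels from the left edge
    return frames, peak_fidx, peak_row, peak_col


def crop_patch(frames, fidx, row, col, psz=11):
    """Crop a psz x psz patch around one peak (hypothetical helper).

    psz=11 is an assumption based on the file name. Returns the patch and
    the peak center expressed in patch-local (row, col) coordinates.
    Peaks closer than psz // 2 pixels to the detector edge are not handled.
    """
    half = psz // 2
    r0 = int(round(row)) - half   # top edge of the patch in the frame
    c0 = int(round(col)) - half   # left edge of the patch in the frame
    patch = frames[fidx, r0:r0 + psz, c0:c0 + psz]
    return patch, (row - r0, col - c0)


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them 80/20 (shuffling is assumed)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(train_fraction * n_samples)
    return order[:n_train], order[n_train:]
```

A typical call would be load_braggnn_arrays("frames-exp4train.hdf5", "peaks-exp4train-psz11.hdf5"), followed by split_indices(len(peak_fidx)) to obtain training and validation indices into the peak arrays.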

Code

** Important ** To run the DLHub versions of these models using GPU resources, users must first request access to the following Globus group: https://app.globus.org/groups/d0b13474-c265-11ec-9444-51db4d10f5bd/about

Notebooks are provided in the code-examples folder to showcase how to load the datasets and run the PyTorch, TRT, and SambaNova models.