
Contribute to other work packages

The control programs will be inserted into the ADMIRE components and applications at control points. Control points are places in the system that implement reconfiguration or scheduling actions [15]. The control points can be used for implementing application-local or system-wide policies.

WP3

WP4

4.1 Definitions of APIs and QoS metrics

  • Clients can give hints about their I/O requirements to the I/O scheduler when asking for compute resources
  • These hints define which data is accessed, the lifetime of the data (should it live in GekkoFS or on the PFS?), and whether other workflows reuse the data
  • We define QoS metrics, such as the number of I/O operations per second (of a specific size, say 1 MB), and how resources are prioritized across users
  • These metrics can be used to describe the QoS requirements of a job
  • We define a syntax to convey the usage of the ad hoc file system
  • User API to request data movement between storage tiers
  • Additional user code can be added if the data should be transformed in any way before storing it on the backend FS, e.g., compression

JGU will lead this task due to their knowledge of the I/O requirements on their ad hoc file systems.

JGU will define the required APIs for the batch scheduler together with the project partners so that users can convey their I/O requirements to the I/O scheduler. Example interfaces in the context of ad hoc file systems are:

  1. paths to the input data and paths where the output data should be placed within the PFS;
  2. how long the data should be available on the ad hoc file system, that is, should the ad hoc file system be scheduled within or outside the boundaries of the batch job;
  3. if other jobs need access to the data within the ad hoc file system. In this case, the ad hoc file system can run on a dedicated set of nodes, or the following jobs are scheduled to the same nodes as the node-local ad hoc file system. In addition, I/O requirements can include information about the data placement and distribution beneficial to the users' application.

Further, we define QoS metrics that provide insight into the bandwidth used by a user application. For instance, this can be based on the token-based system of Lustre's QoS extension that JGU and DDN developed in the past, in which a user is allowed x RPCs per second, with each RPC being worth 1 MiB, regardless of whether the user's I/O request uses the full megabyte of each RPC. This gives us insight into an application's behavior and allows users to request x amount of bandwidth via a newly added batch scheduler API. Based on the current usage and priorities (some users have a higher priority than others), the resources of the back-end storage system are distributed fairly. Such QoS metrics, or workload utilization in general, should extend to the ad hoc file system so that users can make informed decisions about the ad hoc file system's efficiency when running their applications.

Lastly, an API is defined for the batch scheduler, allowing users to request stage-in/out processes between storage tiers, e.g., the PFS and an ad hoc file system. In addition, the API should provide ways to inject custom intermediate code while the data moves between tiers. For instance, the job's output data could be processed and compressed before it is stored on the PFS.
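A minimal sketch of such a stage-out call with a user-supplied transform hook, under the assumption that the hook simply receives and returns the raw bytes (the function name `stage_out` and its signature are hypothetical):

```python
import gzip
import tempfile
from pathlib import Path
from typing import Callable, Optional


def stage_out(src: Path, dst: Path,
              transform: Optional[Callable[[bytes], bytes]] = None) -> None:
    """Hypothetical stage-out: copy data from the ad hoc file system (src)
    to the PFS (dst), optionally piping it through user code first."""
    data = src.read_bytes()
    if transform is not None:
        data = transform(data)
    dst.write_bytes(data)


# Demo in a scratch directory standing in for the two storage tiers.
with tempfile.TemporaryDirectory() as tier:
    src = Path(tier) / "adhoc_out.dat"        # job output on the ad hoc file system
    src.write_bytes(b"raw simulation output\n" * 64)
    dst = Path(tier) / "pfs_out.dat.gz"       # stage-out target on the PFS

    # User-supplied intermediate code: compress before the data lands on the PFS.
    stage_out(src, dst, transform=lambda b: gzip.compress(b, 6))
    assert gzip.decompress(dst.read_bytes()) == src.read_bytes()
```

The same hook point would accept any bytes-to-bytes function, e.g., filtering or format conversion, without changing the stage-out machinery.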

4.2 Scheduling algorithms and policies

  • Develop I/O scheduling algorithms and benchmarks to evaluate them
  • The current state of the system is used to drive scheduling decisions
  • The goal is to minimize storage back-end congestion, enforce QoS constraints for each job, and reuse data on local storage to minimize data movement and leverage locality
  • Algorithms to predict I/O behaviors and user/application behaviors with machine learning and other techniques
  • The goal is to minimize waiting times
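As one of the "other techniques" alongside machine learning, a prediction of a recurring job's I/O volume can be as simple as an exponential moving average over its past runs. This is an illustrative stand-in, not the project's predictor:

```python
def ema_io_forecast(history_mib: list[float], alpha: float = 0.5) -> float:
    """Forecast the next run's I/O volume (MiB) from past observations
    with an exponential moving average; alpha weights recent runs."""
    forecast = history_mib[0]
    for observed in history_mib[1:]:
        forecast = alpha * observed + (1 - alpha) * forecast
    return forecast


# Past stage-out volumes (MiB) of a recurring workflow step:
print(ema_io_forecast([800.0, 1200.0, 1000.0, 1100.0]))  # prints 1050.0
```

The scheduler could feed such forecasts into its placement and QoS decisions before the job produces any I/O, shortening the reaction time compared with purely reactive policies.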

JGU will offer methods to reduce congestion by coordinating ad hoc file systems and the storage back-end in three ways:

  1. We'll define and implement optimized data movement strategies that minimize reading and writing data from and to the PFS, e.g., when staging data in or out.
  2. We enforce the QoS requirements defined in 4.1 in the batch scheduler by leveraging the Lustre QoS extensions.
  3. We implement the interfaces defined in 4.1 so that data can stay within the realms of the ad hoc file system across multiple jobs if they operate on the same input data or rely on the intermediate results of a previous job. In cases where the same set of nodes cannot be used, the data should be transferred between the compute nodes instead of being stored on the PFS.
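Point 3 amounts to a locality preference in node selection. The following toy policy prefers free nodes that already hold the previous job's ad hoc file system data, so the data can stay in place; it is an illustrative sketch, not the project's actual algorithm:

```python
def choose_nodes(requested: int, free_nodes: set[str],
                 data_nodes: set[str]) -> tuple[set[str], set[str]]:
    """Pick `requested` nodes, preferring free nodes that already host the
    data. Returns (chosen nodes, data-holding nodes whose data must be
    transferred between compute nodes because they were not chosen)."""
    preferred = sorted(free_nodes & data_nodes)   # data already local
    others = sorted(free_nodes - data_nodes)
    chosen = set((preferred + others)[:requested])
    must_transfer = data_nodes - chosen           # data stranded on unchosen nodes
    return chosen, must_transfer


chosen, moved = choose_nodes(
    requested=3,
    free_nodes={"n1", "n2", "n4", "n5"},
    data_nodes={"n1", "n2", "n3"},   # where the previous job left its data
)
# n1 and n2 are reused; only n3's share must move between compute nodes.
```

Only the non-overlapping share of the data triggers a node-to-node transfer, and the PFS is avoided entirely, which is exactly the congestion reduction the task targets.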

4.3 On-site and in-transit data transformations

  • Create control points inside the I/O scheduler
  • The goal is that users can add custom code that is executed while data is transferred over the network (in transit) or on-site when the data is on the local storage
  • On-site data should be reused by rescheduling connected simulation and analysis tasks to the same nodes
  • Control points will be offered to other ADMIRE components and ad hoc storage systems so that applications can use the functionalities through them (????)

JGU will implement interfaces and tools that allow users to execute custom code for data processing on the compute node (in situ). Users can then run user-defined scripts or significantly extend certain file system I/O operations. For example, a modular file system interface could allow custom code to be executed before the ad hoc file system writes data back to disk. One possible use case is on-the-fly encryption of sensitive data so that raw data is never stored on node-local storage devices, which other users can access in later scheduled batch jobs.
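The write-path control point can be sketched as a small hook registry that runs user code before data is persisted. The registry design and names are assumptions about how such an interface might look, and the "encryption" below is a deliberate placeholder, not a real cipher:

```python
from typing import Callable

WriteHook = Callable[[bytes], bytes]


class ControlPoint:
    """Control point on the write path: registered user code runs before
    the ad hoc file system persists a block to node-local storage."""

    def __init__(self) -> None:
        self._hooks: list[WriteHook] = []

    def register(self, hook: WriteHook) -> None:
        self._hooks.append(hook)

    def apply(self, data: bytes) -> bytes:
        for hook in self._hooks:
            data = hook(data)
        return data


pre_write = ControlPoint()

# Placeholder "encryption": XOR with a fixed key byte. A real deployment
# would call an actual cipher library so raw data never reaches the
# node-local disks that later batch jobs can read.
pre_write.register(lambda b: bytes(x ^ 0x5A for x in b))

stored = pre_write.apply(b"sensitive record")
assert stored != b"sensitive record"   # raw data never written as-is
```

Because the control point sees only bytes, the same mechanism serves compression, filtering, or provenance tagging without changes to the file system itself.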