Elastreaming is a streaming-based framework that supports the use of high-performance cloud computing resources with an incremental processing feature: large datasets are processed while they are still being transferred to the computing machines on the cloud. Our framework targets the wide class of sequence analysis tasks where the input data can be decomposed and processed independently in parallel, but also whole workflow systems and the MapReduce framework.
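The idea of incremental processing can be illustrated with a minimal Python sketch. It is not Elastreaming's actual implementation: the function names, the choice of FASTA as input format, and the GC-content task are all assumptions made for illustration. The sketch consumes byte chunks as they arrive from a transfer and hands each sequence record to an independent computation as soon as that record is complete, rather than waiting for the whole file.

```python
from typing import Iterable, Iterator, Tuple


def _complete_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield whole lines as soon as they are available in the incoming chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *complete, buffer = buffer.split(b"\n")
        yield from complete
    if buffer:
        yield buffer  # final line without a trailing newline


def stream_fasta_records(chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs while the data is still arriving."""
    header, parts = None, []
    for raw in _complete_lines(chunks):
        line = raw.decode("ascii").strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)  # previous record is complete
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def gc_fraction(sequence: str) -> float:
    """Example per-record task (hypothetical): fraction of G and C bases."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Chunk boundaries do not have to coincide with record or line boundaries.
incoming = [b">r1\nAC", b"GT\n>r2\nGG", b"CC\n"]
results = {name: gc_fraction(seq) for name, seq in stream_fasta_records(incoming)}
```

Because each record is processed independently, the records yielded by the generator could equally be dispatched to different compute nodes instead of being handled in a single loop.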
Functionalities related to computing machines
Establishment and management of a computer cluster on the cloud, including a MapReduce cluster (Amazon's Elastic MapReduce product). The cluster can use any AWS virtual machine type and be of any size, provided that the user account permits this.
Addition and removal of compute nodes of the cluster at run time.
Automatic configuration of the cluster middleware, including PBS Torque for job scheduling and MPI for parallel programming.
Automatic setting of security options to facilitate the communication between the machines.
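As a rough illustration of what automatic middleware configuration involves when nodes join or leave the cluster, the following Python sketch regenerates two configuration files from the current node list. It is an assumption-laden example, not the framework's code: the function names and host names are hypothetical, the first format follows the Torque `nodes` file convention (`host np=N`), and the second assumes Open MPI's hostfile convention (`host slots=N`), although the text above does not say which MPI implementation is used.

```python
from typing import Dict


def torque_nodes_file(nodes: Dict[str, int]) -> str:
    """Render a Torque 'nodes' file: one 'hostname np=<cores>' line per node."""
    return "".join(f"{host} np={cores}\n" for host, cores in sorted(nodes.items()))


def openmpi_hostfile(nodes: Dict[str, int]) -> str:
    """Render an Open MPI hostfile: one 'hostname slots=<cores>' line per node."""
    return "".join(f"{host} slots={cores}\n" for host, cores in sorted(nodes.items()))


# Hypothetical cluster state: host name -> number of cores.
cluster = {"node01": 4, "node02": 4}

# Adding and removing nodes at run time only changes the dictionary;
# both files are then regenerated from the same source of truth.
cluster["node03"] = 8
del cluster["node01"]

nodes_text = torque_nodes_file(cluster)
hostfile_text = openmpi_hostfile(cluster)
```

Generating both files from one node table keeps the scheduler and the MPI launcher consistent after every change to the cluster size.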
Functionalities related to storage and data transfer
Mounting EBS volumes on compute nodes (EBS stands for Elastic Block Store; a volume behaves like a hard disk or flash disk).
Creating EBS volumes from EBS snapshots.
Automatic configuration of the shared file system (NFS).
Associating S3 storage to compute nodes as a shared file system
Transfer of data from the client's local machine to the cloud machines or to the S3 account.
Efficient transfer of data among cloud nodes.
Functionalities related to remote job submission
Submission of jobs from the client's local machine to the cloud cluster, based on a protocol similar to REST, where the user can monitor job status and retrieve the data back to the local machine.
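The monitoring step of such a submit, monitor, retrieve cycle can be sketched in Python as a simple polling loop. Everything here is hypothetical: the state names, the `wait_for_job` function, and the `fetch_status` callback are illustrative and do not describe the framework's actual protocol. In a real client, `fetch_status` would issue an HTTP GET against a job resource on the cluster; here it is passed in as a plain callable so the loop can be exercised without a server.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real protocol may use different names.
TERMINAL_STATES = frozenset({"finished", "failed"})


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the job status until it reaches a terminal state, then return it."""
    for attempt in range(max_polls):
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job {job_id} not done after {max_polls} polls")


# Usage with a stand-in for the remote service.
replies = iter(["queued", "running", "finished"])
final_state = wait_for_job("job-1", lambda _job_id: next(replies), poll_seconds=0.0)
```

Once the loop returns `"finished"`, the client would go on to download the output data to the local machine.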