Evaluation and Comparison of Hadoop Technologies for Genetic Data Analyses
| Thesis Type | Master |
| Thesis Status | Finished |
| Student | Clemens Banas |
| Final | |
| Start | |
| Thesis Supervisor | |
| Contact | |
| Research Field | 
As data volumes in genetics increase steadily, scalable big data technologies are key to processing large genomic studies. Selecting a specific technology is crucial; Apache Hadoop and Apache Spark are two promising candidates for meeting these demands. The aim of this thesis is to compare the advantages and disadvantages of these state-of-the-art technologies and to evaluate them on the three most important genetic data formats: FASTQ, BAM, and VCF.