Data science competencies are important in research and teaching at ETH. However, few departments offer a coordinated data science education. In this project, we identified opportunities to improve data science education at D-USYS. We conducted 17 expert interviews and two participatory workshops to identify key competencies. Building on results from these interactions, we co-developed the new master lecture in “Environmental Systems Data Science”. Our participatory approach could guide other departments towards an improved data science education.
Computational data science competencies are an important pillar of research and teaching at ETH. However, few departments follow a coordinated approach to teaching their students the data science competencies that are relevant for their specific degree. With this project, we aimed to identify opportunities to better integrate data science competencies into the D-USYS master curriculum. Our participatory approach could guide other departments in defining their needs and designing an appropriate data science course.
A participatory approach
As a first step, we identified key data science competencies and mapped the existing expertise within the department. To do so, we conducted 17 semi-structured expert interviews with relevant researchers and lecturers. These interviews revealed that some approaches are specific to certain scientific fields, but the basic “data science workflow” is similar across disciplines. Thus, it seems sensible to create a joint course concept for all D-USYS master students where they first learn the most important common methods and best practices, and then deepen their knowledge according to their scientific interests.
As a second step, we consolidated these findings in an interactive workshop involving more than 30 lecturers, researchers and students. Through this broad involvement of relevant stakeholders, we gained strong support from students and faculty for the idea of a department-wide course concept. We decided to offer an introductory course “Environmental Systems Data Science” in the fall semester, followed by major-specific “Applied Modules” in the spring semester (Fig., green box).
As a third step, the detailed learning objectives and contents of the course “Environmental Systems Data Science” are being co-developed with three assistant professors, based on the findings from the expert interviews. This new course will provide an overview of cutting-edge environmental data science concepts (e.g. applications of machine learning) and illustrate best practices from environmental sciences (e.g. data preparation, version control). The intro course is offered as a pilot in the fall semester 2020. In the subsequent spring semester, the applied modules will build on competencies from the intro course and enable students to gain hands-on experience.
The last step so far is to coordinate the contents of the new lecture with the contents of existing courses and the new “Applied Modules” (Fig.). To do so, a second interactive workshop will be held in mid-November.
This project is ongoing, and we anticipate several challenges. We expect the main challenges to arise because (1) students have heterogeneous backgrounds, (2) learning success depends on how well the applied modules build on the competencies acquired in the intro course, and (3) the course places high demands on the computing infrastructure due to large data volumes. To address these challenges, we will monitor the learning outcomes of both the intro course and the applied modules, continue to foster a dialogue between the lecturers, and provide guidance on the technical infrastructure.
Inspiration for other departments
Our approach of designing a joint course concept in a participatory process could encourage other departments to follow a similar path towards a coordinated data science education. It would also be interesting to compare our data science workflow with approaches from other departments to see whether this workflow applies to disciplines beyond environmental and agricultural sciences. Please get in touch with us if you are interested in exchanging experiences.
Project Lead: Urs Brändle and Anouk N’Guyen
Vision for course structure
“Environmental Systems Data Science” – a new lecture for master students across disciplines
Lecturers: Loïc Pellissier, Benjamin Stocker, Joshua Payne
In “Environmental Systems Data Science”, students are introduced to a typical data science workflow using various examples from environmental systems. They learn common methods and key aspects for each step through practical application. The course enables students to plan their own data science project in their specialization and to acquire more domain-specific methods independently or in further courses.
The lecture is offered fully online in a flipped-classroom setting. Students watch pre-recorded videos and meet online to discuss exercises and work on application problems. The exercises and applications run in Jupyter Notebooks on Renku, the computing infrastructure of the Swiss Data Science Center (Link: https://renkulab.io).
Exercises and applications are peer-reviewed by students on a weekly basis via Moodle and discussed with Teaching Assistants in live online meetings.
The students are able to
• frame a data science problem and build a hypothesis
• describe the steps of a typical data science project workflow
• conduct selected steps of a workflow on specifically prepared datasets, with a focus on choosing, fitting and evaluating appropriate algorithms and models
• critically think about the limits and implications of a method
• visualise data and results throughout the workflow
• access online resources to keep up with the latest data science methodology and deepen their understanding
• The data science workflow
• Access and handle (large) datasets
• Prepare and clean data
• Analysis: exploratory steps
• Analysis: machine learning and computational methods
• Evaluate results and analyse uncertainty
• Visualisation and communication
This composite interview presents distilled answers to selected questions from the 17 expert interviews. The answers do not claim to represent the perspectives of all interviewees.
- How do you define data science?
- Data science is making sense of data. It allows us to see patterns and make predictions for the future.
- What kind of data are you working with?
- Our researchers are using data on different temporal and spatial scales from local to regional to global; we analyse structured and unstructured data; physical, organismal, genetic or societal data; data from field studies, lab experiments, citizen science projects and model outputs; data with gaps and messy data; data related to climate, energy, economy, biodiversity, agriculture, human and animal health. We work with small datasets and large datasets, private data and public data.
- What role do open data platforms play in your research?
- In my view, we have an ethical and scientific obligation to promote open access to data. While this is often also required by funding agencies and journals, it can become difficult with data related to industries. Best practices regarding metadata, licensing and versioning should be covered in a data science course.
- How did data science evolve in your field?
- Many analyses have been possible from an intellectual perspective for decades, but not from a technical perspective. Parallel computing opened many doors in this regard. Today, I observe an ever-increasing availability and size of data with higher spatial and temporal resolution. Thus, our demand for computing infrastructure to store and process data will increase steadily. Computing power can nowadays be more expensive than lab equipment! In the future, there will be even more data available, and making sense of data will become increasingly complex.
- Which data science competencies should students acquire in their bachelor studies?
- Students should be able to perform basic statistical operations such as fitting linear regression models and hypothesis testing, and implement these in a programming language such as Python or R. In general, they need to have basic computer literacy so that they are able to create a suitable computing environment by installing programs and loading packages. But most importantly, students need to have the courage to try and fail.
- Which data science competencies should students acquire in their master studies?
- At the end of their master studies, students should be able to perform a complete data science workflow: access and collect data, prepare and clean data, analyse and evaluate data, visualise and communicate results. If a step is unclear, students should have the confidence to acquire the necessary skills independently and look for answers using available resources. Another key competency is to think critically about data, for example by quantifying uncertainties, performing sensitivity analyses or identifying sources of error.
- What topics and methods would be interesting to address in an interdisciplinary data science master course?
- An interdisciplinary course provides the opportunity to think across systems. In general, students should work with real datasets as early as possible to spark interest in the methods. Theoretical input should be minimal and hands-on practical work maximal. On the other hand, this course could provide the theoretical foundations so that later domain-specific courses can go into more depth.
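As an illustration of what "quantifying uncertainties" can mean in practice, the following sketch estimates a percentile bootstrap confidence interval for a sample mean. It is an assumption-laden example, not a method prescribed by the interviewees or the course: the function name bootstrap_ci, its parameters and the sample values are hypothetical, and only the Python standard library is used.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    The data are resampled with replacement `n_resamples` times; the
    interval bounds are the alpha/2 and 1 - alpha/2 percentiles of the
    resampled means. A fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, invented for illustration only.
SAMPLE = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]

if __name__ == "__main__":
    low, high = bootstrap_ci(SAMPLE)
    print(f"mean = {statistics.fmean(SAMPLE):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The bootstrap is used here because it needs no distributional assumptions and is easy to explain; a sensitivity analysis would follow the same pattern, repeating the analysis while varying an input instead of resampling the data.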