Summer DASSL 2017
Summer DASSL is a special session of DASSL with the specific goals of promoting scientific research and scholarship among undergraduate students and introducing students to modern tools and processes in software and data engineering.
During the special session, students are introduced to a “research process” and are given opportunities to research, discuss, write, present, and solve problems related to data science and data-intensive systems.
All activities are carried out in the context of real-life applications so students learn and apply practical skills that can set them apart from other CS graduates.
Summer DASSL 2017 was held from June 5, 2017 to July 14, 2017. It focused mainly on the following topics in the context of a real-life application called Gradebook:
- Database, web, and mobile application development
- Microservices and RESTful APIs
- Multi-tenancy
- Scalability
- Cloud-based services
- Information privacy and security
- DevOps
- Other topics as necessary (see discussion topics below)
The following people participated in Summer DASSL 2017:
- Kyle Bella
- Zaid Bhujwala
- Zach Boylan
- Andrew Figueroa
- Elly Griffin
- Timothy Herger
- Steven Rollo
- Hunter Schloss
The following table lists the topics discussed during Summer DASSL 2017 (excluding the implementation activities students carried out):
Date | Topic | Leader(s) |
---|---|---|
6-05 | “Data Science” vs “data science” | Murthy |
6-05 | Big data and its characteristics: volume, variety, velocity | Murthy |
6-05 | Mining vs querying; principal component analysis; classification vs clustering | Murthy |
6-05 | Harvard CS109 Intro slides; becomingadatascientist.com | Murthy |
6-06 | Kinds of analytics: descriptive, predictive, prescriptive | Murthy |
6-06 | Introduction to Gradebook: concept, context | Murthy |
6-07 | Bluemix: signup, explore | Team |
6-07 | Data-processing considerations: store, process, present, transfer | Murthy |
6-09 | Git repositories, Bitbucket | Rollo |
6-09 | pgAdmin | Figueroa |
6-12 | Outlier detection: Mahalanobis distance, masking, swamping | Griffin, Herger |
6-13 | Data pivots and pivot queries | Murthy |
6-14 | Markdown to markup: an introduction to Markdown | Schloss |
6-14 | Markdown to HTML: relationship to automata and language theory | Murthy |
6-14 | CSV-Pivot queries | Murthy |
6-16 | Introduction to SQL query optimization | Murthy |
6-16 | Losing data to optimize storage: Gradebook attendance information | Murthy |
6-19 | Outlier detection; demo | Griffin, Herger |
6-19 | K-Nearest Neighbor (KNN) for outlier detection | Murthy |
6-19 | Licensing: options and obligations | Bhujwala |
6-19 | Copyrights, trademarks, attribution | Murthy |
6-20 | Data analytics concerns: efficiency, security, price, expression, performance, ease of development and maintenance | Murthy |
6-20 | Intro to native analytics | Murthy |
6-20 | SQL-native KNN | Murthy |
6-20 | ETL: Extract, Transform, Load | Murthy |
6-20 | Importing and exporting CSV data; COPY FROM and COPY TO in Postgres | Team |
6-20 | Introduction to “Issues” in GitHub | Murthy |
6-21 | Rosters: anonymizing and humanizing | Murthy |
6-21 | Tutorial: Using Git Effectively | Figueroa, Rollo |
6-21 | Importing OpenClose schedule to Gradebook | Team |
6-22 | What it takes to run a lab like DASSL | Team |
6-26 | GitHub Desktop, try.github.io | Boylan |
6-26 | Demo: GitHub Desktop | Boylan |
6-26 | Standardizing DevOps tool chain in an organization | Murthy |
6-26 | Filling in missing data: alternatives to listing class meeting dates | Figueroa, Rollo |
6-26 | Customer delight as motivation to produce good software | Murthy |
6-28 | Satisfying vs satisficing | Murthy |
6-28 | SchoolTool vs Gradebook | Team |
6-28 | Introduction to multi-tenancy | Murthy |
6-28 | Data Science boot camps based in NYC | Team |
6-29 | Function characteristics: idempotency, repeatability, side effects | Murthy |
6-29 | Importing student rosters | Bella, Figueroa |
6-29 | Multi-user operations | Murthy |
6-30 | Using JDBC to retrieve data; CBOD and KNN in Java with DB data | Griffin, Herger |
6-30 | Issues with API design; Java class loaded | Murthy |
6-30 | Writing maintainable code | Murthy |
6-30 | RETURNING in Postgres vs OUTPUT in MSSQL | Murthy |
6-30 | Gists on GitHub | Murthy |
7-03 | Social Network Analysis | Schloss |
7-03 | Elements of successful presentations; kinds of examples: simple, comprehensive, counter | Murthy |
7-03 | Scales: ordinal, nominal, interval, ratio | Murthy |
7-03 | Managing merge conflicts in Git | Rollo |
7-03 | Overview of ClassDB: focus on application roles | Figueroa |
7-05 | Representing seasons in Gradebook on a scale: detecting out of sequence imports | Murthy, Rollo |
7-05 | Anonymizing and humanizing data; adding salt to data | Murthy |
7-05 | Type inference in programming languages; auto in C++ | Murthy |
7-06 | Data models: HTML (not exactly a “data model”), XML, JSON | Murthy |
7-06 | Using XML and/or JSON in Gradebook: REST APIs | Murthy |
7-10 | SchoolTool | Boylan |
7-10 | JSON | Griffin |
7-10 | Web API frameworks: REST, SOAP, JSON, Node.js | Bhujwala |
7-10 | High-level architecture of Gradebook: session management, connection pooling | Murthy |
7-10 | Building online portfolios | Murthy |
7-11 | Native-SQL implementation of KNN outlier detection | Schloss |
7-11 | Introduction to R | Herger |
7-11 | Domain-specific languages | Murthy |
7-11 | Library vs Language | Murthy |
7-12 | Union compatibility | Murthy |
7-12 | User-defined functions in Postgres | Team |
7-13 | YAML, JSON, XML: language homomorphism and isomorphism | Murthy |
7-13 | Humanizing student data | Bhujwala |
7-13 | Issues in web apps: where to do what | Murthy |
7-13 | Offloading work from DBMS to web server to client | Murthy |
7-13 | REST API for Gradebook | Team |
7-13 | DBMS functions to back Gradebook REST API | Bhujwala, Griffin |
7-13 | Material UI for Gradebook web client | Figueroa |
7-13 | Gradebook web server | Rollo |