DASSL

Summer DASSL 2017

Summer DASSL is a special session of DASSL with the specific goal of promoting scientific research and scholarship among undergraduate students and introducing them to modern tools and processes in software and data engineering.

During the special session, students are introduced to a “research process” and are given opportunities to research, discuss, write, present, and solve problems related to data science and data-intensive systems.

All activities are carried out in the context of real-life applications so students learn and apply practical skills that can set them apart from other CS graduates.

Summer DASSL 2017 ran from June 5, 2017 to July 14, 2017. It focused mainly on the following topics in the context of a real-life application called Gradebook:

Database, web, and mobile application development
Microservices and RESTful APIs
Multi-tenancy
Scalability
Cloud-based services
Information privacy and security
DevOps
Other topics as necessary (see discussion topics below)

The following people participated in Summer DASSL 2017:

The following table lists the topics discussed during Summer DASSL 2017 (excludes implementation activities students carried out):

Date	Topic	Leader(s)
6-05	“Data Science” vs “data science”	Murthy
6-05	Big data and its characteristics: volume, variety, velocity	Murthy
6-05	Mining vs querying; principal component analysis; classification vs clustering	Murthy
6-05	Harvard CS109 Intro slides; becomingadatascientist.com	Murthy
6-06	Kinds of analytics: descriptive, predictive, prescriptive	Murthy
6-06	Introduction to Gradebook: concept, context	Murthy
6-07	Bluemix: signup, explore	Team
6-07	Data-processing considerations: store, process, present, transfer	Murthy
6-09	Git repositories, Bitbucket	Rollo
6-09	pgAdmin	Figueroa
6-12	Outlier detection: Mahalanobis distance, masking, swamping	Griffin, Herger
6-13	Data pivots and pivot queries	Murthy
6-14	Markdown to markup: an introduction to Markdown	Schloss
6-14	Markdown to HTML: relationship to automata and language theory	Murthy
6-14	CSV-Pivot queries	Murthy
6-16	Introduction to SQL query optimization	Murthy
6-16	Losing data to optimize storage: Gradebook attendance information	Murthy
6-19	Outlier detection; demo	Griffin, Herger
6-19	K-Nearest Neighbor (KNN) for outlier detection	Murthy
6-19	Licensing: options and obligations	Bhujwala
6-19	Copyrights, trademarks, attribution	Murthy
6-20	Data analytics concerns: efficiency, security, price, expression, performance, ease of development and maintenance	Murthy
6-20	Intro to native analytics	Murthy
6-20	SQL-native KNN	Murthy
6-20	ETL: Extract, Transform, Load	Murthy
6-20	Importing and exporting CSV data; COPY FROM and COPY TO in Postgres	Team
6-20	Introduction to “Issues” in GitHub	Murthy
6-21	Rosters: anonymizing and humanizing	Murthy
6-21	Tutorial: Using Git Effectively	Figueroa, Rollo
6-21	Importing OpenClose schedule to Gradebook	Team
6-22	What it takes to run a lab like DASSL	Team
6-26	GitHub Desktop, try.github.io	Boylan
6-26	Demo: GitHub Desktop	Boylan
6-26	Standardizing DevOps tool chain in an organization	Murthy
6-26	Filling in missing data: alternatives to listing class meeting dates	Figueroa, Rollo
6-26	Customer delight as motivation to produce good software	Murthy
6-28	Satisfying vs satisficing	Murthy
6-28	SchoolTool vs Gradebook	Team
6-28	Introduction to multi-tenancy	Murthy
6-28	Data Science boot camps based in NYC	Team
6-29	Function characteristics: idempotency, repeatability, side effects	Murthy
6-29	Importing student rosters	Bella, Figueroa
6-29	Multi-user operations	Murthy
6-30	Using JDBC to retrieve data; CBOD and KNN in Java with DB data	Griffin, Herger
6-30	Issues with API design; Java class loader	Murthy
6-30	Writing maintainable code	Murthy
6-30	RETURNING in Postgres vs OUTPUT in MSSQL	Murthy
6-30	Gists on GitHub	Murthy
7-03	Social Network Analysis	Schloss
7-03	Elements of successful presentations; kinds of examples: simple, comprehensive, counter	Murthy
7-03	Scales: ordinal, nominal, interval, ratio	Murthy
7-03	Managing merge conflicts in Git	Rollo
7-03	Overview of ClassDB: focus on application roles	Figueroa
7-05	Representing seasons in Gradebook on a scale: detecting out of sequence imports	Murthy, Rollo
7-05	Anonymizing and humanizing data; adding salt to data	Murthy
7-05	Type inference in programming languages; auto in C++	Murthy
7-06	Data models: HTML (not exactly a “data model”), XML, JSON	Murthy
7-06	Using XML and/or JSON in Gradebook: REST APIs	Murthy
7-10	SchoolTool	Boylan
7-10	JSON	Griffin
7-10	Web API frameworks: REST, SOAP, JSON, Node.js	Bhujwala
7-10	High-level architecture of Gradebook: session mgmt., connection pooling	Murthy
7-10	Building online portfolios	Murthy
7-11	Native-SQL implementation of KNN outlier detection	Schloss
7-11	Introduction to R	Herger
7-11	Domain-specific languages	Murthy
7-11	Library vs Language	Murthy
7-12	Union compatibility	Murthy
7-12	User-defined functions in Postgres	Team
7-13	YAML, JSON, XML: language homomorphism and isomorphism	Murthy
7-13	Humanizing student data	Bhujwala
7-13	Issues in web apps: where to do what	Murthy
7-13	Offloading work from DBMS to web server to client	Murthy
7-13	REST API for Gradebook	Team
7-13	DBMS functions to back Gradebook REST API	Bhujwala, Griffin
7-13	Material UI for Gradebook web client	Figueroa
7-13	Gradebook web server	Rollo

Data Systems & Solutions Lab