CIFARP Workshop

Introduction to Data Sciences

The growing availability of data from new analytical technologies has increased the demand for expertise in storing, processing, and interpreting the information these technologies generate. The scientific community has responded by developing and disseminating computational statistical methods for information management.

In this workshop we will use Python, a very popular programming language in data science whose simple syntax makes it a good choice for introductory programming courses.

The Jupyter project is an open-source project that enables the creation and sharing of documents that contain code, equations, visualizations, and narrative text. Example applications include data preparation for statistical analysis, numerical simulation, statistical modeling, data visualization, machine learning, and more!

The purpose of this two-day workshop is to introduce Jupyter through basic programming and data science concepts. We hope that the practical examples presented will encourage participants to incorporate these tools into their research routine.

To register for this workshop, participants must be registered for CIFARP and send an email to ridasilva@usp.br. The workshop has 16 places available on a first-come, first-served basis, at no additional cost.

The workshop will be held on November 5 and 6 from 9:00 to 18:00 at the Ribeirão Preto Information Technology Center.

Program Summary

    Day One

  1. Introduction to the Jupyter Platform
  2. Introduction to the Python Programming Language
    • 2.1. Programming Basics
      • 2.1.1. Data Structures
      • 2.1.2. Control Structures
      • 2.1.3. Logical Operations
      • 2.1.4. Functions
    • 2.2. Formatting Data with the pandas Library
    • 2.3. Performing Numerical Analyses with the NumPy Library
    • 2.4. Introduction to the scikit-learn (sklearn) Library

    Day Two

  3. Introduction to Regression Analysis
  4. Introduction to Classification Analysis
  5. Introduction to Cluster Analysis
  6. Pharmaceutical Sciences Application Analysis
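
As a taste of the programming basics listed under Day One (items 2.1.1 to 2.1.4), a minimal sketch in plain Python might look like the following. The sample names, the concentration values, and the `classify` function with its thresholds are all invented for illustration and are not taken from the workshop material.

```python
# Hypothetical data: sample names and values are invented for illustration.
# Data structure: a dictionary mapping sample names to measured values.
concentrations = {"sample_a": 0.8, "sample_b": 1.5, "sample_c": 2.3}


# Function: wraps the decision logic so it can be reused.
def classify(value, low=1.0, high=2.0):
    """Label a value as 'low', 'medium', or 'high' (thresholds are assumptions)."""
    # Logical operations: comparisons decide which branch runs.
    if value < low:
        return "low"
    elif value <= high:
        return "medium"
    return "high"


# Control structure: a for loop visits every entry in the dictionary.
labels = {}
for name, value in concentrations.items():
    labels[name] = classify(value)

print(labels)
```

Each of the four ideas (a data structure, a control structure, logical operations, and a function) appears once, and the same code can be pasted into a Jupyter notebook cell and run as is.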

References

  1. McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 51–56).
  2. Oliphant, T. E. (2006). A guide to NumPy (Vol. 1). Trelgol Publishing USA.
  3. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  4. Varmuza, K., & Filzmoser, P. (2009). Introduction to multivariate statistical analysis in chemometrics. CRC Press.