Python Ecosystem for Data Science

Before starting with data analysis or data science, we should understand the Python ecosystem for data science and machine learning.

Pandas

Panel + Data = Pandas (the name comes from "panel data")
  • Provides high-level data structures and functions.
    • Expresses complex data operations as simple commands.
      • Methods for grouping, combining, and filtering data, as well as time-series functionality.
      • Re-indexing, iteration, sorting, aggregation, and concatenation.
      • Easy to reshape, slice, and dice the data.
  • Fast execution and expressive syntax.
  • Flexible, SQL-like data manipulation capabilities.
  • Very flexible for handling missing data and for cleaning and manipulating data.
  • pandas is well suited for different kinds of data, such as:
    • Tabular data with heterogeneously typed columns.
    • Ordered and unordered time series data.
    • Arbitrary matrix data with row and column labels.
    • Any other form of observational or statistical data set; the data does not need to be labelled at all.
Together, these features make pandas a good fit for data analysis.
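To make the points above concrete, here is a minimal sketch of missing-data handling, filtering, and grouping. The sales table, its column names, and its values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; names and values are invented for this example.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South", "East"],
        "units": [10, 5, np.nan, 8, 12],
        "price": [2.5, 3.0, 2.5, 3.0, 4.0],
    }
)

# Handle missing data: treat the missing unit count as 0.
sales["units"] = sales["units"].fillna(0)

# Derive a new column with a vectorized expression (no explicit loop).
sales["revenue"] = sales["units"] * sales["price"]

# Filter rows, similar to a SQL WHERE clause.
large_orders = sales[sales["units"] >= 8]

# Group and aggregate, similar to SQL GROUP BY, then sort the result.
revenue_by_region = (
    sales.groupby("region")["revenue"].sum().sort_values(ascending=False)
)

print(large_orders)
print(revenue_by_region)
```

Filling the missing value with 0 is only one possible choice; depending on the data, dropping the row or filling with a mean may be more appropriate.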

_____________________________________________________________________________________

NumPy

Numerical Python = NumPy
  • Provides data structures and algorithms for scientific applications that work with numerical data.
  • Supports manipulation of multi-dimensional arrays.
  • Easy to reshape, slice, and dice arrays, with fast array processing.
  • Makes complex mathematical implementations simple.
  • Excellent support for linear algebra, Fourier transforms, and more.
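The capabilities listed above can be sketched in a few lines. The array values and variable names below are arbitrary and chosen only for illustration.

```python
import numpy as np

# Build a 1-D array and reshape it into a 2 x 3 matrix.
a = np.arange(6).reshape(2, 3)

# Slice: take the last two columns of every row.
last_two = a[:, 1:]

# Vectorized arithmetic applies to every element without an explicit loop.
doubled = a * 2

# Linear algebra: solve the system M @ x = b.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(M, b)

# Fourier transform of a short constant signal.
spectrum = np.fft.fft(np.ones(4))

print(a)
print(last_two)
print(x)
print(spectrum)
```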

_____________________________________________________________________________________

Statistical model

  • A statistical model is a mathematical model that embodies a set of statistical assumptions about how sample data is generated.
  • A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables.
  • Python has a built-in statistics module that you can use to calculate statistics of numeric data, such as the mean, median, and standard deviation.
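As a small illustration of the built-in module, the sketch below computes a few descriptive statistics. The sample values are arbitrary.

```python
import statistics

# Arbitrary sample values, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value
pstdev = statistics.pstdev(data)  # population standard deviation

print(mean, median, mode, pstdev)
```

Note that this module covers descriptive statistics; it does not by itself fit a statistical model of the kind described in the first two points.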

_____________________________________________________________________________________

Visualization

matplotlib
  • Generates high-quality plots for a wide variety of graphs.
    • Line plots, scatter plots, histograms, bar charts, heat maps, and much more.
    • matplotlib is primarily a 2D plotting library.
Seaborn
  • Seaborn is another powerful data visualization library, built on top of matplotlib.
  • It provides a high-level interface for drawing informative statistical graphics.
  • It integrates closely with pandas data structures.
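A minimal matplotlib sketch of two of the plot types mentioned above follows. The data values and the output file name example_plots.png are placeholders, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Placeholder data for illustration.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# One figure with two side-by-side axes.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with point markers.
ax_line.plot(x, y, marker="o")
ax_line.set_title("Line plot")
ax_line.set_xlabel("x")
ax_line.set_ylabel("x squared")

# Histogram of the sample values.
ax_hist.hist(samples, bins=5)
ax_hist.set_title("Histogram")

fig.tight_layout()
fig.savefig("example_plots.png")  # hypothetical output file name
```

Seaborn functions draw onto matplotlib axes as well, so the same figure and axes objects can be reused when mixing the two libraries.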

Scikit-learn

  • Scikit-learn contains many tools for machine learning.
  • It supports data mining tasks such as:
    • Dimensionality reduction
    • Classification
    • Regression
    • Clustering
    • Model selection
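The sketch below touches three of the tasks listed above: splitting data for model evaluation, dimensionality reduction, and classification. The choice of the bundled iris dataset, PCA, and logistic regression is an assumption made for illustration, not a recommendation from this article.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a test set so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Dimensionality reduction: project the 4 features onto 2 principal components.
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)

# Classification: fit a logistic regression model on the reduced features.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_2d, y_train)

accuracy = clf.score(X_test_2d, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The PCA step is fitted on the training split only and then applied to the test split, so no information from the test data leaks into the model.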

Published by Shanthababu

I am Shanthababu Pandian. I have 17 years of IT experience and currently handle project manager roles and responsibilities.
