Before starting with data analysis or data science, we should understand the Python ecosystem for data science and machine learning.
Pandas
Panel + Data = Pandas
- Provides high-level data structures and functions.
- Expresses complex data operations with simple commands.
- Methods for grouping, combining data, and filtering, as well as time-series functionality.
- Re-indexing, Iteration, Sorting, Aggregations, and Concatenations.
- Easy to reshape, slice, and dice the data.
- Fast and expressive: vectorized operations run quickly on large data sets.
- Flexible data manipulation capabilities like SQL.
- Very flexible for handling missing data, cleaning, and manipulating data.
- Pandas is well suited for many different kinds of data, such as:
  - Tabular data with heterogeneously typed columns.
  - Ordered and unordered time series data.
  - Arbitrary matrix data with row and column labels (the data does not need to be labelled at all).
  - Any other form of observational or statistical data sets.
All of the features above make pandas a good fit for data analysis.
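
The features listed above can be sketched in a few lines. This is a minimal illustration, not a complete workflow: the "sales" and "managers" tables, their column names, and their values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records (invented for illustration only).
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "east"],
        "units": [10, 4, np.nan, 5, 8],
        "price": [2.5, 3.0, 2.5, 3.0, 4.0],
    }
)

# Handle missing data: treat a missing unit count as 0.
sales["units"] = sales["units"].fillna(0)

# Filter rows with a boolean mask.
sold = sales[sales["units"] > 0]

# Group and aggregate, similar to SQL's GROUP BY.
units_by_region = sold.groupby("region")["units"].sum()

# Combine with a second (also hypothetical) table, similar to a SQL JOIN,
# then sort the result.
managers = pd.DataFrame(
    {"region": ["north", "south", "east"], "manager": ["Ann", "Bo", "Cy"]}
)
report = (
    units_by_region.reset_index()
    .merge(managers, on="region")
    .sort_values("units", ascending=False)
)
print(report)
```

Each step is a single method call, which is what the "simple commands" bullet refers to.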
_____________________________________________________________________________________
NumPy
Numerical Python = NumPy
- Provides data structures and algorithms for scientific applications that work with numerical data.
- Supports multi-dimensional array manipulation.
- Easy to reshape, slice, and dice arrays, with fast array processing.
- Makes complex mathematical implementations much simpler.
- Excellent support for linear algebra, Fourier transforms, and more.
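
A short sketch of the points above: reshaping, slicing, vectorized arithmetic, linear algebra, and a Fourier transform. The arrays and the small linear system are arbitrary values chosen for illustration.

```python
import numpy as np

a = np.arange(12)        # 1-D array holding 0..11
m = a.reshape(3, 4)      # reshape into 3 rows x 4 columns
second_col = m[:, 1]     # slice out the second column
doubled = m * 2          # vectorized arithmetic, no explicit loop

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: a sine wave with 2 cycles over 8 samples
# should show its strongest component in frequency bin 2.
signal = np.sin(2 * np.pi * 2 * np.arange(8) / 8)
spectrum = np.fft.fft(signal)
peak_bin = int(np.argmax(np.abs(spectrum[:4])))

print(second_col, x, peak_bin)
```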
_____________________________________________________________________________________
Statistical model
- A statistical model is a mathematical model that embodies a set of statistical assumptions about how sample data is generated.
- A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables.
- Python has a built-in statistics module that you can use to calculate statistics for numeric data.
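
As a minimal sketch of the built-in module, the snippet below computes a few descriptive statistics. The "scores" list is made-up sample data. Note that the statistics module mainly covers descriptive measures; fitting fuller statistical models usually relies on other libraries.

```python
import statistics

# Made-up sample data for illustration.
scores = [2, 4, 4, 5, 7, 9]

mean = statistics.mean(scores)      # arithmetic mean
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most common value
stdev = statistics.stdev(scores)    # sample standard deviation

print(mean, median, mode, stdev)
```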
_____________________________________________________________________________________
Visualization
matplotlib
- Generates high-quality plots of many kinds: line plots, scatter plots, histograms, bar charts, heat maps, and much more.
- matplotlib is a 2D plotting library.
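
A minimal sketch of two of the plot types mentioned above, drawn side by side. The data values are arbitrary, and the "Agg" backend is selected only so that the example runs without a display; in a notebook or desktop session that line can be dropped.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Arbitrary example data.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# One figure with two axes: a line plot and a histogram.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, y, marker="o")
ax_line.set_title("Line plot")
ax_hist.hist(y, bins=4)
ax_hist.set_title("Histogram")
fig.tight_layout()

# Render to an in-memory PNG; pass a filename instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```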
Seaborn
- Seaborn is another powerful data visualization library based on matplotlib.
- It provides a high-level interface for drawing attractive and informative statistical graphics.
- It closely integrates with pandas data structures.
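
To illustrate the pandas integration, here is a small sketch in which Seaborn reads column names straight from a DataFrame. The "study" table and its values are invented for the example, and the "Agg" backend is again only an assumption for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import pandas as pd
import seaborn as sns

# Invented data for illustration.
study = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b", "b"],
        "hours": [1, 2, 3, 1, 2, 3],
        "score": [52, 60, 71, 48, 63, 80],
    }
)

# Columns are referenced by name; hue colors the points by group.
ax = sns.scatterplot(data=study, x="hours", y="score", hue="group")
ax.set_title("Score vs. hours studied")
```

The axis labels are taken from the column names, so no manual labelling is needed.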
_____________________________________________________________________________________
Scikit-learn
- Scikit-learn contains many tools for machine learning.
- It helps with data mining tasks such as:
  - Reducing dimensionality
  - Classification
  - Regression
  - Clustering
  - Model selection
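
Several of the tasks above can be sketched on the iris data set that ships with the library. The choice of PCA, logistic regression, and k-means here is only one illustrative option per task, not a recommendation; regression is left out to keep the example short.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Reducing dimensionality: project 4 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)

# Classification: fit on a training split, score on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Model selection: 5-fold cross-validation of the same kind of model.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# Clustering: group the samples into 3 clusters without using the labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(X_2d.shape, accuracy, cv_scores.mean())
```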