Pandas is a newer package built on top of NumPy that provides an efficient implementation of a DataFrame. DataFrames are essentially multidimensional arrays with attached row and column labels, often with heterogeneous types and/or missing data.
Installing pandas on a Windows system:
pip install pandas
We will start the coding session by first importing the libraries:
import numpy as np
import pandas as pd
The Pandas Series Object:
A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or array as follows:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
The Series wraps both a…
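As a small sketch of the Series created above, the snippet below inspects its two parts through the `values` and `index` attributes. The printed lookups are only illustrative; the default integer index is what pandas assigns when no index is supplied.

```python
import pandas as pd

# Same Series as in the text above.
data = pd.Series([0.25, 0.5, 0.75, 1.0])

# The underlying data is exposed as a NumPy array.
print(data.values)

# With no explicit index, pandas assigns a default integer index (0..3).
print(data.index)

# Items are looked up through the index label, here the integer 1.
print(data[1])
```

The index is what distinguishes a Series from a plain one-dimensional NumPy array: each value carries an explicit label.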
Up until now, we have been discussing some of the basic nuts and bolts of NumPy; in this section we will dive into the reasons why NumPy is so important in the Python data science world.
The key to making computation on NumPy arrays fast is to use vectorized operations, generally implemented through NumPy’s universal functions (ufuncs). The vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution. …
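To make the idea concrete, here is a minimal sketch that computes reciprocals twice: once with an explicit Python loop and once with a vectorized expression. The function name `reciprocals_loop` and the sample array are illustrative assumptions, not taken from the original article.

```python
import numpy as np


def reciprocals_loop(values):
    """Compute 1/x for each element with an explicit Python loop (slow path)."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


# Illustrative input: five integers between 1 and 9 (the seed is arbitrary).
rng = np.random.default_rng(0)
values = rng.integers(1, 10, size=5)

loop_result = reciprocals_loop(values)

# Vectorized version: the division ufunc (np.divide) applies the operation
# to every element inside NumPy's compiled code instead of the interpreter.
vectorized_result = 1.0 / values

print(np.allclose(loop_result, vectorized_result))
```

Both versions give the same numbers; the difference is where the loop runs. On large arrays the vectorized form is typically much faster, which can be checked with a timing tool such as `timeit`.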
Fundamentals of Python:
To access the dataset used in this tutorial, follow the link below:
Importing Libraries for visualization of data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Loading a simple delimited data file:
Print the data of the dataset:
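The tutorial's own data file is not reproduced here, so the following is only a sketch of the two steps above under stated assumptions: a small tab-delimited text (made up for illustration) stands in for the real file, and `pd.read_csv` with `sep="\t"` is used to load it. With a real file, the path to that file would replace the `io.StringIO` object.

```python
import io

import pandas as pd

# Hypothetical tab-delimited content standing in for the tutorial's data file.
raw_text = (
    "country\tyear\tpopulation\n"
    "A\t2000\t1.5\n"
    "B\t2000\t2.5\n"
)

# sep="\t" tells read_csv that the columns are separated by tabs.
df = pd.read_csv(io.StringIO(raw_text), sep="\t")

# Print the first rows and the (rows, columns) shape of the dataset.
print(df.head())
print(df.shape)
```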
Analytics: It is defined as, “The scientific process of transforming data into insights for making better decisions.”
Data Analysis: It is the process of examining, transforming and arranging raw data in a specific way to generate useful information from it.
Analysis ≠ Analytics
Analysis: What has happened in the past?
Analytics: What will happen in the future?
Classification of Data Analytics:
Descriptive Analysis: It describes in detail what happened in the past. It is the conventional form of business intelligence and data analysis.
Diagnostic Analysis: It is the form of advanced analytics which examines data or content to answer the…
Data manipulation in Python is nearly synonymous with NumPy array manipulation; even newer tools like Pandas are built around the NumPy array. This section presents several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape and join arrays.
Let’s start by defining three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array. We’ll use NumPy’s random number generator, which we will seed with a set value to ensure that the same random arrays are generated each time this code is run.
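A minimal sketch of that setup is shown below. The seed value, the array shapes, and the names `x1`, `x2`, `x3` are assumptions chosen for illustration; the original code is not shown in this excerpt.

```python
import numpy as np

# Seed the generator so the same arrays are produced on every run.
# The value 0 is an arbitrary choice.
np.random.seed(0)

x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

# Each array reports its number of dimensions, shape, and total size.
print(x3.ndim, x3.shape, x3.size)  # prints: 3 (3, 4, 5) 60
```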
To go through the introduction to NumPy, follow the…
Datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, or nearly anything else.
For example, images can be thought of as simply two-dimensional arrays of numbers. Sound clips can be thought of as one-dimensional arrays of intensity versus time. No matter what the data is, the first step in making it analyzable is to transform it into arrays of numbers.
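As a rough illustration of this idea, the sketch below builds a tiny grayscale "image" and a short "sound clip" as NumPy arrays. All values, the sample rate, and the tone frequency are made up for the example; real data would come from an image or audio file.

```python
import numpy as np

# A made-up 3x4 grayscale image: each number is a pixel brightness (0-255).
image = np.array(
    [
        [0, 50, 100, 150],
        [25, 75, 125, 175],
        [50, 100, 150, 200],
    ],
    dtype=np.uint8,
)

# A made-up sound clip: 80 samples of a 440 Hz tone at an assumed
# sample rate of 8000 Hz, i.e. intensity as a function of time.
sample_rate = 8000
time = np.arange(80) / sample_rate
clip = np.sin(2 * np.pi * 440 * time)

print(image.ndim, image.shape)  # two dimensions: rows and columns
print(clip.ndim, clip.shape)    # one dimension: samples over time
```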
In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage…
Five Traits of good Data:
Accuracy, completeness, reliability, relevance and timeliness. A data analyst must clean the dataset by removing duplicates, correcting formatting errors and removing blank rows.
Removing duplicated or inaccurate data and empty rows:
It’s very common when collecting or importing data, whether through manual or automated processes, to get errors and inconsistencies in your data. These can range from spelling mistakes, extra white space or the wrong case in text, to empty rows or missing values, to inaccurate or duplicated data. Having these errors and inconsistencies in your data can lead to…
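The cleaning steps described above are tool-independent. As one possible sketch, and not necessarily the tool the original article uses, the same clean-up can be expressed in pandas. The sample table and its column names are invented for illustration.

```python
import pandas as pd

# Invented sample data with typical problems: extra white space,
# inconsistent case, a duplicated row, and a completely empty row.
raw = pd.DataFrame(
    {
        "name": [" Alice", "bob ", "bob ", None, "CAROL"],
        "city": ["Pune", "Delhi", "Delhi", None, "Mumbai"],
    }
)

cleaned = (
    raw.dropna(how="all")  # drop rows in which every column is missing
    .assign(name=lambda d: d["name"].str.strip().str.title())  # fix spacing and case
    .drop_duplicates()  # remove exact duplicate rows
    .reset_index(drop=True)
)

print(cleaned)
```

The empty row is dropped first so that the text clean-up only runs on rows that contain data, and duplicates are removed last so that rows differing only in spacing or case are still caught.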
Various functions and formulas can be used in order to make our work easier.
Split: The Split button divides the screen into multiple sections, each of which you can scroll separately.
Freeze panes: If you have headings in your columns like a header row, then you might want those to remain on screen while you move down the sheet. To do that you need to use Freeze Panes.
Microsoft Excel is one of the most widely used tools in any industry. While some enjoy working with pivot tables and histograms, others limit themselves to simple pie charts and conditional formatting.
Once we write a correct formula, we can expect automatic calculations. Excel helps us organize and easily access data, and format, filter and sort the data in a table. We can edit, undo, or use error-checking tools to remedy mistakes. We can also analyze data and create charts, graphs and reports to help visualize our analysis.
The most common business uses for spreadsheet applications include the following: