Pandas is an open-source Python library mainly used for data manipulation and analysis. It's built on top of the NumPy library and provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
In this article, you'll learn how to perform 6 basic operations using Pandas.
Using Pandas Examples
You can run the examples in this article using computational notebooks like Jupyter Notebook, Google Colab, etc. You can also run the examples by entering the code directly into the Python interpreter in interactive mode.
If you want to have a look at the complete source code used in this article, you can access the Python Notebook file from this GitHub repository.
1. How to Import Pandas as pd and Print the Version Number
You need to use the import keyword to import any library in Python. Pandas is typically imported under the pd alias. With this approach, you can refer to the Pandas package as pd instead of pandas.
import pandas as pd
print(pd.__version__)
Output:
1.2.4
2. How to Create a Series in Pandas
A Pandas Series is a one-dimensional labeled array that can hold data of any type. It's like a column in a table. You can create a series from NumPy arrays, NumPy functions, lists, dictionaries, scalar values, etc.
The values of a series are labeled with their index. By default, the first value has index 0, the second value has index 1, and so on. To assign your own labels, use the index argument.
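As a minimal sketch of the difference between default and custom labels (the sample values and variable names here are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Default labels: positions 0, 1, 2
default_s = pd.Series([10, 20, 30])

# Custom labels supplied through the index argument
labeled_s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(default_s[0])            # value stored under the default label 0
print(labeled_s["b"])          # value stored under the custom label "b"
print(list(labeled_s.index))   # the labels themselves
```

Both series hold the same values; only the labels used to look them up differ.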
How to Create an Empty Series
s = pd.Series(dtype='float64')
s
Output:
Series([], dtype: float64)
In the above example, an empty series with the float64 data type is created.
How to Create a Series Using NumPy Array
import pandas as pd
import numpy as np
d = np.array([1, 2, 3, 4, 5])
s = pd.Series(d)
s
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int32
The integer dtype shown here depends on the platform and NumPy version; on many systems it is int64.
How to Create a Series Using List
d = [1, 2, 3, 4, 5]
s = pd.Series(d)
s
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
How to Create a Series With Index
To create a series with custom labels, pass them through the index argument. The number of labels must equal the number of elements in the series.
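To illustrate the length requirement, here is a small sketch (with hypothetical values) of what happens when the number of labels does not match the number of elements. Pandas raises a ValueError in that case:

```python
import pandas as pd

values = [1, 2, 3]

try:
    # Two labels for three values: the lengths do not match
    pd.Series(values, index=["one", "two"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```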
d = [1, 2, 3, 4, 5]
s = pd.Series(d, index=["one", "two", "three", "four", "five"])
s
Output:
one 1
two 2
three 3
four 4
five 5
dtype: int64
How to Create a Series Using Dictionary
The keys of the dictionary become the labels of the series.
d = {"one" : 1,
"two" : 2,
"three" : 3,
"four" : 4,
"five" : 5}
s = pd.Series(d)
s
Output:
one 1
two 2
three 3
four 4
five 5
dtype: int64
How to Create a Series Using Scalar Value
If you create a series from a scalar value, provide the index argument so that the value is repeated for each label.
s = pd.Series(1, index = ["a", "b", "c", "d"])
s
Output:
a 1
b 1
c 1
d 1
dtype: int64
3. How to Create a DataFrame in Pandas
A DataFrame is a two-dimensional data structure in which data is aligned in rows and columns. A DataFrame can be created from dictionaries, lists, a list of dictionaries, NumPy arrays, etc. In practice, DataFrames are usually created from existing storage such as CSV files, Excel files, SQL databases, etc.
The DataFrame object supports a number of attributes and methods. If you want to know more about them, you can check out the official pandas DataFrame documentation.
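A few commonly used attributes can be sketched as follows. The column names and values are hypothetical sample data, not taken from the article's files:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Alex", "Bob"], "roll_no": [601, 602]})

print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))   # column labels
print(df.dtypes)          # data type of each column
```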
How to Create an Empty DataFrame
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
How to Create a DataFrame Using List
listObj = ["MUO", "technology", "simplified"]
df = pd.DataFrame(listObj)
print(df)
Output:
0
0 MUO
1 technology
2 simplified
How to Create a DataFrame Using Dictionary of ndarray/Lists
batmanData = {'Movie Name' : ['Batman Begins', 'The Dark Knight', 'The Dark Knight Rises'],
'Year of Release' : [2005, 2008, 2012]}
df = pd.DataFrame(batmanData)
print(df)
Output:
Movie Name Year of Release
0 Batman Begins 2005
1 The Dark Knight 2008
2 The Dark Knight Rises 2012
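As a quick follow-up sketch using the same batmanData dictionary, each dictionary key becomes a column label, and selecting a column by that label gives back a Series. The variable name years is introduced here only for illustration:

```python
import pandas as pd

batmanData = {'Movie Name': ['Batman Begins', 'The Dark Knight', 'The Dark Knight Rises'],
              'Year of Release': [2005, 2008, 2012]}
df = pd.DataFrame(batmanData)

# Select one column by its label; the result is a Series
years = df['Year of Release']
print(type(years).__name__)
print(years.tolist())
```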
How to Create a DataFrame Using List of Lists
data = [['Alex', 601], ['Bob', 602], ['Cataline', 603]]
df = pd.DataFrame(data, columns = ['Name', 'Roll No.'])
print(df)
Output:
Name Roll No.
0 Alex 601
1 Bob 602
2 Cataline 603
How to Create a DataFrame Using List of Dictionaries
data = [{'Name': 'Alex', 'Roll No.': 601},
{'Name': 'Bob', 'Roll No.': 602},
{'Name': 'Cataline', 'Roll No.': 603}]
df = pd.DataFrame(data)
print(df)
Output:
Name Roll No.
0 Alex 601
1 Bob 602
2 Cataline 603
How to Create a DataFrame Using zip() Function
Use the zip() function to merge lists in Python.
Name = ['Alex', 'Bob', 'Cataline']
RollNo = [601, 602, 603]
listOfTuples = list(zip(Name, RollNo))
df = pd.DataFrame(listOfTuples, columns = ['Name', 'Roll No.'])
print(df)
Output:
Name Roll No.
0 Alex 601
1 Bob 602
2 Cataline 603
4. How to Read CSV Data in Pandas
A "comma-separated values" (CSV) file is a delimited text file that uses commas to separate values. You can read a CSV file into a DataFrame using the pandas read_csv() function. If you want to print the entire DataFrame, use the to_string() method.
In this and the next examples, this CSV file will be used to perform the operations.
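The file itself is not reproduced here, so the following is only a self-contained sketch: it assumes a small, hypothetical CSV (the column names and values are made up) and reads it from an in-memory string with io.StringIO instead of from disk. read_csv() accepts a file path in the same position.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,roll_no\nAlex,601\nBob,602\nCataline,603\n"

# read_csv() accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())
```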
df = pd.read_csv('')
print(df.to_string())
Output:
5. How to Analyze DataFrames Using the head(), tail(), and info() Methods
How to View Data Using the head() Method
The head() method is one of the best ways to get a quick overview of the DataFrame. This method returns the header and specified number of rows, starting from the top.
df = pd.read_csv('')
print(df.head(10))
Output:
If you don't specify the number of rows, the first 5 rows will be returned.
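A minimal sketch of this default, using a hypothetical 8-row DataFrame rather than the article's CSV file:

```python
import pandas as pd

# Hypothetical DataFrame with 8 rows
df = pd.DataFrame({"n": range(8)})

first_default = df.head()    # no argument: first 5 rows
first_three = df.head(3)     # explicit row count

print(len(first_default), len(first_three))
```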
df = pd.read_csv('')
print(df.head())
Output:
How to View Data Using the tail() Method
The tail() method returns the header and specified number of rows, starting from the bottom.
df = pd.read_csv('')
print(df.tail(10))
Output:
If you don't specify the number of rows, the last 5 rows will be returned.
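The same default can be sketched with a hypothetical 8-row DataFrame. Note that the returned rows keep their original index labels:

```python
import pandas as pd

# Hypothetical DataFrame with 8 rows
df = pd.DataFrame({"n": range(8)})

last_default = df.tail()   # no argument: last 5 rows
last_two = df.tail(2)      # explicit row count

print(last_default["n"].tolist())
print(list(last_two.index))   # original labels are preserved
```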
df = pd.read_csv('')
print(df.tail())
Output:
How to Get Info About the Data
The info() method prints a brief summary of a DataFrame, including the index dtype, column dtypes, non-null counts, and memory usage. It prints the summary itself and returns None.
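As a small sketch with hypothetical sample data, the summary can also be captured as text by passing a buffer through the buf parameter of info(). This also shows that the method's return value is None:

```python
import io
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Alex", "Bob"], "roll_no": [601, 602]})

buffer = io.StringIO()
result = df.info(buf=buffer)   # write the summary into the buffer instead of stdout
summary = buffer.getvalue()

print(result)    # info() has no return value
print(summary)
```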
df = pd.read_csv('')
df.info()
Output:
6. How to Read JSON Data in Pandas
JSON (JavaScript Object Notation) is a lightweight data-interchange format. You can read a JSON file into a DataFrame using the pandas read_json() function. If you want to print the entire DataFrame, use the to_string() method.
In the below example, this JSON file is used to perform the operations.
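Since the JSON file is not reproduced here, this is only a self-contained sketch: it assumes a small, hypothetical JSON array of records (field names and values are made up) and reads it from an in-memory string with io.StringIO. read_json() accepts a file path in the same position.

```python
import io
import pandas as pd

# Hypothetical JSON content standing in for a file on disk
json_text = '[{"name": "Alex", "roll_no": 601}, {"name": "Bob", "roll_no": 602}]'

# read_json() accepts a path or a file-like object
df = pd.read_json(io.StringIO(json_text))
print(df.to_string())
```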
df = pd.read_json('')
print(df.to_string())
Output:
Refresh Your Python Knowledge With Inbuilt Functions and Methods
Functions help shorten your code and improve its efficiency. Functions and methods like reduce(), split(), enumerate(), eval(), round(), etc. can make your code robust and easy to understand. It's always good to know about built-in functions and methods as they can simplify your programming tasks to a great extent.