BNarendraEnlightenment: Python Material - Part - 34

__author__ = "Narendra Boyina"

""" Real-Time usage """

"""Concepts covered in this pandas lesson:
    Properties of a DataFrame
    Inspecting Data Types
    Selecting Data (loc, iloc, at)
    Direct Assignment
    Updating Using map or replace
    Changing the Name of the Index
    Display Options
    Grouping Data
    Updating Rows and Columns

"""


##### Creating a Mini Project: Importing a Dataset #####

import pandas as pd

"""
###### Importing dataset ######

--> A simple way to store large data sets is in CSV (comma-separated values) files.
--> CSV files contain plain text and are a well-known format that
    almost any tool can read, including pandas.
"""
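To try read_csv without downloading a file, here is a minimal sketch that parses a few made-up rows from an in-memory string. The sample rows are hypothetical stand-ins for clean_jobs.csv, not its real contents; io.StringIO works because read_csv accepts any file-like object.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for clean_jobs.csv (not the real data)
csv_text = """title,company,location
Data Analyst,Acme Corp,United States
Senior Data Analyst,Globex,"San Francisco, CA"
Data Engineer,Initech,Hyderabad
"""

# read_csv accepts file-like objects, so StringIO lets us parse text without a file on disk
jobs = pd.read_csv(io.StringIO(csv_text))
print(jobs)
```

Note that the quoted field "San Francisco, CA" stays in one column even though it contains a comma.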

""" Properties of a DataFrame:
head(), tail(), shape """

""" head(n): 
The df.head(n) method is used to view the first n rows of the DataFrame.
If you don't specify n, the default number of rows displayed is 5.

tail(n): 
The df.tail(n) method is similar to df.head(n) but for the end of the DataFrame.
It returns the last n rows.

shape :
The df.shape attribute (accessed without parentheses) returns a tuple describing the dimensions of the DataFrame.
The first element of the tuple is the number of rows, and the second is the number of columns.

"""
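A small sketch of head(), tail() and shape on a made-up 8-row DataFrame (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical DataFrame with 8 rows and 2 columns
df = pd.DataFrame({
    "title": [f"Job {i}" for i in range(8)],
    "salary": [50_000 + 5_000 * i for i in range(8)],
})

first_five = df.head()   # default n=5 -> rows 0 to 4
last_two = df.tail(2)    # the last 2 rows: 6 and 7
rows, cols = df.shape    # shape is an attribute, so no parentheses

print(first_five)
print(last_two)
print(rows, cols)  # 8 2
```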
''' Datasets can be downloaded from sources such as Kaggle and Google Dataset Search.'''

# Read the CSV file into a DataFrame (clean_jobs.csv must be in the current working directory)
create_csv_file = pd.read_csv("clean_jobs.csv")
#print(create_csv_file)
#print(create_csv_file.head())
#print(create_csv_file.tail())
#print(create_csv_file.shape)
#print(create_csv_file.columns)  # Lists all the column names in the DataFrame

""" Inspecting Data Types:
 Each column in a DataFrame has a specific data type. Understanding these types is crucial for proper data manipulation."""
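As a rough illustration, the sketch below builds a hypothetical DataFrame with mixed column types, prints df.dtypes, and converts one column with astype(). The text column's reported type depends on the pandas version (object in older releases), so only the numeric and boolean columns are relied on here.

```python
import pandas as pd

# Hypothetical data with integer, float, boolean and text columns
df = pd.DataFrame({
    "title": ["Data Analyst", "Data Engineer"],
    "salary": [70000, 85000],
    "rating": [4.1, 3.8],
    "remote": [True, False],
})

print(df.dtypes)  # salary int64, rating float64, remote bool

# astype() returns a new Series, so assign it back to change the column's type
df["salary"] = df["salary"].astype("float64")
print(df.dtypes["salary"])  # float64
```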
#print(create_csv_file.dtypes)  # Display the data types of each column

""" 
obj.loc: The obj.loc indexer (used with square brackets, not called like a method) performs label-based
indexing, meaning we can access rows and columns using their labels (i.e., index names and column names)."""

# Selecting all rows and a specific column by label
titles = create_csv_file.loc[:, 'title']
#print(titles)

# Selecting a range of rows and multiple columns by labels
subset = create_csv_file.loc[1:10, ['title', 'company', 'location']]  # Rows labelled 1 to 10; .loc includes BOTH endpoints
#print(subset)
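Because .loc works with labels, it behaves the same way when the index is made of strings. This sketch uses a hypothetical index of job ids ("j1" to "j4") to show that a label slice includes both endpoints and that a boolean mask can also serve as the row selector.

```python
import pandas as pd

# Hypothetical job data indexed by string labels instead of 0, 1, 2, ...
df = pd.DataFrame(
    {"title": ["Analyst", "Engineer", "Scientist", "Manager"],
     "location": ["Hyderabad", "Pune", "Chennai", "Delhi"]},
    index=["j1", "j2", "j3", "j4"],
)

# Label slices with .loc include BOTH endpoints: j2, j3 and j4
middle = df.loc["j2":"j4", ["title"]]
print(middle)

# A boolean mask can be used as the row selector, with a column label as the column selector
hyd_titles = df.loc[df["location"] == "Hyderabad", "title"]
print(hyd_titles)
```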

""" 
obj.iloc: While obj.loc uses labels for indexing, obj.iloc uses integer-based indexing.
You use obj.iloc to access rows and columns by their integer positions (starting at 0),
which makes it useful when you need to access data by its position in the DataFrame.
As with ordinary Python slices, the end position of an iloc slice is excluded.

"""

# Selecting a single row from the DataFrame
single_row = create_csv_file.iloc[0]
#print(single_row)     # Row at position 0 (the first row), returned as a Series

# Selecting a specific row and columns by integer indices
specific_data = create_csv_file.iloc[10, [1, 2, 3]]  # Row at position 10 (the 11th row) and the columns at positions 1, 2 and 3
#print(specific_data)

# Slicing to get multiple rows and columns
multi_slice = create_csv_file.iloc[20:26, 0:5]  # Rows 20 to 25 and columns 0 to 4 (the end positions 26 and 5 are excluded)
#print(multi_slice)
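The difference between loc and iloc is easiest to see when the index labels do not match the positions, which often happens after sorting or filtering. A minimal sketch with a hypothetical index of 10, 20, 30, 40:

```python
import pandas as pd

# Index labels deliberately differ from positions
df = pd.DataFrame({"title": ["A", "B", "C", "D"]}, index=[10, 20, 30, 40])

by_position = df.iloc[0:2]  # positions 0 and 1 -> labels 10 and 20 (end excluded)
by_label = df.loc[10:30]    # labels 10, 20 and 30 (end included)
last_row = df.iloc[-1]      # negative positions count from the end

print(by_position)
print(by_label)
print(last_row["title"])  # D
```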

""" 
obj.at: obj.at is designed to access a single value for a row/column label pair. 
It is very similar to df.loc for accessing scalar values but is optimized for 
faster access when you only need to get or set a single value in a DataFrame."""

# Access a specific single value using row label and column name
title_of_first_data = create_csv_file.at[0, 'title']
#print(title_of_first_data)
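.at has a positional counterpart, .iat, and both can also set a single value. This sketch uses a hypothetical two-row DataFrame with string index labels:

```python
import pandas as pd

# Hypothetical data with string index labels "a" and "b"
df = pd.DataFrame(
    {"title": ["Data Analyst", "Data Engineer"], "salary": [70000, 85000]},
    index=["a", "b"],
)

salary_b = df.at["b", "salary"]  # .at uses a row label and a column label
title_first = df.iat[0, 0]       # .iat uses integer positions

# Both can also set one value in place
df.at["a", "salary"] = 72000

print(salary_b, title_first)
print(df)
```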

# Boolean indexing: keep only the rows whose location is 'United States'
accessing_us_data = create_csv_file[create_csv_file['location'] == 'United States']
#print(accessing_us_data)
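Filters can be combined. The sketch below, on hypothetical rows, joins two conditions with & and matches several values at once with isin(); each condition needs its own parentheses because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical job rows
df = pd.DataFrame({
    "title": ["Data Analyst", "Data Engineer", "Data Analyst", "Scientist"],
    "location": ["United States", "India", "India", "United States"],
})

# & means "and", | means "or"; wrap every condition in parentheses
us_analysts = df[(df["location"] == "United States") & (df["title"] == "Data Analyst")]

# isin() keeps rows whose value is any of the listed values
selected = df[df["location"].isin(["India", "Canada"])]

print(us_analysts)
print(selected)
```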

""" 
##### Updating Rows and Columns #######

df.drop: 
The .drop() method in pandas is used to remove rows or columns from a DataFrame.
Its primary purpose is to drop specified labels from rows or columns.

Parameters:

labels: The row or column labels to drop.

axis: Specifies whether the labels refer to rows (axis=0) or columns (axis=1). By default, it's 0 (rows).

index or columns: An alternative way to specify the labels to drop, instead of using the labels parameter. 
It is equivalent to specifying axis=0 (for index) or axis=1 (for columns).

inplace: If True, the operation is done in place, meaning it modifies the DataFrame directly and returns None.
If False or not specified, it returns a new DataFrame with the specified labels dropped. """
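A short sketch of the columns= and index= keywords described above, on a hypothetical DataFrame. Since inplace is not set, each call returns a new DataFrame and the original stays unchanged.

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame(
    {"title": ["A", "B", "C"], "company": ["X", "Y", "Z"], "location": ["P", "Q", "R"]}
)

no_company = df.drop(columns="company")  # same as df.drop(labels="company", axis=1)
without_rows = df.drop(index=[0, 2])     # same as df.drop(labels=[0, 2], axis=0)

print(list(df.columns))          # ['title', 'company', 'location']
print(list(no_company.columns))  # ['title', 'location']
print(list(without_rows.index))  # [1]
```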

#print(create_csv_file.drop(labels='title',axis=1))

"""Direct Assignment:
Directly assign a value to a specific column or even a cell in a DataFrame."""

create_csv_file.at[0, 'location'] = 'Hyderabad'  # Sets the location of the row labelled 0 (the first row) to 'Hyderabad'
#print(create_csv_file.head(5))

create_csv_file['new_column'] = 'default value'  # Adds a new column with all entries set to 'default value'
#print(create_csv_file)

create_csv_file.drop(labels='new_column', axis=1, inplace=True)  # Removes the column again, modifying the DataFrame in place
#print(create_csv_file)
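Besides a single cell and a constant column, a column can be computed from another one, and .loc with a boolean mask updates only the matching rows. A sketch on hypothetical data; the "salary_k" and "band" columns are illustrative names, not part of the dataset.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "title": ["Data Analyst", "Data Engineer", "Data Analyst"],
    "salary": [70000, 85000, 72000],
})

df.at[0, "title"] = "Junior Data Analyst"      # a single cell
df["salary_k"] = df["salary"] / 1000           # new column computed from an existing column
df.loc[df["salary"] > 80000, "band"] = "high"  # only matching rows are set; the others get NaN

print(df)
```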

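""" Updating Using map or replace:
You can update a column based on a mapping dictionary or replace specific values.
By default both return a new Series, so assign the result back to the column."""

title_mapping = {'Data Analyst': 'Data Analysts', 'Senior Data Analyst': 'Senior Data Analysts'}
# map() turns values that are not in the dictionary into NaN, so fall back to the original titles
create_csv_file['title'] = create_csv_file['title'].map(title_mapping).fillna(create_csv_file['title'])
#print(create_csv_file)

# replace() leaves values that don't match unchanged
#create_csv_file['location'] = create_csv_file['location'].replace('San Francisco, CA', 'United States')
#print(create_csv_file.head(10))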
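The practical difference between map() and replace() shows up with values that are missing from the dictionary. A minimal sketch on a hypothetical Series of titles:

```python
import pandas as pd

# Hypothetical titles; only "Data Analyst" appears in the mapping
titles = pd.Series(["Data Analyst", "Senior Data Analyst", "Data Engineer"])
mapping = {"Data Analyst": "Data Analysts"}

mapped = titles.map(mapping)        # unmapped values become NaN
replaced = titles.replace(mapping)  # unmatched values stay as they were (whole-value match)

print(mapped.tolist())
print(replaced.tolist())  # ['Data Analysts', 'Senior Data Analyst', 'Data Engineer']
```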

"""Changing the Name of the Index"""

"""Pandas allows you to rename the index of a DataFrame or Series, which can help in making the index more 
informative or aligning it with new data requirements."""

# Renaming the Index of a DataFrame

#create_csv_file.index.names = ['job_id']  # Renames the index to 'job_id'
#print(create_csv_file)
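Two common ways to name the index are sketched below on a hypothetical DataFrame: assigning index.name changes the DataFrame directly, while rename_axis() returns a new DataFrame.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"title": ["Data Analyst", "Data Engineer"]})

df.index.name = "job_id"            # names the (single-level) index in place
renamed = df.rename_axis("row_id")  # returns a new DataFrame with a differently named index

print(df.index.name)       # job_id
print(renamed.index.name)  # row_id
```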

# Renaming columns (not the index); keys that are not existing column names are silently ignored
create_csv_file.rename(columns={'title': 'job_title', 'company': 'company_name'}, inplace=True)
#print(create_csv_file)
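Because rename() ignores unknown keys by default, a misspelled column name fails silently. The sketch below, on hypothetical data, shows that behaviour and how errors="raise" turns it into a KeyError.

```python
import pandas as pd

# Hypothetical data without a "job" column
df = pd.DataFrame({"title": ["Data Analyst"], "company": ["Acme"]})

# "job" does not exist, so only "company" is renamed
renamed = df.rename(columns={"job": "job_title", "company": "company_name"})
print(list(renamed.columns))  # ['title', 'company_name']

# errors="raise" reports missing keys, which helps catch typos
raised = False
try:
    df.rename(columns={"job": "job_title"}, errors="raise")
except KeyError as exc:
    raised = True
    print("Rename failed:", exc)
```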

""" Display Options """

# Set the maximum number of rows and columns to display
pd.set_option('display.max_rows', 7)
pd.set_option('display.max_columns', 5)
#print(create_csv_file)

# Reset the options back to their defaults
pd.reset_option('display.max_rows')
pd.reset_option('display.max_columns')
#print(create_csv_file)
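When the display settings are only needed for one print, pd.option_context() applies them temporarily and restores the previous values when the with-block ends. A sketch on a hypothetical 20 x 10 DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame large enough to be truncated
df = pd.DataFrame({f"col{i}": range(20) for i in range(10)})

before = pd.get_option("display.max_rows")

# The settings apply only inside the with-block; no reset is needed afterwards
with pd.option_context("display.max_rows", 7, "display.max_columns", 5):
    print(df)
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(inside, after)
```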

################ Grouping Data ###############

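location_groups = create_csv_file.groupby('location')  # Groups the data by the 'location' column
print(location_groups.size())  # Number of rows per location (printing the GroupBy object itself only shows its type)

for location, group_data in location_groups:
    print(f"Location: {location}")
    print(group_data)
    print()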
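Instead of printing every group, a GroupBy object is usually summarised with an aggregation. This sketch uses hypothetical rows with a made-up "salary" column to show size(), a per-group mean, and named aggregation with agg().

```python
import pandas as pd

# Hypothetical data; "salary" is illustrative and not part of clean_jobs.csv
df = pd.DataFrame({
    "title": ["Data Analyst", "Data Engineer", "Data Analyst", "Scientist"],
    "location": ["India", "United States", "India", "India"],
    "salary": [70000, 120000, 72000, 90000],
})

groups = df.groupby("location")

jobs_per_location = groups.size()     # number of rows in each group
avg_salary = groups["salary"].mean()  # one value per group
# Named aggregation: output column name = (input column, function)
summary = groups.agg(jobs=("title", "count"), max_salary=("salary", "max"))

print(jobs_per_location)
print(avg_salary)
print(summary)
```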
Author: Boyina Narendra
Supporting Author: M. Meera Sindhu
Request: If you find this information useful, please provide your valuable comments

Sunday, May 24, 2026

Python Material - Part - 34 - Numpy_read_data_part_2
