BNarendraEnlightenment

Monday, June 1, 2026

Python material- Part-37 - GIL(Global Interpreter Lock)

__author__ = "Narendra Boyina"

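The GIL (Global Interpreter Lock) is a lock in CPython, the standard Python interpreter, that allows only one thread to execute Python bytecode at a time. It keeps the interpreter's internal state (for example, reference counts) thread-safe, but it does not by itself prevent race conditions in your own code, so shared data still needs locks.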

Real-Time-Example:

Imagine a restaurant with 10 workers. Even though all the workers are ready to take orders, there is only one billing counter, so only one worker can process a bill at a time. Similarly, in Python, even if multiple threads exist, only one thread can execute Python bytecode at a time because of the GIL.
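The billing-counter idea can be tried out with a small experiment. The sketch below is only an illustration: the `count_primes` function, the `worker` helper, and the `LIMIT` value are hypothetical names chosen for this example. It runs the same CPU-bound work twice in a row and then in two threads. On a standard CPython build with the GIL enabled, the two timings are usually close to each other, but exact numbers depend on the machine and the Python version, so nothing about the timing is guaranteed.

```python
import threading
import time

LIMIT = 20_000  # hypothetical workload size, chosen only for illustration


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


results = {}


def worker(name):
    # Each thread stores its result under its own key
    results[name] = count_primes(LIMIT)


# Run the work twice, one call after the other
start = time.perf_counter()
sequential = [count_primes(LIMIT), count_primes(LIMIT)]
sequential_seconds = time.perf_counter() - start

# Run the same work in two threads
start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(f"thread-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s, two threads: {threaded_seconds:.3f}s")
```

Both runs compute the same answers; only the elapsed time is of interest. Threads still help with I/O-bound work (network calls, file reads), because a thread releases the GIL while it waits.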

Boyina Narendra

I am Boyina Narendra, from an agricultural family background, working as a Team Lead at a reputed software company. I started this blog to share my knowledge, which I have gathered from various websites and books and consolidated here. Please check out my blog and share your valuable comments; your feedback is very encouraging. I give preference to 3 things in my life: continuous learning --- applying --- teaching.

Sunday, May 24, 2026

Python Material - Part - 36 - pandas

__author__ = "Narendra Boyina"


""" pandas Concepts 
1. Introduction of Pandas
2. Creating Series
    a. Creating a Series with integer values
    b. Creating a Series with float values
    c. Creating a Series of different data type values
    d. Creating a Series by providing different labels for the values  
3. Built-In_Functions
4. Aggregation Functions
5. DataFrame
    a. Creating a DataFrame using a nested list (list of lists)
        i. Accessing Single Column data
        ii. Accessing Multiple Column data
    b. Creating a DataFrame using a Dictionary
    c. Create a Custom Row Index to the dataframe
6. Accessing required data from Series & DataFrame
    a. Accessing data from series using Index (iloc)
    b. Accessing range of data from series using Index (iloc)
    c. Accessing data from series using label (loc)
    d. Accessing range of data from series using label (loc)
    e. Accessing data from DataFrame using Index (iloc)
    f. Accessing range of data from DataFrame using Index (iloc)
    g. Accessing data from DataFrame using label (loc)
    h. Accessing range of data from DataFrame using label (loc)
7. Handling Missing Data
    Apply below methods for series & DataFrame
    isnull(), notnull(), fillna(value), dropna() --> for series & DataFrame
8. Indexing, Slicing, and Filtering
9. Merging dataframe
10. Importing DataSet
"""
"""
################# Introduction of pandas ###################
--> Pandas is a powerful Python library for data manipulation and analysis. 
--> Pandas provides easy-to-use data structures and functions to work with structured data like tabular, time series, or matrix data
--> Pandas has built-in functions for analyzing, cleaning, and manipulating data.

Pandas primarily provides two data structures:
    1) Series data structure
    2) DataFrame data structure
"""
"""
1. Series
A Series is a one-dimensional labeled array that can hold any datatype (integers, floats, strings, Python objects, etc.)
"""
################# Importing pandas & NumPy modules ###################
import pandas as pd
import numpy as np
# print(pd.__version__)  # to check pandas library version

# Creating a Series with integer values
series_of_int_values = pd.Series([1, 8, 3, 2, 5, 4, 6])
# print(series_of_int_values) # Each value in the Series is assigned an index (By default 0, 1, 2, ...) + dtype: int64
"""
output: 
0    1
1    8
2    3
3    2
4    5
5    4
6    6
dtype: int64

"""
# Creating a Series with float values
series_of_float_values = pd.Series([4.72, 22.5, 73.2, 74.5])
# print(series_of_float_values) # Each value in the Series is assigned an index (By default 0, 1, 2, ...) + dtype: float64

"""
0     4.72
1    22.50
2    73.20
3    74.50
dtype: float64
"""

# Creating a Series of different data type values
series_of_diff_val = pd.Series(["hello", 472, 22.5, "good morning", 732, "hello", 74.5])
# print(series_of_diff_val) # Each value in the Series is assigned an index (By default 0, 1, 2, ...) + dtype: object
"""
0           hello
1             472
2            22.5
3    good morning
4             732
5           hello
6            74.5
dtype: object
"""
"""
We can also give custom labels to the index:
Creating a Series by providing different labels for the values
"""
data = pd.Series([4.72, 22.5, 73.2, 74.5], index=['a', 'b', 'c','d'])
# print(data) # Each value in the Series is assigned a label + dtype: float64

data = pd.Series([4.72, 22.5, 73.2, 74.5], index=['I', 'II', 'III','IV'])  # roman numbers
# print(data) # Each value in the Series is assigned a label + dtype: float64

################# Built-In_Functions of pandas ###################

""" describe(): Provides a quick summary of the data.
This method gives a statistical summary of the Series, including count,
mean, standard deviation, minimum, maximum, and quartile values."""

statistic_data = pd.Series([10, 3.24, 5, 7, 9.2, 17 ])  # Creating a Series
# Descriptive statistics
# print(statistic_data.describe())

"""mean(): Computes the mean of the data.
add all the numbers together and then divide the sum by the total number of values in the set. """
# Mean of the Series
# print(statistic_data.mean())

"""std(): Computes the standard deviation."""
# Standard deviation of the Series
# print(statistic_data.std())

"""min() and max(): Computes the minimum and maximum values."""
# Minimum and maximum values
# print(statistic_data.min())
# print(statistic_data.max())

################# Aggregation Functions ###################

"""sum(): Sums up the values"""
# Sum of the Series
# print(statistic_data.sum())

"""cumsum(): Cumulative sum is the running total of the values in a Series: each element of the result
is the sum of that value and all the values before it."""
# Cumulative sum of the Series
statistic_data = pd.Series([10, 3.24, 5, 7, 9.2])  # Creating a Series
# print(statistic_data)
# print(statistic_data.cumsum())
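As a small worked example of the running total described above, the sketch below uses whole numbers so the result is easy to check by hand. The Series name `values` is just an illustrative choice.

```python
import pandas as pd

values = pd.Series([10, 3, 5, 7, 9])

# Each position holds the sum of the current value and everything before it:
# 10, 10+3, 10+3+5, 10+3+5+7, 10+3+5+7+9
running_total = values.cumsum()
print(running_total.tolist())  # [10, 13, 18, 25, 34]

# The last running total is the same as the plain sum()
print(running_total.iloc[-1] == values.sum())  # True
```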

"""aggregate(func): Aggregates using one or more operations. Example : 'sum', 'mean', 'std' """
# Aggregating using multiple operations
aggregated = statistic_data.aggregate(['sum', 'mean', 'std'])
# print(aggregated)


"""  ################ DataFrame ################
A DataFrame is a two-dimensional labeled data structure (like a table or a spreadsheet) with rows and columns """

# Creating a DataFrame using a nested list (list of lists)
rows = [
        ["Narendra", 34, "Rajahmandry"],
        ["Manjula", 25, "Hyderabad"],
        ["Srikanth", 27, "Vijaywada"]
       ]    # Note: the row data acts as values, & the column names act as keys


dataframe = pd.DataFrame(rows, columns=["Name", "Age", "City"])
# print(dataframe, "\n")  # prints the data as a table; by default the row index is 0, 1, 2, ... etc.

# Accessing Column data from DataFrame
# print(dataframe["Name"], "\n")  # Prints a single column's data based on the column name
# print(dataframe[["Name", "Age"]])  # Prints multiple columns data
# print(dataframe[["Name", "City"]])  # Prints multiple columns data

"""Creating a DataFrame using a Dictionary"""

dict_data = {"Name":["Sri", "Meera", "Divya"],"Emp_ID":[2160, 2175,2183], "Specialization":["B.com", "B.C.A", "M.Sc"]}
dataframe = pd.DataFrame(dict_data)
# print(dataframe, "\n") # prints the data as a table; by default the row index is 0, 1, 2, ... etc.

""" Create a Custom Row Index to the dataframe"""
df_cust_index = pd.DataFrame(dict_data)  # by default the index of the DataFrame is 0, 1, 2, 3, ...
df_cust_index = pd.DataFrame(dict_data, index=['I', 'II', 'III'])
df_cust_index = pd.DataFrame(dict_data, index=['a', 'b', 'c'])
df_cust_index = pd.DataFrame(dict_data, index=['nanditha', 'Raahi', 'ganesh'])

# print(df_cust_index)

# Creating a DataFrame from a Dictionary
# Define data
Original_data = {
    'Name': ['Narendra', 'Srikanth', 'Meera', 'Vinod', 'karthikesh', "Raahi", "Nanditha", "Venkat"],
    'Email': ['narendra@brtechnosolutions.com', 'srikanth@brtechnosolutions.com', 'Meera@brtechnosolutions.com',
              'vinod@brtechnosolutions.com', 'karthikesh@brtechnosolutions.com', "Raahi@brtechnosolutions.com",
              "Nanditha@brtechnosolutions.com", "Venkat@brtechnosolutions.com"],
    'Role': ['Founder', 'Growth Manager', 'Instructor','Course Designer', 'Placement Coordinator', "HR Executive","Office Administrator", "Digital Marketing Executive"],
    'Phone Number': ['1111111111','2222222222', '3333333333', '4444444444', '5555555555'
                     , '6666666666', '7777777777', '8888888888']
}

# Display full DataFrame without column truncation
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)  # Allows full-width content to show
pd.set_option('display.expand_frame_repr', False)  # Prevent wrapping

# Create DataFrame
stud_info_df = pd.DataFrame(Original_data, index =[101, 102, 103, 104, 105, 106, 107, 108])
# print(stud_info_df,"\n") # Display the complete DataFrame with all rows & columns
#
# print(stud_info_df.head() ,"\n") # Display first 5 rows
# print(stud_info_df.tail(),"\n") # Display last 5 rows

""" ############## Accessing required data from Series & DataFrame ###################"""
"""Accessing data from series using Index (iloc[ ])"""

"""We can access the data in 2 ways:
 1. Position-based accessing with iloc[ ] (integer positions 0, 1, 2, ...)
 2. Label-based accessing with loc[ ] (index labels)  """

# Indexing by position
accessing_of_series = pd.Series([24, 2, np.nan, 4, np.nan, None, 18],["a","b","c","d","e","f","g"])
# print(accessing_of_series,"\n")
# print(accessing_of_series.iloc[0])  # First element (Accessing data from series using Index)
# print(accessing_of_series.iloc[-1])  # Last element (Accessing data from series using Index)
# print(accessing_of_series.iloc[1:4])  # Accessing range of data from series using Index (iloc)

"""loc[ ]: Label-based accessing"""
# print(accessing_of_series.loc["a"])  # First element (Accessing data from series using Label)
# print(accessing_of_series.loc["g"])  # Last element (Accessing data from series using Label)
# print(accessing_of_series.loc['b':'d'])  # Accessing range of data from series using labels
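One detail that is easy to miss when slicing: `iloc` excludes the end position (like normal Python slicing), while `loc` includes the end label. The short sketch below shows the difference on an illustrative Series named `marks`.

```python
import pandas as pd

marks = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# Position-based slice: positions 1 and 2 only, position 3 is excluded
by_position = marks.iloc[1:3]
print(by_position.tolist())  # [20, 30]

# Label-based slice: labels 'b', 'c' and 'd', the end label is included
by_label = marks.loc["b":"d"]
print(by_label.tolist())  # [20, 30, 40]
```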

"""Accessing data from DataFrame using Index"""

input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', "Srikanth"],
                'Age': [33, 20, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', 'USA', 'India']
                 })  # by default it will provide the index 0,1, 2.....etc
# print(input_df,"\n")
# print(input_df.iloc[0])  # Accessing data from DataFrame using Index (iloc)
# print(input_df.iloc[0:2]) # Accessing range of data from DataFrame using Index (iloc)


"""Accessing data from DataFrame using label"""

input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', "Srikanth"],
                'Age': [33, 20, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', 'USA', 'India']
                 }, index =["A", "B", "C"])
# # print(input_df,"\n")
# print(input_df.loc["B"])    # Accessing data from DataFrame using label (loc)
# print(input_df.loc["A":"B"]) # Accessing range of data from DataFrame using label (loc)



"""s[s > n]: Filters and returns elements greater than n."""

# Create a Series
series_data = pd.Series([10, 25, 17, 40, 5, 33, 567, 229,33,55, 2, 4, 3])

# Filter elements greater than 20
num = 20
# filtered = series_data[series_data > num]
# print(filtered)
#

"""################ Merging DataFrame ################"""

"""In Pandas,the merge() function is used to combine two DataFrames based on a common column or index."""

# First DataFrame
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 5],
    'Name': ['Sindhu', 'Anjela', 'Sangeeta', "raahi"]
})

# Second DataFrame
df2 = pd.DataFrame({
    'ID': [1, 2, 4, 3],
    'Score': [85, 90, 95,77]
})
# Merge based on ID
merged_df = pd.merge(df1, df2, on='ID', how='inner')
# on: Column name(s) to join on (must be in both)
# 'inner': Only matching rows (default)
# print(merged_df)
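To see what the `how` parameter changes, the sketch below repeats the merge with `'inner'`, `'left'` and `'outer'` on the same kind of data. The variable names (`left`, `right`, and so on) are only illustrative.

```python
import pandas as pd

left = pd.DataFrame({'ID': [1, 2, 3, 5], 'Name': ['Sindhu', 'Anjela', 'Sangeeta', 'raahi']})
right = pd.DataFrame({'ID': [1, 2, 4, 3], 'Score': [85, 90, 95, 77]})

# inner: only the IDs present in both DataFrames (1, 2, 3)
inner = pd.merge(left, right, on='ID', how='inner')

# left: every row of the left DataFrame; ID 5 has no match, so its Score is NaN
left_join = pd.merge(left, right, on='ID', how='left')

# outer: every ID from either DataFrame (1, 2, 3, 4, 5); missing cells become NaN
outer = pd.merge(left, right, on='ID', how='outer')

print(inner)
print(left_join)
print(outer)
```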

"""
################# Data Manipulation ###################
lambda, map(), apply(), sort_values(), drop(labels)

lambda:
1) A lambda function is a single-line anonymous function.
2) We use the lambda keyword to create small anonymous functions.
3) Lambda forms can take any number of arguments but return just one value in the form of an expression.
They cannot contain statements or multiple expressions.
Example of a single expression: x = ((a+b)*c/d)%e

General form: variable = lambda arguments: expression
Syntax:  variable = lambda arg1[, arg2, arg3, ..., argn]: expression
"""
fac = lambda n: 1 if n==0 else n*fac(n-1)
# print(fac(5))

"""
apply():
apply() is similar to map(), but more flexible: Series.map() works on one element at a time, whereas apply() can also be used on a DataFrame to process whole rows or columns (axis=1 or axis=0).
"""
data = pd.Series([64, 9, 25, 36])

# Applying a lambda function to series, to calculate square root (1/2 == 0.5) of all the numbers present in the series
func = lambda x: x ** 0.5
sqrt = data.apply(func)   #square root always gives float values
# print(sqrt)

# Create a DataFrame with multiple Series
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
    }
num_data = pd.DataFrame(data)
# print(num_data)

# Define a function to add two values (here, two cells taken from the same row)
def sum_series(x, y):
    return x + y

# Apply the function row by row (axis=1) to combine columns 'A' and 'B'
result = num_data.apply(lambda row: sum_series(row['A'], row['B']), axis=1)
# print(result) # Print the result
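For a simple operation like adding two columns, the row-wise `apply()` above is not the only option. As a sketch (the names `numbers`, `row_wise` and `vectorized` are illustrative), the same result comes from adding the columns directly, which is usually shorter and faster on large data:

```python
import pandas as pd

numbers = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Row-by-row: the lambda receives one row at a time
row_wise = numbers.apply(lambda row: row['A'] + row['B'], axis=1)

# Column arithmetic: pandas adds the two columns element by element
vectorized = numbers['A'] + numbers['B']

print(row_wise.tolist())    # [5, 7, 9]
print(vectorized.tolist())  # [5, 7, 9]
```

apply() with axis=1 is still useful when the logic for a row cannot be written as plain column arithmetic.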

"""
sort_values(): Sorts the Series.
"""

element_sort = pd.Series([17, 4, 3, 27, 9, 6])
# Sorting the Series
sorted_s = element_sort.sort_values()
# print(sorted_s)   # each value keeps its original index label even after sorting

""" df.drop()
The concept of "drop" usually refers to removing rows or columns from a DataFrame using the .drop() method.

Syntax: df.drop(labels, axis, inplace=False)
Syntax Parameters:
labels: Name(s) of the row(s) or column(s) to drop.

axis:
0 or 'index' → drop rows
1 or 'columns' → drop columns

inplace:
False (default) → returns a new DataFrame
True → modifies the original DataFrame

inplace=True parameter is used to modify the original DataFrame directly without creating a new one. """

""" Drop a column by label """
input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', "Srikanth"],
                'Age': [33, 20, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', 'USA', 'India']
                 })
# print("New Data frame: \n",input_df)
# dropped_df = input_df.drop('City', axis=1)  # Drops 'City' column
# print("Data frame after deleting column by label: \n",dropped_df)

""" Drop a row by index """
# dropped_df = input_df.drop(2, axis=0)  # Drops the row with index 2
# print("Data frame after deleting row by index: \n", dropped_df)

input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', "Srikanth"],
                'Age': [33, 20, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', 'USA', 'India']
                 })
# print(input_df)

""" Drop multiple columns from Data Frame """
dropped_df = input_df.drop(['Age', 'City'], axis=1)
# print(dropped_df)

""" ############## Handling Missing Data ###################
Applying methods ( isnull(), notnull(), fillna(value), dropna()) for series & DataFrame """

"""isnull(): Checks for missing values, return boolean values, in both series & DataFrame """
output_series = pd.Series([1, 3, None, 7, 9])
# print(output_series.isnull()) # Checking for missing values from series
"""
output: 
0    False
1    False
2     True
3    False
4    False
dtype: bool
"""
input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', None],
                'Age': [33, None, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', None, 'India']
                 })

output_df = input_df.isnull()  # Checking for missing values from DataFrame
# print(output_df)

""" notnull(): Opposite of isnull() -->Checking for non-null values in Series & DataFrame """
# print(output_series.notnull())  # Checking for non-null values in Series
# print(input_df.notnull())  # Checking for non-null values in DataFrame
"""
output:
0     True
1     True
2    False
3     True 
4     True
dtype: bool
"""
""" fillna(value): Fills missing values with a specified value in both series & DataFrame """
""" Fills missing values with a specified value in series"""
# Create a Series with missing values
input_series = pd.Series([1, 2, np.nan, 4, np.nan, None])  # nan means --> (NOT A NUMBER)
# print(input_series) # Print the Series
# Filling missing values with 19
filled_series = input_series.fillna(19) # NaN/None values are filled with the required number, here 19
# print(filled_series)

""" Fills missing values with a specified value in DataFrame"""
input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', None],
                'Age': [33, None, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', None, 'India']
                 })

output_df = input_df.fillna(4)  # replaces every missing value in the DataFrame with the required value, here 4
# print(output_df)

""" dropna(): Drops all rows that contain missing values."""
# Creating a Series with missing values
# input_series = pd.Series([1, 2, np.nan, 4, np.nan, None,18])  # nan means --> (NOT A NUMBER)
# print(input_series) # Print the input Series just to see the difference before and after
# dropped_missing = input_series.dropna()  # Returns a new Series without the missing values; the original Series is unchanged
# print(dropped_missing)  #  Print the output Series after dropping missing values

input_df = pd.DataFrame({
                'Name': ['Narendra', 'Meera', None],
                'Age': [33, None, 26],
                'Qualification': ['M.Tech', 'B.C.A', "B.Com"],
                'City': ['India', None, 'India']
                 })
# print(input_df,"\n") # Print the input DataFrame just to see the difference before and after

dropped_missing = input_df.dropna()  # Returns a new DataFrame without the rows that have missing values; the original is unchanged
# print(dropped_missing)  #  Print the output DataFrame after dropping missing values
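dropna() also accepts parameters that control what gets dropped. The sketch below (with an illustrative DataFrame named `people`) compares the default behaviour with the `subset` and `axis` parameters:

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    'Name': ['Narendra', 'Meera', None],
    'Age': [33, np.nan, 26],
    'Qualification': ['M.Tech', 'B.C.A', 'B.Com'],
    'City': ['India', None, 'India']
})

# Default: drop every row that has at least one missing value
complete_rows = people.dropna()

# subset: only look at the 'Name' column when deciding which rows to drop
named_rows = people.dropna(subset=['Name'])

# axis=1: drop the columns (instead of rows) that contain missing values
complete_cols = people.dropna(axis=1)

print(complete_rows.index.tolist())    # [0]
print(named_rows.index.tolist())       # [0, 1]
print(complete_cols.columns.tolist())  # ['Qualification']
```

In all three cases `people` itself still has its 3 rows and 4 columns, because dropna() returns a new DataFrame.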

===========================================================================
                             Pandas_part_2
===========================================================================
__author__ = "Narendra Boyina"

""" Real-Time usage """


##### Creating Mini Project to import dataset #####

import pandas as pd

""" ###### Importing dataset ######
--> A simple way to store big data sets is to use CSV files (comma-separated values files).
--> CSV files contain plain text and use a well-known format that can be read by almost any tool, including Pandas.
"""

""" Properties of dataframe
head(), tail(), shape   """

""" head(n): 
The df.head(n) method is used to view the first n rows of the DataFrame.
If you don't specify "n", the default number of rows displayed is 5.

tail(n): 
The df.tail(n) method is similar to df.head(n) but for the end of the DataFrame.
It returns the last "n" rows.

shape :
The df.shape attribute of a DataFrame returns a tuple representing the dimensionality of the DataFrame. 
The first element of the tuple is the number of rows, and the second is the number of columns.

"""
''' Datasets can be downloaded from different sources such as Kaggle, Google Dataset Search, etc.'''

create_csv_file = pd.read_csv("clean_jobs.csv")
pd.set_option('display.max_rows', 500)        # or None for all rows
pd.set_option('display.max_columns', 50)
# print(create_csv_file)  # we can consider "create_csv_file" as an Object or Dataframe
# print(create_csv_file.head())  # By default 5 rows will be displayed from the top of the file
# print(create_csv_file.head(7))  # 7 rows will be displayed from the top of the file
# print(create_csv_file.tail())   # By default 5 rows will be displayed from the bottom of the file
# print(create_csv_file.tail(8))  # 8 rows will be displayed from the bottom of the file
# print(create_csv_file.shape)  # will display no of rows  & no of columns
# print(create_csv_file.columns)  # Lists all the column names in the DataFrame

""" Inspecting Data Types:
 Each column in a DataFrame has a specific data type. 
 Understanding these types is crucial for proper data manipulation."""
# print(create_csv_file.dtypes)  # Display the data types of each column

""" 
obj.loc: The obj.loc method is used for label-based indexing, meaning we can access
rows and columns using their labels (i.e., index names and column names)."""

# Selecting the rows labeled 2 to 7 (loc includes both ends) and a specific column by label
locations = create_csv_file.loc[2:7, 'location']
# print(locations)

# Selecting all rows and a specific column by label
titles = create_csv_file.loc[ : , 'title']
# print(titles)

# Selecting all rows and a different column ('company') by label
companies = create_csv_file.loc[ : , 'company']
# print(companies)

# Selecting a range of rows and multiple columns by labels
subset = create_csv_file.loc[1:5, ['title', 'company','location']]
# print(subset)

""" 
df.iloc: While obj.loc uses labels for indexing, df.iloc allows for integer-based indexing.
You use df.iloc to access rows and columns by their integer positions,
which makes it useful when you need to access data by its position in the DataFrame.

"""

# Selecting a single row from the DataFrame
# single_row = create_csv_file.iloc[0]
# print(single_row)     # 0th index (first row) data will be printed

# Selecting a specific row and columns by integer indices
# specific_data = create_csv_file.iloc[10, [1, 2, 3]]  # row at index 10 and columns at indices 1, 2, and 3
# print(specific_data)

# Slicing to get multiple rows and columns
multi_rows_data_access = create_csv_file.iloc[20:26, 0:4]  # Rows 20 to 25 and columns 0 to 3 (26 and 4 are excluded)
# print(multi_rows_data_access)

""" 
df.at: obj.at is designed to access a single value for a row/column label pair. 
It is very similar to df.loc for accessing scalar values but is optimized for 
faster access when you only need to get or set a single value in a DataFrame."""
# Access a specific single value using "row index" and "column name"
title_of_first_data = create_csv_file.at[0, 'title']
# print(title_of_first_data)

# Access a specific single value using "row index" and "column name"
title_based_on_index = create_csv_file.at[14, 'title']
# print(title_based_on_index)

# Accessing data from a specific country
accessing_us_data = create_csv_file[create_csv_file['location'] == 'United States']
# print(accessing_us_data)

# Accessing data from a specific company
accessing_meta_data = create_csv_file[create_csv_file['company'] == 'Meta']
#print(accessing_meta_data)

""" 
##### Updating Rows and Columns #######

df.drop: 
The .drop() method in pandas is used to remove rows or columns from a DataFrame.
Its primary purpose is to drop specified labels from rows or columns.

Parameters:

labels: The row or column labels to drop.

axis: Specifies whether the labels refer to rows (axis=0) or columns (axis=1). By default, it's 0 (rows).

index or columns: An alternative way to specify the labels to drop, instead of using the labels parameter. 
It is equivalent to specifying axis=0 (for index) or axis=1 (for columns).

inplace: If True, the operation is done in place, meaning it modifies the DataFrame directly and returns None.
If False or not specified, it returns a new DataFrame with the specified labels dropped. """
# print(create_csv_file.columns)  # Lists all the column names in the DataFrame
# New_csv = create_csv_file.drop(labels='title',axis=1,inplace=False)
# print(New_csv.columns)  # Lists all the column names in the DataFrame

"""Direct Assignment:
Directly assign a value to a specific column or even a cell in a DataFrame."""

create_csv_file.at[0, 'location'] = 'Hyderabad'  # Changes the location of the first row to 'Hyderabad'
#print(create_csv_file.head(5))

create_csv_file['new_column'] = 'default value'  # Adds a new column with all entries set to 'default value'
#print(create_csv_file)

create_csv_file.drop(axis=1,labels='new_column',inplace=True)
#print(create_csv_file)

""" Updating Using map or replace:
You can update a column based on a mapping dictionary or replace values."""

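# map() returns a new Series and does not change the DataFrame; titles missing from the dictionary become NaN
mapped_titles = create_csv_file['title'].map({'Data Analyst': 'Data Analysts', 'Senior Data Analyst': 'Senior Data Analysts'}) # Mapping existing values to new ones
#print(mapped_titles)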
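The difference between map() and replace() with a dictionary is easiest to see on a tiny Series. This is a minimal sketch with made-up job titles; `titles` and `mapping` are illustrative names.

```python
import pandas as pd

titles = pd.Series(['Data Analyst', 'Data Engineer'])
mapping = {'Data Analyst': 'Data Analysts'}

# map(): values that are not keys of the dictionary become NaN
mapped = titles.map(mapping)

# replace(): values that are not keys of the dictionary are left as they are
replaced = titles.replace(mapping)

print(mapped.isna().tolist())  # [False, True]
print(replaced.tolist())       # ['Data Analysts', 'Data Engineer']
```

Both methods return a new Series, so the result has to be assigned back to the column if the DataFrame should be updated.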

#create_csv_file['location'] = create_csv_file['location'].replace('San Francisco, CA', 'United States') # Replacing specific values (assign the result back to the column)
#print(create_csv_file.head(10))

"""Changing the name of Index"""

"""Pandas allows you to rename the index of a DataFrame or Series, which can help in making the index more 
informative or aligning it with new data requirements."""

# Renaming the Index of a DataFrame

#create_csv_file.index.names = ['job_id']  # Renames the index to 'job_id'
#print(create_csv_file)

""" inplace=True """

""" In pandas, the inplace=True parameter is used in methods to modify the original DataFrame or Series directly,without creating a new object.
When inplace=True, the operation is performed on the same object, and no new object is returned. 
This can save memory but requires caution as the original data is altered."""

create_csv_file.rename(columns={'title': 'job_title', 'company': 'company_name'}, inplace=True)  # the column used earlier in this script is 'title'
#print(create_csv_file)

""" Display Options """
#You can use pd.set_option() to modify how data is displayed.

""" Set maximum number of rows and columns to display """

# pd.set_option('display.max_rows', 7)
# pd.set_option('display.max_columns', 5)
#print(create_csv_file)

""" Reset Options """
pd.reset_option('display')
#print(create_csv_file)

################ Grouping Data ###############

location_groups = create_csv_file.groupby('location')  # Groups the data by the 'location' column
# print(location_groups)  # prints a DataFrameGroupBy object, not the grouped rows

# for location, group_data in location_groups:
#     print(f"Location: {location}")
#     print(group_data)
#     print()
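Because the CSV file is not included here, the grouping idea can be tried on a small hand-made DataFrame instead. In the sketch below, the `jobs` data is invented purely for illustration; it counts the rows per location and then loops over the groups.

```python
import pandas as pd

# Hypothetical sample data, standing in for the job dataset
jobs = pd.DataFrame({
    'title': ['Data Analyst', 'Data Engineer', 'Tester', 'Developer'],
    'location': ['Hyderabad', 'Chennai', 'Hyderabad', 'Pune']
})

# size() gives the number of rows in each group
jobs_per_location = jobs.groupby('location').size()
print(jobs_per_location)

# Looping over a groupby object yields (group name, rows of that group)
for location, group_data in jobs.groupby('location'):
    print(location, group_data['title'].tolist())
```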
Author: Boyina Narendra
Supporting Author: M. Meera Sindhu
Request: If you find this information useful, please provide your valuable comments


Python Material - Part - 35 - Numpy_part_3

__author__ = "Narendra Boyina"

# -----------------------------------------------------------------------------
# Copyright (c) 2025 BR Technologies PVT LTD
# -----------------------------------------------------------------------------

"""Concepts to be covered in this part of NumPy.
 7. Initialization of arrays
        np.arange(), np.zeros(), np.ones() --> 2 dimensions & 3 dimensions
        np.full(), np.eye()
 8. Array Manipulations
        np.resize(), np.reshape(),ravel() vs flatten(), np.matmul(), np.transpose(), 
 9. Functions
        a. Aggregate Functions
        b. Broadcast  
        c. Exponential and Logarithmic Functions
10. Arrays Splitting and Joining 
        a. np.split()  vs np.array_split()
        b. np.hstack() Vs  np.vstack()
11. Array Adding and Removing Elements
        a. np.append(), np.insert(), np.delete()
12. Pseudo-random Number Generation
        a. np.random.randint()
        b. np.random.normal()
        c. np.random.rand()

"""

import numpy as np

""" # np.arange(): Used to create an array with regularly spaced values within a specified range """
# Create an array from 0 to 14
range_of_elements = np.arange(15)
# print(range_of_elements)  # Output: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
range_of_element = np.arange(0,100,20, dtype = int)
# print(range_of_element) # values from 0 up to (but not including) 100 in steps of 20 --> [ 0 20 40 60 80]
range_of_element = np.arange(0,100,20, dtype = float)
# print(range_of_element) # same values as floats --> [ 0. 20. 40. 60. 80.]

""" # np.zeros()   &  np.ones()
--> np.zeros creates an array filled with zeros &  np.ones creates an array filled with ones. 
--> Each function takes the shape of the desired array as input and returns an array of that shape filled with zeros (or ones)
--> by default it will fill with float values """

# Create a 2x3 array filled with zeros
zeros_array = np.zeros([2, 3])
# print(zeros_array)
""" Output: 2x3 array--> 2 rows and 3 columns
[[0. 0. 0.]
 [0. 0. 0.]]
"""
# Create a 2x3 array filled with ones
ones_array = np.ones([2, 3])
# print(ones_array)


# Create a 3D array filled with zeros
arr_3d = np.zeros([2, 3, 4])  # 2-Blocks, 3-rows, 4-columns
# print(arr_3d)

# Create a 3D array filled with ones
arr_3d = np.ones([2, 3, 4])  # 2-Blocks, 3-rows, 4-columns
# print(arr_3d)


# Print the number of dimensions
# print("Number of dimensions:", arr_3d.ndim)  # Output: 3


"""# np.full()
--> np.full creates an array filled with a specified constant value.
--> It takes the shape of the desired array and the constant value as input, and returns an array of that shape filled with the specified value.
Syntax: numpy.full(shape, fill_value, dtype=None)"""

# Create a 2x2 array filled with 5
full_array = np.full((2, 2), 5)   # (2, 2) is the shape; every element is filled with 5
# print(full_array)


""" # np.eye():
--> used to create a 2-D array with "ones on the diagonal" and zeros elsewhere.
--> useful for creating "identity matrices" or matrices with specific diagonal patterns."""
# Create a 3x3 identity matrix -->  is a square matrix with 1s on the diagonal and 0s elsewhere.
matrix = np.eye(3)
# print(matrix)


""" ######## Array Manipulations ########"""
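
""" # ndarray.resize() : changes the shape and size of an array in-place (missing elements are filled with zeros).
Note: the np.resize() function instead returns a new array and fills it by repeating the original data."""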

# array = np.arange(10)  # One_d_array creation
# print("Created new array:",array) # [0 1 2 3 4 5 6 7 8 9]
# array.resize(2,3)
# print("Resized with 2X3 :\n", array )

# array = np.arange(10)  # One_d_array creation
# print("Created new array:",array) # [0 1 2 3 4 5 6 7 8 9]
# array.resize(2,5)
# print("Resized with 2X5 :\n", array )

# array = np.arange(10)  # One_d_array creation
# print("Created new array:",array) # [0 1 2 3 4 5 6 7 8 9]
# array.resize(2,7)
# print("Resized with 2X7 :\n", array )
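The in-place method and the np.resize() function behave differently when the new shape needs more elements than the array has. The sketch below shows both on a 10-element array; `refcheck=False` is passed only so the in-place call also works in environments (such as interactive shells) that hold extra references to the array.

```python
import numpy as np

original = np.arange(10)

# np.resize() returns a NEW array and repeats the data to fill the extra slots
repeated = np.resize(original, (2, 7))
print(repeated)
# [[0 1 2 3 4 5 6]
#  [7 8 9 0 1 2 3]]

# ndarray.resize() changes the array itself and fills the extra slots with zeros
in_place = np.arange(10)
in_place.resize((2, 7), refcheck=False)
print(in_place)
# [[0 1 2 3 4 5 6]
#  [7 8 9 0 0 0 0]]
```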

""" # np.reshape():
--> Reshaping an array means changing the shape of the array without changing its data.
--> It's useful for converting arrays between different dimensions or rearranging their layout """

# Create a one-dimensional array of 12 elements
Created_array = np.arange(12)
# print("Original array: ", Created_array)

# Reshape it to a 3x4 two-dimensional array
Reshaped_array = Created_array.reshape(3,4)
# print("Reshaped array:\n", Reshaped_array)

""" Difference  between  ravel() and flatten()  in numpy
Both ravel() and flatten() in NumPy are used to convert a multi-dimensional array into a 1D array,
but there's a key difference in how they handle memory:
ravel() returns a view of the original data whenever possible (similar to a shallow copy),
while flatten() always returns a new copy (similar to a deep copy).

Memory usage
--> ravel() is more memory-efficient (no new data when a view is possible)
--> flatten() uses more memory (always a new array)"""


""" #np.ravel()
--> The ravel() method converts a multi-dimensional array to a one-dimensional array.
--> Whenever possible it uses the same memory as the original array (a view, like a shallow copy)"""

raveled_array = Reshaped_array.ravel() # Flatten the 3x4 array to a one-dimensional array
# print("raveled array:", raveled_array)

""" # np.flatten()
--> Similar to ravel(), flatten() converts a multi-dimensional array to a one-dimensional array.
--> It creates a separate memory location for the new array.
--> Because the memory is different, any modification to the flattened array does not affect the original array """

# print("Reshaped array:\n", Reshaped_array)
# flatten_array = Reshaped_array.flatten()
# print("Flattened array:", flatten_array)

""" Example  to show the Difference between ravel and flatten arrays """
"""Below we are going to explain ravel() related to memory --> shallow copy"""
# Creating a 2D array
created_2d_array = np.array([[1, 2], [3, 4]])
# print("created_2d_array:\n",created_2d_array)

raveled_array = created_2d_array.ravel()  # converted to 1-D array using ravel()
# print("raveled_array : ", raveled_array)

raveled_array[0] = 100  # Modifying the raveled array
# print("Printing original array after modifying raveled array: \n", created_2d_array)   # observe that the original array is also modified

"""Below we are going to explain flatten() related to memory --> Deep copy"""
# Creating a 2D array
created_2d_array = np.array([[1, 2], [3, 4]])
# print("created_2d_array:\n",created_2d_array)

flatten_array = created_2d_array.flatten()  # converted to 1-D array using flatten()
# print("flatten_array : ", flatten_array)

flatten_array[0] = 150  # Modifying the flatten_array
# print("Printing original array after modifying flatten array: \n", created_2d_array)   # observe that the original array is not modified
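
""" A minimal sketch of another way to check the view-versus-copy behaviour described above,
using np.shares_memory(). The array names are illustrative. It also shows a case where
ravel() cannot return a view (a transposed array) and therefore has to copy. """

```python
import numpy as np

source_2d = np.array([[1, 2], [3, 4]])

raveled = source_2d.ravel()      # view of the same data for this contiguous array
flattened = source_2d.flatten()  # always an independent copy

print(np.shares_memory(source_2d, raveled))    # True
print(np.shares_memory(source_2d, flattened))  # False

# The transpose is not laid out row by row in memory, so ravel() must copy here
raveled_transpose = source_2d.T.ravel()
print(np.shares_memory(source_2d, raveled_transpose))  # False
```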

"""  # Matrix Multiplication (np.matmul()):
--> Matrix multiplication is a fundamental operation in linear algebra, where you multiply two matrices to obtain a new matrix.
--> In NumPy, you can perform matrix multiplication using the np.matmul() function. """

# Define matrices
matrix_mul_a = np.array([[1, 2], [3, 4]])
matrix_mul_b = np.array([[5, 6], [7, 8]])

# Matrix multiplication using np.matmul()
result = np.matmul(matrix_mul_a, matrix_mul_b)
# print("Matrix Multiplication:\n",result)
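
""" A short sketch contrasting matrix multiplication with element-wise multiplication, which is a
common source of confusion. It reuses the same 2x2 values as above under illustrative names;
the @ operator is the operator form of np.matmul(). """

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[5, 6], [7, 8]])

matrix_product = np.matmul(left, right)   # rows of left combined with columns of right
operator_product = left @ right           # same result as np.matmul(left, right)
elementwise_product = left * right        # multiplies matching positions only

print(matrix_product)
# [[19 22]
#  [43 50]]
print(elementwise_product)
# [[ 5 12]
#  [21 32]]
```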

""" # Matrix Transpose (np.transpose()):
--> "Transposing a matrix" means swapping its rows and columns: the element at [i][j] moves to [j][i].
--> In NumPy, you can obtain the transpose of a matrix using the np.transpose() function
--> You can also transpose an array using its T attribute, which gives the same result """

matrix_trans = np.array([[1, 2, 3],
                         [4, 5, 6]])
transposed_matrix = np.transpose(matrix_trans)
# print(transposed_matrix)

# Create a 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Transpose the array
transposed_arr = arr_2d.T
# print("Transposed array: \n ", transposed_arr)
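
""" A small sketch, with illustrative names, confirming that np.transpose() and .T give the same
result and that a (2, 3) array becomes a (3, 2) array. """

```python
import numpy as np

sample = np.array([[1, 2, 3],
                   [4, 5, 6]])

via_function = np.transpose(sample)
via_attribute = sample.T

print(sample.shape, "->", via_function.shape)        # (2, 3) -> (3, 2)
print(np.array_equal(via_function, via_attribute))   # True
print(np.array_equal(via_attribute.T, sample))       # True: transposing twice restores the original
```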


"""Aggregate Functions:
--> Aggregate functions in NumPy operate on an array and, by default, return a single value that summarizes the data (for example its sum or mean).
--> Common aggregate functions include np.sum(), np.max(), np.min(), np.mean(), etc. """
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

#print("Sum of all elements:", np.sum(matrix))  # Output: 21
#print("Maximum element:", np.max(matrix))  # Output: 6
#print("Minimum element:", np.min(matrix))  # Output: 1
#print("Mean of all elements(sum/total):", np.mean(matrix))  # Output: 3.5
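
""" The calls above aggregate over the whole array. As a short sketch (names are illustrative),
the same functions accept an axis argument to aggregate per column (axis=0) or per row (axis=1),
in which case they return an array instead of a single value. """

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

column_sums = np.sum(scores, axis=0)    # one sum per column
row_sums = np.sum(scores, axis=1)       # one sum per row
row_means = np.mean(scores, axis=1)     # one mean per row

print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
print(row_means)    # [2. 5.]
```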

""" Exponential and Logarithmic Functions """

math_data = np.array([1,2,3,4])
# Exponential
result_exp = np.exp(math_data)
# print("exponential :", result_exp)

# Natural logarithm
result_log = np.log(math_data)
# print("Logarithm", result_log)
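
""" A brief sketch (illustrative names) showing that np.log() is the inverse of np.exp():
applying one after the other gives back the starting values, up to floating-point rounding. """

```python
import numpy as np

inputs = np.array([1, 2, 3, 4])

exp_values = np.exp(inputs)         # e raised to each element
round_trip = np.log(exp_values)     # natural log undoes the exponential

print(np.allclose(round_trip, inputs))   # True
print(np.log(1))                         # 0.0, because e**0 == 1
```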

""" Broadcasting:
--> NumPy automatically broadcasts arrays to perform element-wise operations
--> Universal functions also support broadcasting, which means they can operate on arrays of different shapes """

broad_arr = np.array([[1, 3, 2], [4, 5, 6]])

# Element-wise addition with scalar
result_broad = broad_arr + 2
# print("Broadcasting with Scalar:",result_broad)
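
""" The scalar case above is the simplest form of broadcasting. A minimal sketch, with illustrative
names, of broadcasting between arrays of different shapes: a (3,) row and a (2, 1) column are each
stretched to match a (2, 3) array, while an incompatible shape raises ValueError. """

```python
import numpy as np

grid = np.array([[1, 3, 2], [4, 5, 6]])    # shape (2, 3)
row_offsets = np.array([10, 20, 30])       # shape (3,): added to every row
col_offsets = np.array([[100], [200]])     # shape (2, 1): added to every column

plus_row = grid + row_offsets
plus_col = grid + col_offsets

print(plus_row)
# [[11 23 32]
#  [14 25 36]]
print(plus_col)
# [[101 103 102]
#  [204 205 206]]

# Shapes (2, 3) and (2,) do not line up, so broadcasting fails
try:
    grid + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print("Incompatible shapes failed:", broadcast_failed)
```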


"""##### Arrays Splitting and Joining #####"""
"""
--> Splitting allows you to divide large arrays into smaller arrays.
--> This can be useful for parallel processing tasks or during situations
where subsets of data need to be analyzed separately """

""" ######   np.split()  vs np.array_split()    ###### """

""" np.split() 
    --> Strict splitting: Only works if the array can be divided exactly into equal parts.
    --> Raises an error if the split does not divide the array evenly
    --> Syntax: np.split(array, indices_or_sections, axis=0) """

creating_1D_array = np.arange(9)
# print("Original array:", creating_1D_array)

# splited_array = np.split(creating_1D_array,3)  # Split the array into 3 equal parts
# print("splited_array:", splited_array)

#splited_array = np.split(creating_1D_array,4)  # ValueError: array split does not result in an equal division
# print("splited_array:", splited_array)

"""np.array_split()
    --> Flexible splitting: Allows unequal splits.
    --> Will divide the array as evenly as possible and does not raise an error if the array can't be divided equally.
    --> Syntax: np.array_split(array, indices_or_sections, axis=0) """

# Split the array into 4 parts, which will not be equal
# array_split = np.array_split(creating_1D_array, 4)
# print("Array split into unequal parts:", array_split)
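
""" A runnable sketch of the two functions side by side (names are illustrative). It also uses the
"indices" form of indices_or_sections from the syntax above, where a list gives the positions at
which to cut instead of the number of parts. """

```python
import numpy as np

values = np.arange(9)                        # [0 1 2 3 4 5 6 7 8]

equal_parts = np.split(values, 3)            # 9 divides by 3: three parts of length 3
custom_parts = np.split(values, [2, 5])      # cut before index 2 and before index 5
uneven_parts = np.array_split(values, 4)     # 9 does not divide by 4: sizes differ

print([part.tolist() for part in equal_parts])    # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print([part.tolist() for part in custom_parts])   # [[0, 1], [2, 3, 4], [5, 6, 7, 8]]
print([len(part) for part in uneven_parts])       # [3, 2, 2, 2]
```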


""" ####  np.hstack() Vs  np.vstack() #### """

"""np.hstack() — Horizontal Stack
    --> Joins arrays along columns (axis=1 for 2D arrays).
    --> Think: side-by-side stacking.
    --> Syntax: np.hstack((array1, array2, ...)) """

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = np.hstack((a, b))
# print(c)
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]

"""np.vstack() — Vertical Stack
    --> Joins arrays along rows (axis=0 for 2D arrays).
    --> Think: top-to-bottom stacking.  
    --> Syntax: np.vstack((array1, array2, ...)) """
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

c = np.vstack((a, b))
# print(c)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
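
""" A short sketch (illustrative names) relating the two stacking functions to np.concatenate()
for 2D arrays, and showing how they behave with 1D inputs: hstack keeps the result 1D, while
vstack turns each 1D array into a row. """

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])

# For 2D arrays, hstack/vstack match concatenate along axis 1 / axis 0
same_as_hstack = np.array_equal(np.hstack((top, bottom)), np.concatenate((top, bottom), axis=1))
same_as_vstack = np.array_equal(np.vstack((top, bottom)), np.concatenate((top, bottom), axis=0))
print(same_as_hstack, same_as_vstack)   # True True

first = np.array([1, 2, 3])
second = np.array([4, 5, 6])

joined_flat = np.hstack((first, second))   # stays one-dimensional
joined_rows = np.vstack((first, second))   # each input becomes a row

print(joined_flat.shape)   # (6,)
print(joined_rows.shape)   # (2, 3)
```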

####### Array Adding and Removing Elements #########
""" np.append: Adds elements to the end of an array """
array = np.array([101, 102, 103])
appended_array = np.append(array, [104, 105])  # returns a new array; "array" itself is unchanged
# print("Appended array: ", appended_array)

""" np.insert: Inserts elements at a specific position in the array """
array_after_inserted = np.insert(appended_array, 1, [2, 3, 4])  # inserts 2, 3, 4 before index 1
# print("Array with inserted elements:", array_after_inserted)

""" np.delete: Removes elements at a specific position from the array """
# Create a one-dimensional array
creating_1D_array = np.arange(5,15)
# print(creating_1D_array)
# Delete the element at index 2
result = np.delete(creating_1D_array, 2)
# print("Array after deleting element at index 2:", result)

creating_1D_array = np.arange(5,15)
# print(creating_1D_array)
# Delete multiple elements
# result = np.delete(creating_1D_array, [0, 3])  # removes only particular indices
# print("Array after deleting elements at indices 0 and 3:", result)

# Create a two-dimensional array
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Delete the second row
result = np.delete(array_2d, 1, axis=0)
# print("Array after deleting second row:\n", result)


array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Delete the third column
result = np.delete(array_2d, 2, axis=1)  # deleted based on axis
# print("Array after deleting third column:\n", result)
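
""" A minimal sketch (illustrative names) of two points that are easy to miss with np.append(),
np.insert() and np.delete(): each one returns a new array and leaves the original untouched,
and np.append() flattens a 2D array unless an axis is given. """

```python
import numpy as np

original = np.array([101, 102, 103])

appended = np.append(original, [104, 105])
inserted = np.insert(original, 1, [2, 3, 4])
deleted = np.delete(original, 0)

print(original)   # [101 102 103]  (unchanged by all three calls)
print(inserted)   # [101   2   3   4 102 103]
print(deleted)    # [102 103]

table = np.array([[1, 2], [3, 4]])
flat_append = np.append(table, [[5, 6]])          # no axis: result is flattened to 1D
row_append = np.append(table, [[5, 6]], axis=0)   # axis=0: adds a new row

print(flat_append.shape)   # (6,)
print(row_append.shape)    # (3, 2)
```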

""" Pseudo-random Number Generation:
--> NumPy provides various functions for generating pseudo-random numbers.
--> These functions are located in the numpy.random module. 
--> You can generate random numbers from different distributions, such as uniform, normal, binomial, etc. """

""" # np.random.randint(): Generates required number of random integers between  given range of values (1D / 2D arrays)"""
# Pseudo-random Number Generation in 1D Array: generate 5 random integers from 1 to 9 (the upper bound 10 is excluded)
random_integers = np.random.randint(1, 10, size=5)   # generates 1D array with 5 random values
# print("Random integers (1D):", random_integers)

# Generate a 2D array of shape (3, 4) with random integers from 1 to 9 (the upper bound 10 is excluded)
random_integers_2d = np.random.randint(1, 10, size=(3, 4))
# print("Random integers (2D):\n ", random_integers_2d)

""" # np.random.normal(): Generates required number of "positive & Negative float values" in  (1D / 2D arrays)"""
# Generate 5 random numbers from the standard normal distribution (default mean 0, standard deviation 1)
random_normal = np.random.normal(size=5)
# print("Random numbers from normal distribution (1D):", random_normal)  # values are floats and can be positive or negative

# Generate a 2D array of shape (3, 3) with random numbers from a normal distribution
random_normal_2d = np.random.normal(size=(3, 3))
# print("Random numbers from normal distribution (2D):\n", random_normal_2d)  # values are floats and can be positive or negative


""" np.random.rand(r, c) --> # Everytime generate only Random +ve float values in a given shape(r X c)--> r rows, c colums"""
random_of_values = np.random.rand(2, 3)  # generates random values in the given shape (2 x 3) --> 2 rows, 3 columns
# print(random_of_values)  # values are floats drawn uniformly from the range [0, 1)
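
""" Because these numbers are pseudo-random, fixing the seed makes them repeatable. A small sketch
(the seed value 0 and the variable names are arbitrary choices for illustration) that also checks
the value ranges: randint(1, 10) stays within 1..9 and rand() stays within [0, 1). """

```python
import numpy as np

np.random.seed(0)                                   # arbitrary seed, chosen only for this example
first_draw = np.random.randint(1, 10, size=5)

np.random.seed(0)                                   # same seed again
second_draw = np.random.randint(1, 10, size=5)

print(np.array_equal(first_draw, second_draw))      # True: same seed, same sequence

many_integers = np.random.randint(1, 10, size=1000)
print(many_integers.min() >= 1, many_integers.max() <= 9)    # True True

uniform_values = np.random.rand(2, 3)
print(bool(((uniform_values >= 0) & (uniform_values < 1)).all()))   # True

# loc and scale set the mean and standard deviation of the normal distribution
shifted_normal = np.random.normal(loc=50, scale=5, size=(3, 3))
print(shifted_normal.shape)                         # (3, 3)
```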

Author: Boyina Narendra
Supporting Author: M. Meera Sindhu
Request: If you find this information useful, please provide your valuable comments
