I want to calculate Shannon's entropy on FTIR data. I have used two approaches and I get two different entropy values. In the second code, I removed the background using clustering and then calculated the entropy. However, the entropy before removing the background is still very different. Please share your opinion and guidance.
import numpy as np
import scipy.io
import matplotlib.pyplot as plt
# Path to your .mat file
mat_file_path = 'structure1.mat'
# Load the .mat file (the loader call is missing from the post; scipy.io.loadmat is assumed)
mat_data = scipy.io.loadmat(mat_file_path)
# Extract wavenumbers
wavenumbers = mat_data['wavenumbers'].squeeze()
# Extract intensity data (rows x columns x wavenumbers)
intensity_data = mat_data['spcImage']
# Select a specific position in the image to plot
position = (100, 100)
intensity_at_position = intensity_data[position[0], position[1], :]
# Plot the spectral data
plt.plot(wavenumbers, intensity_at_position)
plt.title('Spectral Data at Position (100, 100)')
# average_spectra is not defined in the post; the mean spectrum over all pixels is assumed
average_spectra = intensity_data.mean(axis=(0, 1))
# Calculate probability distribution
prob_distribution = average_spectra / np.sum(average_spectra)
# Calculate entropy
entropy = -np.sum(prob_distribution * np.log2(prob_distribution + 1e-10))
print("Shannon's Entropy:", entropy)
The spectral image is attached here.
Result: Shannon's Entropy: 6.257635131501756
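For reference, when a spectrum is normalized to sum to 1 and treated as a probability distribution, as in the code above, its entropy is measured in bits and ranges from 0 to log2(N) for N spectral channels, so a value above 1 is not out of range by itself. Dividing by log2(N) rescales it to [0, 1]. Below is a minimal sketch on synthetic spectra (not the FTIR data); `spectral_entropy` and its `normalize` flag are hypothetical names introduced only for illustration.

```python
import numpy as np

def spectral_entropy(spectrum, normalize=False):
    """Shannon entropy (bits) of a non-negative spectrum treated as a distribution."""
    s = np.asarray(spectrum, dtype=float).ravel()
    s = np.clip(s, 0.0, None)            # negative intensities cannot be probabilities
    total = s.sum()
    if total == 0:
        return 0.0
    p = s / total
    p = p[p > 0]                         # terms with p == 0 contribute nothing
    h = float(np.sum(p * np.log2(1.0 / p)))
    if normalize and s.size > 1:
        h /= np.log2(s.size)             # divide by the maximum possible value, log2(N)
    return h

# Synthetic examples: a flat spectrum and a single sharp peak, 128 channels each
flat = np.ones(128)
peak = np.zeros(128)
peak[10] = 1.0

print(spectral_entropy(flat))                  # maximum for 128 channels: log2(128)
print(spectral_entropy(flat, normalize=True))  # rescaled to the [0, 1] range
print(spectral_entropy(peak))                  # all intensity in one channel
```

With this definition the result depends on how many channels the spectrum has, so raw values from spectra of different lengths are only comparable after normalization.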
import os
import numpy as np
import pandas as pd
import scipy.io
from sklearn.cluster import KMeans

# Define the cluster_k function
def cluster_k(p, k=2, init='k-means++', max_iter=300):
    # Reshape 3D array to 2D (pixels x wavenumbers) for clustering
    data_to_cluster = p.reshape(-1, p.shape[2]) if len(p.shape) == 3 else p
    # Perform K-Means clustering
    model = KMeans(n_clusters=k, init=init, n_init=10, max_iter=max_iter).fit(data_to_cluster)
    # Obtain labels and reshape to the image grid if the original data was 3D
    labels = model.labels_.reshape(p.shape[0], p.shape[1]) if len(p.shape) == 3 else model.labels_
    return model, labels

# Define Shannon's entropy calculation function
def shannons_entropy(data):
    data_flatten = data.flatten()
    # Drop the NaN entries that mark background pixels
    data_flatten_no_nan = data_flatten[~np.isnan(data_flatten)]
    # Histogram of the intensity values after casting them to integers
    p_data = np.bincount(data_flatten_no_nan.astype(int)) / len(data_flatten_no_nan)
    entropy = -np.sum(p_data * np.log2(p_data + 1e-10))
    return entropy

# Define a function to extract patient number from file name
def extract_patient_number(filename):
    # Extract the part before the first underscore and convert to float
    return float(filename.split('_')[0])

# Path to the directory containing .mat files
mat_dir_path = 'filenames.mat'
# List all .mat files in the directory
mat_files = [f for f in os.listdir(mat_dir_path) if f.endswith('.mat')]
# List to store entropy values for each file
entropy_list = []

# Loop through each .mat file
for mat_file in mat_files:
    try:
        # Load the .mat file (the loader call is truncated in the post;
        # scipy.io.loadmat(os.path.join(...)) is assumed)
        mat_data = scipy.io.loadmat(os.path.join(mat_dir_path, mat_file))
        # Extract intensity data
        intensity_data = mat_data['spcImage']
        # Apply clustering to remove background
        model, mask = cluster_k(intensity_data, k=2)
        # Treat the cluster with the lowest summed centroid intensity as background
        background_label = np.argmin(np.sum(model.cluster_centers_, axis=1))
        signal_label = 1 - background_label
        signal_mask = (mask == signal_label)
        intensity_data_no_background = np.where(signal_mask[..., np.newaxis], intensity_data, np.nan)
        # Calculate Shannon's entropy after removing the background
        entropy = shannons_entropy(intensity_data_no_background)
        # Extract patient number from the file name
        patient_number = extract_patient_number(mat_file)
        # Append a tuple with all three pieces of data
        entropy_list.append((mat_file, patient_number, entropy))
    except Exception as e:
        print(f"An error occurred with file {mat_file}: {e}")
        continue  # Continue with the next file

# Convert list to DataFrame
entropy_df = pd.DataFrame(entropy_list, columns=['FileName', 'PatientNumber', 'Shannon_Entropy'])
# Save entropy values as CSV
csv_output_path = '/content/drive/Shareddrives/CHEMPREDICT-DL/Rahul_Code/entropy_values.csv'
entropy_df.to_csv(csv_output_path, index=False)
print(f"Processed {len(entropy_list)} files. Results are saved to {csv_output_path}")
Result: Shannon's Entropy: 0.003
Thank you.
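One step in `shannons_entropy` that may be worth checking is `astype(int)`: `np.bincount` only accepts non-negative integers, and casting floating-point intensities truncates them, so every value in [0, 1) lands in bin 0. Whether this applies here is an assumption, since the intensity range of the data is not shown. The sketch below uses synthetic values drawn between 0 and 0.9 as a stand-in for the real data and compares the integer-cast histogram with `np.histogram` binning; `histogram_entropy` and `bins=64` are illustrative choices, not part of the code above.

```python
import numpy as np

def histogram_entropy(values, bins=64):
    """Shannon entropy (bits) of the histogram of `values`, ignoring NaNs."""
    v = np.asarray(values, dtype=float).ravel()
    v = v[~np.isnan(v)]                      # drop masked (NaN) entries
    counts, _ = np.histogram(v, bins=bins)   # equal-width bins over the data range
    p = counts[counts > 0] / counts.sum()    # empty bins contribute nothing
    return float(np.sum(p * np.log2(1.0 / p)))

# Synthetic stand-in for intensity values (assumed range, not the real data)
rng = np.random.default_rng(0)
absorbance = rng.uniform(0.0, 0.9, size=10_000)

# Integer casting truncates toward zero, so the histogram has a single bin
as_int = absorbance.astype(int)
p_int = np.bincount(as_int) / as_int.size
print("bins after astype(int):", p_int.size)

# Binning the floating-point values directly keeps their spread
print("entropy with 64 bins:", histogram_entropy(absorbance, bins=64))
```

The histogram-based value depends on the number of bins, and it describes how intensity values are distributed over the whole image, which is a different quantity from the entropy of a single normalized spectrum.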
I tried Shannon's entropy on spectral data. I was expecting a result between 0 and 1. The two codes I tried give two different results. I would like to know whether this approach is right.