Reputation: 3
I want to calculate Shannon's entropy on FTIR data. I have tried two approaches and I get two different entropy values. In the second code I remove the background using clustering and then calculate the entropy. However, the entropy before removing the background is still very different. Please share your opinion and guidance.
import numpy as np
import scipy.io
import matplotlib.pyplot as plt
# Path to your .mat file
mat_file_path = 'structure1.mat'
# Load the .mat file
mat_data = scipy.io.loadmat(mat_file_path)
# Extract wavenumbers
wavenumbers = mat_data['wavenumbers'].squeeze()
# Extract intensity data
intensity_data = mat_data['spcImage']
# Select a specific position in the image to plot
position = (100, 100)
intensity_at_position = intensity_data[position[0], position[1], :]
# Plot the spectral data
plt.plot(wavenumbers, intensity_at_position)
plt.xlabel('Wavenumber')
plt.ylabel('Intensity')
plt.title('Spectral Data at Position (100, 100)')
plt.show()
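# Average spectrum over all pixels
# (assumption: average_spectra was not defined in the original snippet)
average_spectra = intensity_data.reshape(-1, intensity_data.shape[-1]).mean(axis=0)
# Calculate probability distribution
prob_distribution = average_spectra / np.sum(average_spectra)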
# Calculate entropy
entropy = -np.sum(prob_distribution * np.log2(prob_distribution + 1e-10))
print("Shannon's Entropy:", entropy)
The spectral image is attached here.
Result: Shannon's Entropy: 6.257635131501756
import os
import scipy.io
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
# Define the cluster_k function
def cluster_k(p, k=2, init='k-means++', max_iter=300):
    # Reshape 3D array to 2D for clustering
    data_to_cluster = p.reshape(-1, p.shape[2]) if len(p.shape) == 3 else p
    # Perform K-Means clustering
    model = KMeans(n_clusters=k, init=init, n_init=10, max_iter=max_iter).fit(data_to_cluster)
    # Obtain labels and reshape if original data was 3D
    labels = model.labels_.reshape(p.shape[0], p.shape[1]) if len(p.shape) == 3 else model.labels_
    return model, labels

# Define Shannon's entropy calculation function
def shannons_entropy(data):
    data_flatten = data.flatten()
    data_flatten_no_nan = data_flatten[~np.isnan(data_flatten)]
    p_data = np.bincount(data_flatten_no_nan.astype(int)) / len(data_flatten_no_nan)
    entropy = -np.sum(p_data * np.log2(p_data + 1e-10))
    return entropy

# Define a function to extract patient number from file name
def extract_patient_number(filename):
    # Extract the part before the first underscore and convert to float
    return float(filename.split('_')[0])
# Path to the directory containing .mat files
mat_dir_path = 'filenames.mat'
# List all .mat files in the directory
mat_files = [f for f in os.listdir(mat_dir_path) if f.endswith('.mat')]
# List to store entropy values for each file
entropy_list = []
# Loop through each .mat file
for mat_file in mat_files:
    try:
        # Load the .mat file
        mat_data = scipy.io.loadmat(os.path.join(mat_dir_path, mat_file))
        # Extract intensity data
        intensity_data = mat_data['spcImage']
        # Apply clustering to remove background
        model, mask = cluster_k(intensity_data, k=2)
        background_label = np.argmin(np.sum(model.cluster_centers_, axis=1))
        signal_label = 1 - background_label
        signal_mask = (mask == signal_label)
        intensity_data_no_background = np.where(signal_mask[..., np.newaxis], intensity_data, np.nan)
        # Calculate Shannon's entropy after removing the background
        entropy = shannons_entropy(intensity_data_no_background)
        # Extract patient number from the file name
        patient_number = extract_patient_number(mat_file)
        # Append a tuple with all three pieces of data
        entropy_list.append((mat_file, patient_number, entropy))
    except Exception as e:
        print(f"An error occurred with file {mat_file}: {e}")
        continue  # Continue with the next file
# Convert list to DataFrame
entropy_df = pd.DataFrame(entropy_list, columns=['FileName', 'PatientNumber', 'Shannon_Entropy'])
# Save entropy values as CSV
csv_output_path = '/content/drive/Shareddrives/CHEMPREDICT-DL/Rahul_Code/entropy_values.csv'
entropy_df.to_csv(csv_output_path, index=False)
print(f"Processed {len(entropy_list)} files. Results are saved to {csv_output_path}")
Result: Shannon's Entropy: 0.003
Thank you.
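One possible reason the two numbers are so far apart, sketched here on made-up data rather than the real FTIR image: the first code treats the spectrum itself as a probability distribution over wavenumber bins, while the second builds a histogram of intensity values after casting them to `int`. If the intensities lie mostly between 0 and 1 (an assumption about the data), the cast sends almost every value to 0, so the histogram has essentially one bin and the entropy is close to zero. The function names and the synthetic array below are hypothetical and only illustrate the difference.

```python
import numpy as np

def spectral_entropy(spectrum):
    """Entropy of a spectrum normalized to sum to 1 (as in the first code)."""
    p = spectrum / np.sum(spectrum)
    p = p[p > 0]  # skip empty bins instead of adding a small epsilon
    return -np.sum(p * np.log2(p))

def value_histogram_entropy(values):
    """Entropy of the histogram of integer-cast values (as in the second code)."""
    v = values[~np.isnan(values)].astype(int)  # truncates 0.5 -> 0, 1.2 -> 1
    counts = np.bincount(v)
    p = counts[counts > 0] / v.size
    return -np.sum(p * np.log2(p))

# Synthetic stand-in for the data: 1000 values of 0.5 and a single value of 1.2
values = np.full(1000, 0.5)
values[0] = 1.2

h_spectral = spectral_entropy(values)       # close to log2(1000), about 10 bits
h_values = value_histogram_entropy(values)  # close to 0: nearly all values cast to 0

print("Spectrum as distribution:", h_spectral)
print("Histogram of int-cast values:", h_values)
```

On this toy input the same array gives a large entropy with one definition and a near-zero entropy with the other, so the two codes are not measuring the same quantity and their results cannot be compared directly.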
I tried Shannon's entropy on spectral data. I was expecting a result between 0 and 1. The two different codes that I have tried give two different results. I am curious to know if this approach is right.
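On the expected range: Shannon's entropy in bits for a distribution over N bins lies between 0 and log2(N), not between 0 and 1, so a value such as 6.26 is possible whenever there are more than about 77 bins. A value in [0, 1] can be obtained by dividing by log2(N). The sketch below shows this normalization; `normalized_entropy` is a hypothetical helper, not part of either code above, and it assumes non-negative input values.

```python
import numpy as np

def normalized_entropy(weights):
    """Shannon entropy divided by log2(number of bins), giving a value in [0, 1].

    Assumes `weights` is a 1D array of non-negative values with a positive sum.
    """
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    n_bins = p.size
    if n_bins < 2:
        return 0.0  # a single bin carries no uncertainty
    nz = p[p > 0]  # 0 * log2(0) is taken as 0
    h = -np.sum(nz * np.log2(nz))
    return h / np.log2(n_bins)

flat = np.ones(8)                     # flat spectrum: maximal entropy
peaked = np.array([1.0] + [0.0] * 7)  # single peak: minimal entropy

h_flat = normalized_entropy(flat)
h_peaked = normalized_entropy(peaked)
print("Flat:", h_flat, "Peaked:", h_peaked)
```

With this scaling a flat spectrum gives the maximum and a single sharp peak gives the minimum, which makes values from spectra with different numbers of wavenumber bins easier to compare.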
Upvotes: 0
Views: 33