metonia mari

Reputation: 1

Converting an .smi file from ZINC to a DataFrame

My project is to collect compounds and small molecules related to a specific kinase enzyme from ZINC and other databases. I have tried several ways to download SMILES by ZINC ID or PubChem ID from the ZINC database, but none of them worked. Now I want to use an .smi file, but I don't know how to open it as a DataFrame.

My code is:

import pandas as pd

# Read every line of the .smi file (each line keeps its trailing newline)
with open(smiles_file_path, 'r') as f:
    smiles_list = f.readlines()

# Create a DataFrame from the list of SMILES strings
df = pd.DataFrame({'SMILES': smiles_list})

# Display the DataFrame
print(df)
But the DataFrame is not displayed properly!

Upvotes: 0

Views: 148

Answers (1)

rapelpy

Reputation: 1869

.smi files are plain-text files, similar to CSV, with the SMILES string in the first column and an optional second column, separated by whitespace (space or tab). ZINC uses the second column for the ID.

You can open the files directly with pandas.

import pandas as pd

df = pd.read_csv('substances.smi', sep=r'\s+')  # r'\s+' is a regex matching one or more whitespace characters

df.columns = ['SMILES', 'ID']

print(df)

                                    SMILES                ID
0    N[C@H](CCc1ccc(N(CCCl)CCCl)cc1)C(=O)O  ZINC000016090786
1  N[C@@H](CCCc1ccc(N(CCCl)CCCl)cc1)C(=O)O  ZINC000002033385
2   N[C@H](CCCc1ccc(N(CCCl)CCCl)cc1)C(=O)O  ZINC000001763088
3    N[C@@H](Cc1ccc(N(CCCl)CCCl)cc1)C(=O)O  ZINC000000001673
4     N[C@H](Cc1ccc(N(CCCl)CCCl)cc1)C(=O)O  ZINC000000001661
5       CCN(CC)c1ccc(CC[C@@H](N)C(=O)O)cc1  ZINC001951410564
6        CCN(CC)c1ccc(CC[C@H](N)C(=O)O)cc1  ZINC001951410565
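Note that the read_csv call above treats the first line of the file as a header row, which df.columns then overwrites. If a file has no header line, that first record would be consumed as column names. A minimal sketch for that case follows, assuming a header-less, two-column file; the in-memory text is a hypothetical stand-in for a real .smi file, reusing two records from the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for a header-less .smi file: one line separated by
# a space, one by a tab. Both records are copied from the output above.
smi_text = (
    "N[C@H](CCc1ccc(N(CCCl)CCCl)cc1)C(=O)O ZINC000016090786\n"
    "N[C@@H](CCCc1ccc(N(CCCl)CCCl)cc1)C(=O)O\tZINC000002033385\n"
)

# header=None keeps pandas from using the first record as column names;
# names= assigns the column labels directly.
df = pd.read_csv(
    io.StringIO(smi_text),
    sep=r"\s+",
    header=None,
    names=["SMILES", "ID"],
)

print(df)
```

With a real file, replace io.StringIO(smi_text) with the file path. It is worth checking the first line of the file in a text editor first to see whether it is a header or a molecule.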

Upvotes: 0
