Python search and replace with search result

Question

Hi there I have 2 excel files, where on one is a table with some names (col A, excel 1) and values and the other one contains a phylogenetic tree with all the data stored in 6 columns tab separated (excel 2).

Xlsx 1:

g__Mannheimia
g__Peptococcus
g__unspecific_Proteobacteria
g__Acinetobacter

Xlsx 2 (6 columns):

Row1:
d__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Enterobacterales f__Enterobacteriaceae g__Salmonella s__
Row2:
d__Bacteria p__Firmicutes c__Bacilli o__Lactobacillales f__Streptococcaceae g__Streptococcus s__
Row3:
d__Bacteria p__Actinobacteriota c__Actinobacteria o__Mycobacteriales f__Mycobacteriaceae g__Mycobacterium s__
Row4:
d__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Enterobacterales f__Enterobacteriaceae g__Klebsiella s__
Row5:
d__Bacteria p__Proteobacteria c__Gammaproteobacteria o__Pseudomonadales f__Moraxellaceae g__Acinetobacter s__

What I'm trying to do is, search each row of xlsx 2 for the names in xlsx 1, then if an exact match is found on any of the 6 columns, copy the whole row (6 columns in total) and replace the name (1 column, should also stay like that) in xlsx 1 with this data. This can be done by hand with the search and replace function, but because it's a huge amount of data I'm trying to find another way to do it.

I'm trying to solve this with the help of pandas and already tried some stuff. At the moment I'm stuck here, because I can't even find my matching data on the second xlsx.

import sys
import pandas as pd

analysis_data_df = pd.read_excel('Mund_Biome_Raw_Data.xlsx', sheet_name='Genus', header=None, usecols='A', skiprows=3)
analysis_data = analysis_data_df.values.tolist()
# print(List1)
source_data_df = pd.read_excel('170519_taxonomy_in_qiime.xlsx', sheet_name='97_otu_taxonomy', header=None)
source_data = source_data_df.values.tolist()
# print(list_2)
matching = [s for s in source_data if any(xs in s for xs in analysis_data)]
print(matching)

Any kind of help is much appreciated!

Python search and replace with search result

Answers (1)

Related Questions