Ihos
Ihos

Reputation: 21

Find data of a specific column of a data file to another file's specific columns and print all rows using python

I have two text files. In one file, I have three columns with a '|' separated data. In another file, I have only one column. The example dataset is below:

File1

CO2|44|Carbondioxide
oxygen|16|02
sulfate|96|so4

File2

co2
sulfate
NO2
so4
o2

I want to create the 2nd column in file2. And I want to search each data of file2 in file1's column 1 and column3. . If matches then print respective column 2 value of file1 to respective rows in file2. If no matches are found, keep all the rows as it is in file2. The order of the rows should be the same as the original file2. So, my output will look like following:

column1           column2
co2                44
sulfate            96
NO2
SO4                 96
O2                   16

So, far I did following:

  import pandas as pd
df1 = pd.read_csv ('file1.txt', sep='|', header=None)
df1.columns = ['pollutant1', 'mass', 'pollutant2']
df2 = pd.read_csv ('file2.txt', header=None)
df2.columns = ['pollutant']
df2["NewColumn"]= " "

I don't understand how to search file2 in file1 specific columns. Any help would be greatly appreciated.

Upvotes: 0

Views: 1011

Answers (1)

Carst3n
Carst3n

Reputation: 151

I would iterate through each entry in column 1 (file1) first and compare it to each entry in file2.

import pandas as pd

df1 = pd.read_csv ('file1.txt', sep='|', header=None)
df1.columns = ['pollutant1', 'mass', 'pollutant2']
df2 = pd.read_csv ('file2.txt', header=None)
df2.columns = ['pollutant']
df2["NewColumn"]= " "

# iterate through each line in column 1 in file 1
for j, line in enumerate(df1["pollutant1"]):
    # now we are going to compare each entry with each line in file 2
    for x, item in enumerate (df2["pollutant"]):
        # we check if both entry match
        # therefore we convert both entries into lower case strings
        if line.lower() == item.lower():
            # if they match, we will add the specific entry to df2
            df2["NewColumn"][x]=df1["mass"][j]

After that you can iterate through each entry in column 3 (file1) and compare it to each entry in file2.

# iterate through each line in column 3 in file 1
for j, line in enumerate(df1["pollutant2"]):
    # now we are going to compare each entry with each line in file 2
    for x, item in enumerate (df2["pollutant"]):
        # we check if both entry match
        # therefore we convert both entries into lower case strings
        if line.lower() == item.lower():
            # if they match, we will add the specific entry to df2
            df2["NewColumn"][x]=df1["mass"][j]

print(df2)

However, there will be one problem with "Oxygen" and "o2". These entries cannot be matched.

Upvotes: 1

Related Questions