Reputation: 21
I have two text files. In one file, I have three columns with a '|' separated data. In another file, I have only one column. The example dataset is below:
File1
CO2|44|Carbondioxide
oxygen|16|02
sulfate|96|so4
File2
co2
sulfate
NO2
so4
o2
I want to create the 2nd column in file2. And I want to search each data of file2 in file1's column 1 and column3. . If matches then print respective column 2 value of file1 to respective rows in file2. If no matches are found, keep all the rows as it is in file2. The order of the rows should be the same as the original file2. So, my output will look like following:
column1 column2
co2 44
sulfate 96
NO2
SO4 96
O2 16
So, far I did following:
import pandas as pd
df1 = pd.read_csv ('file1.txt', sep='|', header=None)
df1.columns = ['pollutant1', 'mass', 'pollutant2']
df2 = pd.read_csv ('file2.txt', header=None)
df2.columns = ['pollutant']
df2["NewColumn"]= " "
I don't understand how to search file2 in file1 specific columns. Any help would be greatly appreciated.
Upvotes: 0
Views: 1011
Reputation: 151
I would iterate through each entry in column 1 (file1) first and compare it to each entry in file2.
import pandas as pd
df1 = pd.read_csv ('file1.txt', sep='|', header=None)
df1.columns = ['pollutant1', 'mass', 'pollutant2']
df2 = pd.read_csv ('file2.txt', header=None)
df2.columns = ['pollutant']
df2["NewColumn"]= " "
# iterate through each line in column 1 in file 1
for j, line in enumerate(df1["pollutant1"]):
# now we are going to compare each entry with each line in file 2
for x, item in enumerate (df2["pollutant"]):
# we check if both entry match
# therefore we convert both entries into lower case strings
if line.lower() == item.lower():
# if they match, we will add the specific entry to df2
df2["NewColumn"][x]=df1["mass"][j]
After that you can iterate through each entry in column 3 (file1) and compare it to each entry in file2.
# iterate through each line in column 3 in file 1
for j, line in enumerate(df1["pollutant2"]):
# now we are going to compare each entry with each line in file 2
for x, item in enumerate (df2["pollutant"]):
# we check if both entry match
# therefore we convert both entries into lower case strings
if line.lower() == item.lower():
# if they match, we will add the specific entry to df2
df2["NewColumn"][x]=df1["mass"][j]
print(df2)
However, there will be one problem with "Oxygen" and "o2". These entries cannot be matched.
Upvotes: 1