Reputation: 47
There is a user-defined list values = ['color','oilType','motor']
and a DataFrame df with 15 columns, including 'color' and 14 others. 'oilType' and 'motor' are not present in df. However, when I run the snippet below, only 'motor' is written out. The expected output (the columns missing from df) is both 'oilType' and 'motor'.
df = pd.read_csv('type.csv', sep=',', error_bad_lines=False)
targets = list(df.columns)
values = ['color','oilType','motor']
with open("source.txt", "w") as output:
    output.write(str(values))
with open('source.txt','r') as source:
    for source1 in values:
        source1 = source1.strip().lower()
    for target1 in targets:
        if source1 not in target1:
            with open("columns_missing.txt", "w") as output:
                output.write(str(source1))
What should I change so that both 'oilType' and 'motor' are reported as missing from df?
Upvotes: 0
Views: 29
Reputation: 13407
Your for loops should be nested, and they are not. When you iterate over values, you don't do anything with each value except reassign source1, so the loop simply ends on "motor". By the time you start iterating over targets, your source1 variable is "motor" and never changes. Try indenting like this:
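import pandas as pd

# error_bad_lines=False is deprecated since pandas 1.3 and removed in 2.0; on_bad_lines='skip' replaces it
df = pd.read_csv('type.csv', sep=',', on_bad_lines='skip')
targets = [t.strip().lower() for t in df.columns]
values = ['color', 'oilType', 'motor']

with open("source.txt", "w") as output:
    output.write(str(values))

# Open the output file once: opening it with "w" inside the loop truncates it on every write
with open("columns_missing.txt", "w") as output:
    for value in values:
        source1 = value.strip().lower()
        # The inner loop is now nested, so it runs once for every value
        for target1 in targets:
            if source1 == target1:
                break  # a matching column exists
        else:
            # The for/else branch runs only when no column matched
            output.write(value + "\n")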
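The ordering problem can be reproduced without a CSV file. A minimal sketch, assuming a hypothetical list of column names in place of df.columns: after an un-nested loop the loop variable only holds the last value, while nesting checks every value against every column.

```python
values = ['color', 'oilType', 'motor']
targets = ['color', 'size', 'price']  # hypothetical column names standing in for df.columns

# Un-nested: the first loop runs to completion before anything else happens,
# so afterwards source1 only holds the last (lowercased) value.
for source1 in values:
    source1 = source1.strip().lower()
last_checked = source1

# Nested: each value is compared against every column before moving on.
lowered_targets = [t.strip().lower() for t in targets]
missing = []
for value in values:
    found = False
    for target1 in lowered_targets:
        if value.strip().lower() == target1:
            found = True
            break
    if not found:
        missing.append(value)

print(last_checked)  # motor
print(missing)       # ['oilType', 'motor']
```

Only the nested version ever looks at 'oilType', which is why the original snippet can report nothing but 'motor'.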
In general, a better approach is to use the difference method of a set:
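import numpy as np
import pandas as pd

data = np.random.randint(50, size=20).reshape(5, 4)
df = pd.DataFrame(data, columns=["A", "B", "C", "D"])
print(df)
# Example output (the values are random and will differ on each run):
#     A   B   C   D
# 0  28  40  29  44
# 1  29   7  48  38
# 2  41   5  48  31
# 3  45  42  28  44
# 4   6  45  15  37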
values = ["hello", "B", "world", "D"]
not_found = set(values).difference(df.columns)
print(not_found)  # not_found == {"hello", "world"} (set order may vary)

with open("columns_missing.txt", "w") as output:
    for value in not_found:
        output.write(value + "\n")  # one column name per line
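A set is unordered, so the missing names can come out in any order. If you want them in the order of values, and want the comparison to ignore case and surrounding whitespace the way the question's .strip().lower() does, one option is a list comprehension over a set of normalized column names. This is a sketch; missing_columns is a hypothetical helper name, and in practice you would pass df.columns as columns.

```python
def missing_columns(values, columns):
    """Return the entries of `values` that have no matching column name.

    The comparison ignores case and surrounding whitespace, and the result
    keeps the order of `values`. `columns` can be any iterable of names,
    for example df.columns.
    """
    normalized = {str(c).strip().lower() for c in columns}
    return [v for v in values if v.strip().lower() not in normalized]


if __name__ == "__main__":
    cols = ["A", "B", "C", "D", "color"]  # hypothetical column names
    print(missing_columns(["hello", "B", "world", "D"], cols))   # ['hello', 'world']
    print(missing_columns(['color', 'oilType', 'motor'], cols))  # ['oilType', 'motor']
```

Building the normalized set once keeps each lookup constant-time, so this stays cheap even for wide DataFrames.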
Upvotes: 1