I have five .csv's that have the same fields in the same order and need to be processed as follows.
The files look like this (data1):
column1 column2 column3
PT389 LA image.jpg
PT372 NY image2.jpg
Desired output, for example:
PT333 is in column1 for the file name data1
I tried to do this, but I don't know how to make it loop and do the same thing for each file.
At the moment it returns the number, but I also want it to return the DataFrame name, and I want it to loop through all the files (a to e) to check for all the values in the numbers list.
This is what I have:
import os
import glob
import pandas as pd
from glob import glob
from os.path import expanduser
home = expanduser("~")
os.chdir(home + f'/files/')
data = glob.glob('data*.csv')
data
# If you have tips on how to loop through these rather than
# have a line for each one, open to feedback
a = pd.read_csv(data[0], encoding='ISO-8859-1', error_bad_lines=False)
b = pd.read_csv(data[1], encoding='ISO-8859-1', error_bad_lines=False)
c = pd.read_csv(data[2], encoding='ISO-8859-1', error_bad_lines=False)
d = pd.read_csv(data[3], encoding='ISO-8859-1', error_bad_lines=False)
e = pd.read_csv(data[4], encoding='ISO-8859-1', error_bad_lines=False)
filenames = [a, b, c, d, e]
filelist = ['a', 'b', 'c', 'd', 'e']
# I am aware that this part is repetitive. Unsure how to fix this,
# I keep getting errors
# Any help appreciated
numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']
def type():
    for i in a.column1:
        if i == numbers[0]:
            print(numbers[0])
        elif i == numbers[1]:
            print(numbers[1])
        elif i == numbers[2]:
            print(numbers[2])
        elif i == numbers[3]:
            print(numbers[3])
        elif i == numbers[4]:
            print(numbers[4])
type()
Also happy to take any constructive criticism as to how to repeat less code and make things smoother. TIA
Give this a try:
import glob
import pandas as pd
numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']
for file in glob.glob('data*.csv'):  # loop through each file
    df = pd.read_csv(file,  # create the DataFrame of the file
                     encoding='ISO-8859-1',
                     error_bad_lines=False)
    result = (df.where(df.isin(numbers))  # keep only the cells that contain one of the numbers
                .melt()['value']          # melt the DF into a single 'value' series
                .dropna()                 # remove any NaNs (non-matches)
                .unique().tolist())       # return the unique values as a list
    if result:  # if there are any results
        print(file, ', '.join(result))  # print the file name and the results
If you write the result line with backslash line continuations instead of parentheses, remove the comments and any trailing spaces after each backslash when copying the code; otherwise you will run into a SyntaxError.
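To see what each step of that result chain does, here is a minimal sketch run on a small in-memory DataFrame instead of the files. The sample rows are made up for illustration; only the column layout follows the question.

```python
import pandas as pd

# Hypothetical sample data in the question's column layout
df = pd.DataFrame({
    'column1': ['PT389', 'PT333', 'PT372', 'PT333'],
    'column2': ['LA', 'NY', 'NY', 'LA'],
    'column3': ['image.jpg', 'image2.jpg', 'image3.jpg', 'image4.jpg'],
})
numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']

# Step 1: every cell that is not in `numbers` becomes NaN
masked = df.where(df.isin(numbers))
# Step 2: stack all columns into one long 'value' column (4 rows x 3 columns = 12 values)
values = masked.melt()['value']
# Step 3: drop the NaNs and de-duplicate what is left
result = values.dropna().unique().tolist()
print(result)  # ['PT333']
```

Because `where`/`isin` look at every column, this approach would also report a number that happened to appear in column2 or column3.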
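You should also be able to do the same without a DataFrame:
for file in glob.glob('data*.csv'):
    with open(file, encoding='ISO-8859-1') as f:  # file is a path string, so open it first
        data = f.read()
    for num in numbers:
        if num in data:  # substring check: matches anywhere in the file, not only in column1
            print(file, num)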
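If only column1 should be checked, the standard-library csv module can do it without pandas. This is a sketch assuming the files are comma-separated with a header row named column1, column2, column3; `matches_in_column1` is a hypothetical helper name, not part of the original answer.

```python
import csv
import glob

numbers = {'PT333', 'PT121', 'PT111', 'PT211', 'PT222'}  # a set gives fast membership tests

def matches_in_column1(lines, wanted, column='column1'):
    """Return the distinct values of `column` that are in `wanted`, in file order.

    `lines` can be an open file or any iterable of CSV text lines.
    """
    found = []
    for row in csv.DictReader(lines):  # first line is treated as the header
        value = row.get(column)
        if value in wanted and value not in found:
            found.append(value)
    return found

for path in sorted(glob.glob('data*.csv')):
    with open(path, newline='', encoding='ISO-8859-1') as f:
        for num in matches_in_column1(f, numbers):
            print(num, 'is in column1 for the file name', path)
```

Unlike the substring check, a number that only appears in another column is not reported.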
Also happy to take any constructive criticism as to how to repeat less code and make things smoother.
I hope you don't mind that I started by restructuring the code; it makes explaining the next steps easier.
Using a list comprehension lets us iterate through the files and load them all into a list in one line, instead of writing one read_csv line per file. It is also more concise and easier to maintain than five separate, nearly identical lines.
files = [pd.read_csv(entry, encoding='ISO-8859-1', error_bad_lines=False) for entry in data]
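Since the question also asks for the file name, one variation (my own, not part of this answer) is a dict comprehension keyed by file name, so each DataFrame keeps its name. This sketch assumes comma-separated files with a header row; `load_frames` is a hypothetical helper name. It uses `on_bad_lines='skip'`, which replaces `error_bad_lines=False` in pandas 1.3 and later (`error_bad_lines` was removed in pandas 2.0).

```python
import glob
import os

import pandas as pd

def load_frames(pattern='data*.csv'):
    """Read every CSV matching `pattern` into a dict of {file name: DataFrame}."""
    return {
        os.path.basename(path): pd.read_csv(path, encoding='ISO-8859-1',
                                            on_bad_lines='skip')  # pandas >= 1.3
        for path in sorted(glob.glob(pattern))  # sorted so data1, data2, ... stay in order
    }

frames = load_frames()
for name, df in frames.items():
    print(name, df.shape)
```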
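Next, the function needs a parameter so that we can call it for any given file. Combined with the list, we can then loop over the files with a for-each loop and call the function on each one.
Inside the function we use a for-each loop again, and the in operator replaces the chain of if/elif checks. Define the function before the loop; otherwise type(file) would call Python's built-in type() instead:
def type(file):
    for value in file.column1:
        if value in numbers:
            print(value)
for file in files:
    type(file)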
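The per-value Python loop can also be replaced by a vectorized pandas check on column1. This is a sketch rather than the answer's own code: `find_numbers` is a hypothetical name (chosen so it does not shadow the built-in `type`), and the sample frame is made up.

```python
import pandas as pd

numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']

def find_numbers(df, wanted):
    """Return the distinct values of df['column1'] that appear in `wanted`."""
    return df.loc[df['column1'].isin(wanted), 'column1'].unique().tolist()

# Hypothetical sample frame in the question's layout
sample = pd.DataFrame({'column1': ['PT389', 'PT333', 'PT372'],
                       'column2': ['LA', 'NY', 'NY'],
                       'column3': ['image.jpg', 'image2.jpg', 'image3.jpg']})
print(find_numbers(sample, numbers))  # ['PT333']
```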
import os
import pandas as pd
from glob import glob
from os.path import expanduser
home = expanduser("~")
os.chdir(home + '/files/')
# Note: I use glob() instead of glob.glob() here, because "from glob import glob" imports the function itself.
data = glob('data*.csv')
files = [pd.read_csv(entry, encoding='ISO-8859-1', error_bad_lines=False) for entry in data]
numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']
def type(file):
    for value in file.column1:
        if value in numbers:
            print(value)
for file in files:
    type(file)
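Another option, not from this answer, is to combine all the files into one DataFrame whose index records the source file, then filter once. This sketch takes a dict of {file name: DataFrame} (for example one built with a dict comprehension over the glob results); `matches_by_file` and the sample frames are hypothetical.

```python
import pandas as pd

numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']

def matches_by_file(frames, wanted):
    """Given {file name: DataFrame}, return (file name, value) pairs where column1 is in `wanted`."""
    combined = pd.concat(frames, names=['file', 'row'])  # dict keys become index level 'file'
    hits = combined[combined['column1'].isin(wanted)]
    return list(zip(hits.index.get_level_values('file'), hits['column1']))

# Made-up frames standing in for the real files
frames = {
    'data1.csv': pd.DataFrame({'column1': ['PT389', 'PT333'], 'column2': ['LA', 'NY']}),
    'data2.csv': pd.DataFrame({'column1': ['PT121', 'PT372'], 'column2': ['NY', 'LA']}),
}
for name, value in matches_by_file(frames, numbers):
    print(value, 'is in column1 for the file name', name)
```

This prints one line per matching row, in the format the question asks for.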
I would suggest changing the type function and calling it slightly differently:
def type(x):
    for i in x.column1:
        if i == numbers[0]:
            print(i, numbers[0])
        elif i == numbers[1]:
            print(i, numbers[1])
        elif i == numbers[2]:
            print(i, numbers[2])
        elif i == numbers[3]:
            print(i, numbers[3])
        elif i == numbers[4]:
            print(i, numbers[4])
for j in filenames:
    type(j)
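Inside each branch, i and numbers[k] are equal, so that version prints the same value twice. To print the DataFrame's name instead, one option is to pair the question's filelist and filenames lists with zip(). In this sketch, a and b are made-up stand-ins for the question's DataFrames.

```python
import pandas as pd

numbers = ['PT333', 'PT121', 'PT111', 'PT211', 'PT222']

# Stand-ins for the question's DataFrames a and b (made-up rows)
a = pd.DataFrame({'column1': ['PT389', 'PT333']})
b = pd.DataFrame({'column1': ['PT372', 'PT222']})
filenames = [a, b]     # the DataFrames, as in the question
filelist = ['a', 'b']  # their names, in the same order

found = []
for name, df in zip(filelist, filenames):  # pair each DataFrame with its name
    for value in df.column1:
        if value in numbers:               # one membership test instead of an if/elif chain
            print(value, name)
            found.append((value, name))
```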