singularity2047

Reputation: 1071

How to loop through a list and apply multiple filter conditions in Python

This data describes the files in a specific folder. The folder is expected to grow over time, so there will be many files with similar name patterns, although the filenames are not exactly the same. The code below finds the filename that matches a given pattern and, if there are multiple matches, selects the latest one based on the Last_modified date. In this example the result is stored in filename1.

Sample data frame:

import pandas as pd

d = {'file_name': ['finding_finding_april_040119_1012', 'finding_finding_april_040119_1111', 'question_answer_april_040119_0915', 'question_answer_april_040119_0945', 'review_rational_040119_0805'],
     'No_of_records': [23, 32, 45, 42, 28],
     'size_in_MB': [10, 15, 8, 12, 10],
     'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00', '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00', '2019-04-01 05:00:15+00:00']}
df = pd.DataFrame(data=d)
df['Last_modified'] = pd.to_datetime(df['Last_modified'])

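This is what the table looks like:

       file_name                          No_of_records  size_in_MB  Last_modified
    0  finding_finding_april_040119_1012  23             10          2019-04-01 05:00:15+00:00
    1  finding_finding_april_040119_1111  32             15          2019-04-01 05:00:20+00:00
    2  question_answer_april_040119_0915  45             8           2019-04-01 07:00:15+00:00
    3  question_answer_april_040119_0945  42             12          2019-04-01 07:15:15+00:00
    4  review_rational_040119_0805        28             10          2019-04-01 05:00:15+00:00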

Code I am using:

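mask1 = df['file_name'].str.contains("finding_finding_april")   # rows whose name contains the pattern
df2 = df.loc[mask1]
mask2 = (df2['Last_modified'] == df2['Last_modified'].max())      # most recent row(s) among the matches
df3 = df2.loc[mask2]
filename1 = df3['file_name'].iloc[0]   # select by column name; position 2 is size_in_MB, not file_name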

The two conditions cannot simply be combined as mask1 & mask2, because mask2 is computed on the already-filtered df2 rather than on df. The code works as it is, but I think there should be a better way of writing it.
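One way to make the two masks combinable, sketched here on the sample data: compute the maximum over only the rows that pass `mask1`, but compare it against the full `Last_modified` column, so that both masks share `df`'s index. The `regex=False` argument is an addition (not in the original code) that treats the pattern as a literal string.

```python
import pandas as pd

d = {'file_name': ['finding_finding_april_040119_1012', 'finding_finding_april_040119_1111',
                   'question_answer_april_040119_0915', 'question_answer_april_040119_0945',
                   'review_rational_040119_0805'],
     'No_of_records': [23, 32, 45, 42, 28],
     'size_in_MB': [10, 15, 8, 12, 10],
     'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00',
                       '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00',
                       '2019-04-01 05:00:15+00:00']}
df = pd.DataFrame(data=d)
df['Last_modified'] = pd.to_datetime(df['Last_modified'])

pattern = 'finding_finding_april'
# Boolean mask over all of df: does the file name contain the pattern?
mask1 = df['file_name'].str.contains(pattern, regex=False)
# Latest timestamp among matching rows only, compared against every row of df,
# so mask2 has the same index as mask1 and the two can be combined with &.
mask2 = df['Last_modified'] == df.loc[mask1, 'Last_modified'].max()
filename1 = df.loc[mask1 & mask2, 'file_name'].iloc[0]
print(filename1)
```

The `&` is still needed: `mask2` on its own could also match a file from a different pattern that happens to share the same timestamp.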

  1. Is there a way to improve the code using a nested for loop or a list comprehension?
  2. If I have a list of patterns like the following, how can I loop through the list to create filename1, filename2, ... without running the code separately for each pattern?

    patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april', ...]

I know how to loop through a list to do something simple, but I'm not sure how to apply that here.
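For both questions, a dictionary comprehension over the patterns avoids creating separate filename1, filename2 variables. The sketch below assumes a hypothetical helper `latest_file` and uses `idxmax`, which returns the index label of the first maximum (the same tie-break as the original `iloc[0]`). It returns `None` when nothing matches; in the sample data 'review_rational_april' matches no file, because that file is named 'review_rational_040119_0805'.

```python
import pandas as pd

d = {'file_name': ['finding_finding_april_040119_1012', 'finding_finding_april_040119_1111',
                   'question_answer_april_040119_0915', 'question_answer_april_040119_0945',
                   'review_rational_040119_0805'],
     'No_of_records': [23, 32, 45, 42, 28],
     'size_in_MB': [10, 15, 8, 12, 10],
     'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00',
                       '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00',
                       '2019-04-01 05:00:15+00:00']}
df = pd.DataFrame(data=d)
df['Last_modified'] = pd.to_datetime(df['Last_modified'])

patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']


def latest_file(frame, pattern):
    """Hypothetical helper: most recently modified file_name containing pattern, or None."""
    subset = frame[frame['file_name'].str.contains(pattern, regex=False)]
    if subset.empty:
        return None  # no file matches this pattern
    # idxmax gives the row label of the latest Last_modified within the subset
    return subset.loc[subset['Last_modified'].idxmax(), 'file_name']


# One entry per pattern instead of filename1, filename2, ...
latest = {p: latest_file(df, p) for p in patterns}
print(latest)
```

Keying the results by pattern keeps each filename attached to the pattern that produced it, which is easier to work with than numbered variables.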

Upvotes: 0

Views: 1209

Answers (1)

Jeffin Sam

Reputation: 118

You can iterate through the list, create an empty list of filenames before the loop, and append each result inside the loop, like the following:

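patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']
filenames = []
for pattern in patterns:
    mask1 = df['file_name'].str.contains(pattern)
    df2 = df.loc[mask1]
    if df2.empty:                      # no file matches this pattern; keep positions aligned
        filenames.append(None)
        continue
    mask2 = df2['Last_modified'] == df2['Last_modified'].max()
    df3 = df2.loc[mask2]               # most recent match(es)
    filenames.append(df3['file_name'].iloc[0])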
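If the patterns are known up front, the explicit loop can also be replaced by a single `groupby`. This alternative sketch (not part of the answer above) extracts the matching pattern from each file name with one regex alternation, groups on it, and takes the row with the latest `Last_modified` in each group. Patterns with no matching file simply do not appear in the result, and if one pattern is a prefix of another, the one listed first wins.

```python
import re

import pandas as pd

d = {'file_name': ['finding_finding_april_040119_1012', 'finding_finding_april_040119_1111',
                   'question_answer_april_040119_0915', 'question_answer_april_040119_0945',
                   'review_rational_040119_0805'],
     'No_of_records': [23, 32, 45, 42, 28],
     'size_in_MB': [10, 15, 8, 12, 10],
     'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00',
                       '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00',
                       '2019-04-01 05:00:15+00:00']}
df = pd.DataFrame(data=d)
df['Last_modified'] = pd.to_datetime(df['Last_modified'])

patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']

# One capturing group with all patterns as alternatives; re.escape treats them literally.
regex = '(' + '|'.join(map(re.escape, patterns)) + ')'
# Which pattern each file name contains (NaN where none matches).
keys = df['file_name'].str.extract(regex, expand=False).rename('pattern')
# Row label of the latest Last_modified per pattern; NaN keys are dropped by groupby.
idx = df.groupby(keys)['Last_modified'].idxmax()
# Map those row labels to file names, giving a Series indexed by pattern.
latest = idx.map(df['file_name'])
print(latest)
```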

Upvotes: 1
