Reputation: 1071
This data describes the files in a specific folder that is expected to grow over time, so there will be many files with similar, but not identical, names. The code below selects the file names that match a given pattern and, if several match, picks the latest one based on the Last_modified date. In this example the result is stored in filename1.
Sample data frame:
import pandas as pd

d = {
    'file_name': ['finding_finding_april_040119_1012', 'finding_finding_april_040119_1111', 'question_answer_april_040119_0915', 'question_answer_april_040119_0945', 'review_rational_040119_0805'],
    'No_of_records': [23, 32, 45, 42, 28],
    'size_in_MB': [10, 15, 8, 12, 10],
    'Last_modified': ['2019-04-01 05:00:15+00:00', '2019-04-01 05:00:20+00:00', '2019-04-01 07:00:15+00:00', '2019-04-01 07:15:15+00:00', '2019-04-01 05:00:15+00:00'],
}
df = pd.DataFrame(data=d)
df['Last_modified'] = pd.to_datetime(df['Last_modified'])
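This is what the table looks like:
   file_name                          No_of_records  size_in_MB  Last_modified
0  finding_finding_april_040119_1012  23             10          2019-04-01 05:00:15+00:00
1  finding_finding_april_040119_1111  32             15          2019-04-01 05:00:20+00:00
2  question_answer_april_040119_0915  45             8           2019-04-01 07:00:15+00:00
3  question_answer_april_040119_0945  42             12          2019-04-01 07:15:15+00:00
4  review_rational_040119_0805        28             10          2019-04-01 05:00:15+00:00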
Code I am using:
mask1 = df['file_name'].str.contains("finding_finding_april")
df2 = df.loc[mask1]
mask2 = (df2['Last_modified'] == df2['Last_modified'].max())
df3 = df2.loc[mask2]
# Select file_name by label; the positional df3.iloc[0,2] depends on column order
filename1 = df3['file_name'].iloc[0]
The conditions mask1 and mask2 cannot be combined as mask1 & mask2, because mask2 is built from the already filtered df2 rather than from df. The code works as it is, but I think there should be a better way of writing this.
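One possible way to fold the two steps into a single lookup is idxmax(), which returns the index label of the largest value. This is only a minimal sketch on a shortened copy of the sample data, not necessarily the best approach, and the name latest_idx is made up for the illustration:

```python
import pandas as pd

# Shortened copy of the sample data from the question
df = pd.DataFrame({
    'file_name': ['finding_finding_april_040119_1012',
                  'finding_finding_april_040119_1111',
                  'question_answer_april_040119_0915'],
    'Last_modified': pd.to_datetime(['2019-04-01 05:00:15+00:00',
                                     '2019-04-01 05:00:20+00:00',
                                     '2019-04-01 07:00:15+00:00']),
})

mask1 = df['file_name'].str.contains('finding_finding_april')
# latest_idx (hypothetical name): index label of the newest matching row
latest_idx = df.loc[mask1, 'Last_modified'].idxmax()
filename1 = df.loc[latest_idx, 'file_name']
print(filename1)
```

If several matching rows share the same latest timestamp, idxmax() returns the first of them, which mirrors taking the first row of df3 in the code above.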
If I have a list of patterns like the following, how can I loop through the list to create filename1, filename2, and so on, without running the code separately for each pattern?
list = ['finding_finding_april', 'question_answer_april', 'review_rational_april' ... ...]
I know how to loop through a list and do something simple, but I am not sure what to do in this situation.
Upvotes: 0
Views: 1209
Reputation: 118
You can iterate through the list of patterns, start with an empty list of file names, and append each result to it, like the following:
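# Named patterns rather than list, to avoid shadowing the built-in list type
patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']
filename = []  # collects the latest matching file name for each pattern
for pattern in patterns:
    mask1 = df['file_name'].str.contains(pattern)
    df2 = df.loc[mask1]
    if df2.empty:
        continue  # no file matches this pattern, so there is nothing to append
    mask2 = df2['Last_modified'] == df2['Last_modified'].max()
    df3 = df2.loc[mask2]
    filename.append(df3['file_name'].iloc[0])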
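As a variation on the loop, the lookup could be wrapped in a small helper and the results kept in a dict keyed by pattern, so it stays clear which file belongs to which pattern. This is only a sketch under a few assumptions: latest_file is a hypothetical helper name, the patterns are treated as literal text (regex=False) rather than regular expressions, and a pattern with no match maps to None.

```python
import pandas as pd

# Sample data from the question (only the columns the lookup needs)
df = pd.DataFrame({
    'file_name': ['finding_finding_april_040119_1012',
                  'finding_finding_april_040119_1111',
                  'question_answer_april_040119_0915',
                  'question_answer_april_040119_0945',
                  'review_rational_040119_0805'],
    'Last_modified': pd.to_datetime(['2019-04-01 05:00:15+00:00',
                                     '2019-04-01 05:00:20+00:00',
                                     '2019-04-01 07:00:15+00:00',
                                     '2019-04-01 07:15:15+00:00',
                                     '2019-04-01 05:00:15+00:00']),
})


def latest_file(frame, pattern):
    """Hypothetical helper: newest file_name containing pattern, or None."""
    matches = frame[frame['file_name'].str.contains(pattern, regex=False)]
    if matches.empty:
        return None  # nothing matched this pattern
    # idxmax() gives the index label of the most recent Last_modified
    return matches.loc[matches['Last_modified'].idxmax(), 'file_name']


patterns = ['finding_finding_april', 'question_answer_april', 'review_rational_april']
latest = {p: latest_file(df, p) for p in patterns}
print(latest)
```

Note that with the sample data 'review_rational_april' matches nothing, because the only similar file is named review_rational_040119_0805, so that entry comes back as None instead of raising an IndexError.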
Upvotes: 1