I have a CSV file that contains YouTube URLs and their timestamps:
https://www.youtube.com/watch?v=dsnLcaNhXd6o,0:13-0:20;0:25-0:31;0:36-0:40
https://www.youtube.com/watch?v=d8InLcaNhXd6o,0:43-0:52;0:56-1:07
https://www.youtube.com/watch?v=Inji8LcaNhXd6o,0:13-0:20;0:25-0:31;0:36-0:40;0:43-0:52;0:56-1:07;1:15-1:25;1:28-1:40
I need to convert the CSV file into a Pydantic object so that I can validate it and pass it on for further processing. I read the file like this:
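import csv

# newline='' is what the csv module documentation recommends for files passed to csv.reader
with open(csv_file, mode='r', newline='') as file:
    csvFile = csv.reader(file)
    csvList = list(enumerate(csvFile))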
I have the following Pydantic models:
from typing import List

from pydantic import BaseModel


class TimeStamp(BaseModel):
    start_min: int
    start_sec: int
    end_min: int
    end_sec: int


class VideoDetail(BaseModel):
    row_index: int
    url: str
    timestamps: List[TimeStamp]


class VideoList(BaseModel):
    entry: List[VideoDetail]
Now I need to pass csvList to the VideoList model, run some validations, and get a VideoList object back.
First, list(enumerate(csvFile)) returns a list of tuples, each holding the row index and the row (the row itself is a list of the column values). For example:
csvList = list(enumerate(csvFile))
print(csvList)

Output:

[(0, ['https://www.youtube.com/watch?v=dsnLcaNhXd6o', '0:13-0:20;0:25-0:31;0:36-0:40']),
 (1, ['https://www.youtube.com/watch?v=d8InLcaNhXd6o', '0:43-0:52;0:56-1:07']),
 (2, ['https://www.youtube.com/watch?v=Inji8LcaNhXd6o', '0:13-0:20;0:25-0:31;0:36-0:40;0:43-0:52;0:56-1:07;1:15-1:25;1:28-1:40'])]
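The nesting can be checked without a file by feeding a couple of the sample rows to csv.reader through io.StringIO (used here only as a stand-in for the real file). Each element is a (row_index, row) pair, and row is a list of the two column values rather than two separate tuple items.

```python
import csv
import io

# Sample rows copied from the CSV above; io.StringIO stands in for open(csv_file)
sample = (
    "https://www.youtube.com/watch?v=dsnLcaNhXd6o,0:13-0:20;0:25-0:31;0:36-0:40\n"
    "https://www.youtube.com/watch?v=d8InLcaNhXd6o,0:43-0:52;0:56-1:07\n"
)

csvList = list(enumerate(csv.reader(io.StringIO(sample))))

for row_index, row in csvList:
    # row is a list: [url, timestamps_string]
    url, timestamps = row
    print(row_index, url, timestamps)
```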
Now, when I pass csvList to the VideoList model, the timestamps value is passed as a string. How can I convert it into a list of TimeStamp objects?
I tried adding a validator to the timestamps field of the VideoDetail model that splits the string into a list of timestamps and returns it, but it doesn't work: it raises an error because the type of the timestamps value does not match the field type.
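The splitting step on its own does not need Pydantic. A minimal sketch, where split_timestamps is a hypothetical helper name: it returns plain dictionaries keyed by the TimeStamp field names, on the assumption that the validator runs before Pydantic's own type check (pre=True in Pydantic v1), so that Pydantic can then build a TimeStamp from each dictionary.

```python
from typing import Dict, List


def split_timestamps(value: str) -> List[Dict[str, int]]:
    """Turn "0:43-0:52;0:56-1:07" into one dict per range, keyed by TimeStamp's fields."""
    ranges = []
    for each_range in value.split(";"):
        start, end = each_range.split("-")       # "0:43", "0:52"
        start_min, start_sec = start.split(":")  # "0", "43"
        end_min, end_sec = end.split(":")
        ranges.append({
            "start_min": int(start_min),
            "start_sec": int(start_sec),
            "end_min": int(end_min),
            "end_sec": int(end_sec),
        })
    return ranges


print(split_timestamps("0:43-0:52;0:56-1:07"))
```

A malformed range such as "0:13" (no "-") makes the tuple unpacking raise ValueError; when that happens inside a Pydantic validator, Pydantic reports it as a validation error.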
Answer:
The idea is to split the timestamp string into pieces and feed them into the individual fields of the TimeStamp Pydantic model.
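I use a validator function to do this. Passing pre=True to validator makes the function run before Pydantic's standard validation of the field, so it receives the raw string instead of Pydantic rejecting it for not being a list. (In Pydantic v2, validator is deprecated; the equivalent is field_validator("timestamps", mode="before").) In the validator function, the string is split on ";" into individual ranges, each range is split on "-" into start and end times, and each time is split on ":" into minutes and seconds.
(I have used dictionaries for the rows instead of the enumerated tuples. Pydantic needs each row as a mapping of field names, so convert the tuples to dictionaries first.)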
from typing import List

from pydantic import BaseModel, validator


class TimeStamp(BaseModel):
    start_min: int
    start_sec: int
    end_min: int
    end_sec: int


class VideoDetail(BaseModel):
    row_index: int
    url: str
    timestamps: List[TimeStamp]

    # pre=True: run before the standard List[TimeStamp] check,
    # so the raw "m:ss-m:ss;m:ss-m:ss" string can be converted first
    @validator("timestamps", pre=True)
    def createTimestamps(cls, value):
        if not isinstance(value, str):
            # Not a string (e.g. already a list): leave it to Pydantic's normal validation
            return value
        timestampslist = []
        for eachTimestamp in value.split(";"):
            start_time_str, end_time_str = eachTimestamp.split("-")
            t = TimeStamp(start_min=int(start_time_str.split(":")[0]),
                          start_sec=int(start_time_str.split(":")[1]),
                          end_min=int(end_time_str.split(":")[0]),
                          end_sec=int(end_time_str.split(":")[1]))
            timestampslist.append(t)
        return timestampslist


class VideoList(BaseModel):
    entry: List[VideoDetail]


csvList = [
    {"row_index": 0, "url": "https://www.youtube.com/watch?v=dsnLcaNhXd6o", "timestamps": "0:13-0:20;0:25-0:31;0:36-0:40"},
    {"row_index": 1, "url": "https://www.youtube.com/watch?v=wcsnLcad6d", "timestamps": "0:13-0:20;0:25-0:31;0:36-0:40"},
    {"row_index": 2, "url": "https://www.youtube.com/watch?v=LcdshXe6o", "timestamps": "0:13-0:20;0:25-0:31;0:36-0:40"},
]

vs = VideoList(entry=csvList)
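To start from the question's csvList of (row_index, row) tuples instead of hand-written dictionaries, each tuple can be mapped onto the VideoDetail field names first. A minimal sketch: rows_to_dicts is a hypothetical helper, it assumes every CSV row has exactly two columns (URL and timestamps), and io.StringIO again stands in for the real file.

```python
import csv
import io
from typing import Any, Dict, List, Tuple


def rows_to_dicts(csv_rows: List[Tuple[int, List[str]]]) -> List[Dict[str, Any]]:
    """Map (row_index, [url, timestamps]) pairs onto VideoDetail's field names."""
    entries = []
    for row_index, row in csv_rows:
        url, timestamps = row  # assumes exactly two columns per row
        entries.append({"row_index": row_index, "url": url, "timestamps": timestamps})
    return entries


# io.StringIO stands in for open(csv_file)
sample = "https://www.youtube.com/watch?v=d8InLcaNhXd6o,0:43-0:52;0:56-1:07\n"
csvList = list(enumerate(csv.reader(io.StringIO(sample))))
print(rows_to_dicts(csvList))
```

The result can then be passed as VideoList(entry=rows_to_dicts(csvList)); the timestamps strings are still converted to TimeStamp objects by the pre=True validator on VideoDetail.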