I'm not sure if this is the right place to ask this kind of question, but I feel like the way I'm doing things now is the 'dumb way' and there's room for improvement in my code.
I'm building a stock data website as a side project, and I'm using Rust for the backend. One microservice I'm writing is responsible for scraping data from the web and then saving it in a database. The result of the scraping is a 2D vector where each row holds the values of one attribute of the struct I'll later construct. I then save each row to its own variable.
Then I use the `izip!` macro from itertools to iterate over all those attributes together and create the structs:
```rust
izip!(
    publication_dates,
    quarter_dates,
    income_revenue,
    ...
)
.for_each(
    |(
        publication_date,
        quarter_date,
        income_revenue,
        ...
    )| {
        Financials {
            ticker: self.ticker.to_owned(),
            publication_date,
            quarter_date,
            ...
        };
    },
)
```
My issue is that one data table can have more than 40 attributes, so saving the data from just one page can take over 250 lines of code, and I'd end up with about 2000 lines in total just to store the scraped data, most of it repetitive (parsing rows into the correct data types). I'm pretty sure that's not the right approach, since any change I'd like to make would have to be done in many places.
One of my ideas to improve it was to create an enum of the desired types, then a vector of those enums, like `vec![dataType::quarter_date, dataType::int32, dataType::int32, ...]`, iterate over the rows and that vector together, and use a `match` statement to call the corresponding parsing function. That would shorten the row-assignment part a bit, but probably not by much. Do you have any advice? Any hint would be a great help; I just need a direction that I can later explore by myself :-)
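A minimal sketch of that enum-based idea, assuming every scraped cell arrives as a `String`. `ColumnType`, `Value`, `parse_cell` and `parse_rows` are hypothetical names, not part of any library: each schema entry describes the row at the same position, and a single `match` picks the parser for every cell.

```rust
// Hypothetical: one variant per target type the scraper needs to produce.
#[derive(Debug, Clone, Copy)]
enum ColumnType {
    Text,
    Int,
    Float,
}

// Hypothetical: a parsed cell, tagged with the type it was parsed into.
#[derive(Debug, PartialEq)]
enum Value {
    Text(String),
    Int(i64),
    Float(f64),
}

// Picks the parser for one cell with a single `match` on the schema entry.
fn parse_cell(kind: ColumnType, cell: &str) -> Result<Value, String> {
    let cell = cell.trim();
    match kind {
        ColumnType::Text => Ok(Value::Text(cell.to_string())),
        ColumnType::Int => cell
            .parse::<i64>()
            .map(Value::Int)
            .map_err(|e| format!("cannot parse {cell:?} as i64: {e}")),
        ColumnType::Float => cell
            .parse::<f64>()
            .map(Value::Float)
            .map_err(|e| format!("cannot parse {cell:?} as f64: {e}")),
    }
}

// Walks the schema and the scraped rows together: schema[i] describes rows[i].
fn parse_rows(schema: &[ColumnType], rows: &[Vec<String>]) -> Result<Vec<Vec<Value>>, String> {
    if schema.len() != rows.len() {
        return Err(format!(
            "schema has {} entries but {} rows were scraped",
            schema.len(),
            rows.len()
        ));
    }
    schema
        .iter()
        .zip(rows)
        .map(|(&kind, row)| {
            row.iter()
                .map(|cell| parse_cell(kind, cell))
                .collect::<Result<Vec<Value>, String>>()
        })
        .collect()
}
```

As the question suspects, the result is still a grid of `Value`s: moving them into typed struct fields needs another `match` per field, so this mostly saves code in the parsing step rather than in the struct construction.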
If you only want to reduce the code duplication, I would recommend using a macro. A simple example:
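```rust
macro_rules! create_financials {
    ($rows:ident, $($fun:ident > $column:ident),+) => {{
        // For every `parser > field` pair, take the next row and parse its cells.
        // The `?` requires the calling function to return a `Result` whose error
        // type can be converted from `&str` (e.g. `String` or `Box<dyn Error>`).
        $(
            let $column = $rows
                .next()
                .ok_or("None")?
                .into_iter()
                // Cells for which `$fun` returns `None`/`Err` are skipped, which
                // can shift this column relative to the others.
                .flat_map($fun);
        )+
        // Zip all parsed columns together and build one struct per position.
        itertools::izip!($($column,)+).map(
            |($($column,)+)| {
                Financials {
                    $($column,)+
                }
            }
        ).collect::<Vec<_>>()
    }}
}
```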
Note that I removed the `.collect::<Vec<_>>()` from each parsed column; it is not needed and allocates additional memory.
I also replaced the `for_each` with `map` so that the macro returns a `Vec` that can be used outside of the macro.
The macro can be used simply like this:
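```rust
// `rows` must be a mutable iterator over the scraped rows, in field order,
// e.g. `let mut rows = scraped.into_iter();`
let financials: Vec<Financials> = create_financials!(
    rows,
    quarter_string_date_to_naive_date > quarter_date,
    publish_date_string_to_naive_date > publication_date,
    income_revenue > income_revenue
);
```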
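To try the macro idea without the itertools dependency, here is a self-contained sketch. It swaps `izip!` for a plain `loop` that advances all columns in lockstep and stops at the shortest one (as `izip!` does), and it propagates parse errors with `?` instead of skipping bad cells with `flat_map`, so a failed cell cannot shift one column relative to the others. The three-field `Financials` struct and the `parse_text`/`parse_i64` helpers are hypothetical stand-ins (real code would likely use `chrono::NaiveDate` for the dates), and `let ... else` needs Rust 1.65 or later.

```rust
// Hypothetical stand-in for the real struct; dates are kept as `String`
// here instead of e.g. `chrono::NaiveDate` to avoid external crates.
#[derive(Debug, PartialEq)]
struct Financials {
    quarter_date: String,
    publication_date: String,
    income_revenue: i64,
}

// Hypothetical per-column parsers: each turns one scraped cell into a typed value.
fn parse_text(cell: &str) -> Result<String, String> {
    Ok(cell.trim().to_string())
}

fn parse_i64(cell: &str) -> Result<i64, String> {
    cell.trim()
        .parse::<i64>()
        .map_err(|e| format!("cannot parse {cell:?} as i64: {e}"))
}

// Consumes one row from `$rows` per `parser > field` pair, in the order given,
// and builds one `Financials` per cell position. Must be used inside a function
// returning `Result<_, String>` because of the `?` operators.
macro_rules! create_financials {
    ($rows:ident, $($fun:ident > $column:ident),+ $(,)?) => {{
        $(
            // Raw cells of this attribute, still as `String`s.
            let mut $column = $rows
                .next()
                .ok_or_else(|| format!("missing row for `{}`", stringify!($column)))?
                .into_iter();
        )+
        let mut out = Vec::new();
        loop {
            $(
                // Take the next cell of this column (stop when any column runs out),
                // then shadow the iterator name with the parsed value.
                let Some($column) = $column.next() else { break };
                let $column = $fun(&$column)?;
            )+
            out.push(Financials { $($column,)+ });
        }
        out
    }};
}

fn build_financials(scraped: Vec<Vec<String>>) -> Result<Vec<Financials>, String> {
    let mut rows = scraped.into_iter();
    Ok(create_financials!(
        rows,
        parse_text > quarter_date,
        parse_text > publication_date,
        parse_i64 > income_revenue,
    ))
}
```

With this shape, adding an attribute means one new struct field and one new `parser > field` line.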
To remove the duplication of parsing into the different data types, check whether those types implement `FromStr`, `From` or `TryFrom`. Otherwise, you could define your own trait that does the conversion and implement it for each data type.
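A sketch of both options, assuming cells arrive as strings. `parse_std` is a single generic parser for any type that implements `FromStr`, and `FromCell` is a hypothetical custom trait. A local trait is mainly useful for formats `FromStr` rejects, because it can be implemented even for types you don't own (the orphan rule forbids implementing the foreign `FromStr` for the foreign `i64`). The thousands-separator format is only an assumption about what the scraped data might look like.

```rust
use std::fmt::Display;
use std::str::FromStr;

// Option 1: one generic parser for every type that already implements `FromStr`.
fn parse_std<T>(cell: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    cell.trim()
        .parse::<T>()
        .map_err(|e| format!("cannot parse {cell:?}: {e}"))
}

// Option 2: a hypothetical local trait for formats `FromStr` does not accept.
trait FromCell: Sized {
    fn from_cell(cell: &str) -> Result<Self, String>;
}

impl FromCell for i64 {
    // Accepts thousands separators such as "1,234,567", which `i64::from_str` rejects.
    fn from_cell(cell: &str) -> Result<Self, String> {
        let digits: String = cell.trim().chars().filter(|&c| c != ',').collect();
        digits
            .parse::<i64>()
            .map_err(|e| format!("cannot parse {cell:?}: {e}"))
    }
}

impl FromCell for String {
    fn from_cell(cell: &str) -> Result<Self, String> {
        Ok(cell.trim().to_string())
    }
}

// Parses a whole scraped row into `T`, stopping at the first bad cell.
fn parse_row<T: FromCell>(row: &[String]) -> Result<Vec<T>, String> {
    row.iter().map(|cell| T::from_cell(cell)).collect()
}
```

Either way there is one parsing function per target type instead of one per column, and the macro above can call it for every field.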