Reputation: 7736
Update:
Thanks for helping out. I did end up using a CSV parser to get what I want, but I asked because I want to know how the inner part of a CSV parser works.
The data is part of a Google Analytics CSV report. I have found many other libraries that can retrieve what I want, but I really want to know the best way to get the data in this particular case. It doesn't look that hard at first, but it's driving me crazy...
The data looks like this as a string:
/page1/index.php,"795,852","620,499",00:03:25,"33,416",10.82%,66.43%,$0.00
The string /page1/index.php is the page's name. The first number, "795,852", is the page views. The second number, "620,499", is the unique page views. After that comes the average duration.
Then I want to parse it into an object like:
{
page: "/page1/index.php",
pv: 795852,
uv: 620499,
avg_time: "00:03:25"
}
I only need to keep the first four fields from this string. A simple piece of JavaScript parsed it fine until I noticed that the format changes when the "pageviews" values are small.
For instance, sometimes it looks like:
/page2/index.php,"795,852",620,00:03:25,"33,416",10.82%,66.43%,$0.00
Or:
/page3/index.php,852,"620,499",00:03:25,"33,416",10.82%,66.43%,$0.00
Or:
/page4/index.php,852,620,00:03:25,"33,416",10.82%,66.43%,$0.00
The rule is: when the number is a thousand or more, it is written as "795,852". When the number is smaller, it's just 852, with no "" around it and, of course, no , as the thousands separator.
This makes it very hard to get the data with a regular expression alone, and just as hard to parse the string into the object I want, something like:
{
page: "/page1/index.php",
pv: 795852,
uv: 620499,
avg_time: "00:03:25"
}
Any good ideas on parsing this with JavaScript?
Upvotes: 2
Views: 87
Reputation: 8494
How about:
var data = [
'/page1/index.php,"795,852","620,499",00:03:25,"33,416",10.82%,66.43%,$0.00',
'/page2/index.php,"795,852",620,00:03:25,"33,416",10.82%,66.43%,$0.00',
'/page3/index.php,852,"620,499",00:03:25,"33,416",10.82%,66.43%,$0.00',
'/page4/index.php,852,620,00:03:25,"33,416",10.82%,66.43%,$0.00'
];
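// Strip the thousands separator from quoted numbers such as "795,852",
// then split each line on the remaining commas.
var result = data.map(function (item) {
  return item.replace(/"(\d+),(\d+)"/g, '$1$2');
}).map(function (item) {
  var a = item.split(',');
  return {
    page: a[0],
    pv: parseInt(a[1], 10), // always pass the radix explicitly
    uv: parseInt(a[2], 10),
    avg_time: a[3]
  };
});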
Which results in:
[
{
"page": "/page1/index.php",
"pv": 795852,
"uv": 620499,
"avg_time": "00:03:25"
},
{
"page": "/page2/index.php",
"pv": 795852,
"uv": 620,
"avg_time": "00:03:25"
},
{
"page": "/page3/index.php",
"pv": 852,
"uv": 620499,
"avg_time": "00:03:25"
},
{
"page": "/page4/index.php",
"pv": 852,
"uv": 620,
"avg_time": "00:03:25"
}
]
What's wrong with this? The handling of , in the numbers is weak. But...
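One weak spot in the replace step above is that the pattern only removes a single comma, so a value of a million or more, such as "1,795,852", would be left untouched and then broken apart by split(','). A slightly more defensive sketch is below. The function name parseRow is hypothetical, and it assumes that the page path never contains a comma or a double quote and that only numeric fields are quoted.

```javascript
// Sketch only: parseRow is an illustrative name, not part of the answer above.
// Assumption: the page path contains no commas or double quotes.
function parseRow(line) {
  // Match a quoted number with any count of thousands groups, e.g. "1,795,852",
  // and replace it with the bare digits.
  var cleaned = line.replace(/"(\d{1,3}(?:,\d{3})*)"/g, function (match, digits) {
    return digits.replace(/,/g, '');
  });
  // After cleaning, every remaining comma is a field separator.
  var fields = cleaned.split(',');
  return {
    page: fields[0],
    pv: parseInt(fields[1], 10),
    uv: parseInt(fields[2], 10),
    avg_time: fields[3]
  };
}

var row = parseRow('/page5/index.php,"1,795,852",620,00:03:25,"33,416",10.82%,66.43%,$0.00');
console.log(row);
```

The replacer function is used instead of '$1$2' so that the number of comma-separated groups does not have to be known in advance.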
Upvotes: 0
Reputation: 49803
I agree with the arguments against using regex for such problems, in general, and it would probably be easier to use a proper parser; however, in this case, I think a regex will work:
^([^,]+),(("[^"]+")|([^,]+)),(("[^"]+")|([^,]+)),([^,]+),
That is: for each of the two number fields, if it starts with ", get everything up to the next "; otherwise, get everything up to the next comma.
Upvotes: 1
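As a rough illustration of how the pattern above might be applied in JavaScript: with the parentheses nested as written, group 1 is the page, group 2 the page views, group 5 the unique page views, and group 8 the average time. The names ROW, toNumber and parseWithRegex are hypothetical, and the sketch assumes the page path contains no comma.

```javascript
// Sketch only: ROW, toNumber and parseWithRegex are illustrative names.
var ROW = /^([^,]+),(("[^"]+")|([^,]+)),(("[^"]+")|([^,]+)),([^,]+),/;

// Drop the surrounding quotes and thousands separators, then parse as base 10.
function toNumber(field) {
  return parseInt(field.replace(/[",]/g, ''), 10);
}

function parseWithRegex(line) {
  var m = ROW.exec(line);
  if (m === null) {
    return null; // the line does not have the expected shape
  }
  return {
    page: m[1],
    pv: toNumber(m[2]),
    uv: toNumber(m[5]),
    avg_time: m[8]
  };
}

console.log(parseWithRegex('/page2/index.php,"795,852",620,00:03:25,"33,416",10.82%,66.43%,$0.00'));
```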
Reputation: 20096
Use a csv parser, not Regex. Try something like this: https://www.npmjs.com/package/csv
Regex is not a suitable tool for parsing CSV.
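Since the question also asks how the inside of a CSV parser works, here is a minimal sketch of the usual idea: walk the line one character at a time and track whether you are inside a quoted field. This is not the implementation of the csv package linked above. The name splitCsvLine is hypothetical, and the sketch handles a single line only (no record splitting, no newlines inside quoted fields).

```javascript
// Sketch only: a tiny state machine for one CSV line.
function splitCsvLine(line) {
  var fields = [];
  var current = '';
  var inQuotes = false;
  for (var i = 0; i < line.length; i++) {
    var ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') {
          current += '"'; // a doubled quote inside a quoted field is a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        current += ch; // commas in here belong to the field
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current); // separator outside quotes ends the field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field has no trailing comma
  return fields;
}

var f = splitCsvLine('/page3/index.php,852,"620,499",00:03:25,"33,416",10.82%,66.43%,$0.00');
console.log({
  page: f[0],
  pv: parseInt(f[1].replace(/,/g, ''), 10),
  uv: parseInt(f[2].replace(/,/g, ''), 10),
  avg_time: f[3]
});
```

Because the quoting is resolved before any field is interpreted, it does not matter whether a number arrives as 852 or "620,499".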
Upvotes: 2