AGamePlayer

Reputation: 7736

Any idea on parsing this string with JavaScript?

Update:

Thanks for helping out. I actually used a CSV parser to get what I want; I am asking only because I want to know how the inner part of a CSV parser works.


It's part of a Google Analytics CSV report. I have found many other libraries that retrieve what I want, but I really want to know the best way to get the data from this particular case. At first it didn't look that hard, but it's driving me crazy...

The data looks like this as a string:

/page1/index.php,"795,852","620,499",00:03:25,"33,416",10.82%,66.43%,$0.00

The string /page1/index.php is the page's name. The first number, "795,852", is the page views. The second number, "620,499", is the unique page views, followed by the average duration.

Then I want to parse it to an object as:

{
  page: "/page1/index.php",
  pv: 795852,
  uv: 620499,
  avg_time: "00:03:25"
}

For various reasons, I only need to keep the first four fields from this string. When I parsed it with simple JavaScript code, everything worked fine until I found that the format differs when the "pageviews" values are small.

For instance, sometimes it looks like:

/page2/index.php,"795,852",620,00:03:25,"33,416",10.82%,66.43%,$0.00

Or:

/page3/index.php,852,"620,499",00:03:25,"33,416",10.82%,66.43%,$0.00

Or:

/page4/index.php,852,620,00:03:25,"33,416",10.82%,66.43%,$0.00

The rule is: when the number is bigger than a thousand, it is written as

"795,852"

But when the number is smaller, it's just

852

There are no quotes around it and, of course, no comma as a thousands separator. This makes it very hard to extract the data with a regular expression alone.

This makes it very difficult to parse the string into the desired object, something like:

{
  page: "/page1/index.php",
  pv: 795852,
  uv: 620499,
  avg_time: "00:03:25"
}

Any good ideas on parsing this with JavaScript?

Upvotes: 2

Views: 87

Answers (4)

Adrian Lynch

Reputation: 8494

How about:

var data = [
  '/page1/index.php,"795,852","620,499",00:03:25,"33,416",10.82%,66.43%,$0.00',
  '/page2/index.php,"795,852",620,00:03:25,"33,416",10.82%,66.43%,$0.00',
  '/page3/index.php,852,"620,499",00:03:25,"33,416",10.82%,66.43%,$0.00',
  '/page4/index.php,852,620,00:03:25,"33,416",10.82%,66.43%,$0.00'
];

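// Strip the quotes and the thousands separator from quoted numbers
// ("795,852" becomes 795852), then split on the remaining commas.
var result = data.map(function (item) {
  return item.replace(/"(\d+),(\d+)"/g, '$1$2');
}).map(function (item) {
  var a = item.split(',');
  return {
    page: a[0],
    pv: parseInt(a[1], 10), // always pass the radix explicitly
    uv: parseInt(a[2], 10),
    avg_time: a[3]
  };
});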

Which results in:

[
  {
    "page": "/page1/index.php",
    "pv": 795852,
    "uv": 620499,
    "avg_time": "00:03:25"
  },
  {
    "page": "/page2/index.php",
    "pv": 795852,
    "uv": 620,
    "avg_time": "00:03:25"
  },
  {
    "page": "/page3/index.php",
    "pv": 852,
    "uv": 620499,
    "avg_time": "00:03:25"
  },
  {
    "page": "/page4/index.php",
    "pv": 852,
    "uv": 620,
    "avg_time": "00:03:25"
  }
]

What's wrong with this?

  • It's fragile
  • The RegEx to replace the , in the numbers is weak

But...

  • It seems to work!
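
One concrete weakness: the replacement regex only matches quoted numbers with exactly one comma, so a value such as "1,234,567" would pass through unchanged and then be split into several fields. A minimal sketch of a more tolerant variant is below. The helper name unquoteNumbers is hypothetical, and it still assumes that quoted fields contain nothing but digits and commas.

```javascript
// Hypothetical helper: unwrap any quoted run of digits and commas,
// removing every thousands separator, not just the first one.
function unquoteNumbers(line) {
  return line.replace(/"(\d[\d,]*)"/g, function (match, digits) {
    return digits.replace(/,/g, '');
  });
}

// Same mapping as in the answer, but using the helper above.
function toRecord(line) {
  var a = unquoteNumbers(line).split(',');
  return {
    page: a[0],
    pv: parseInt(a[1], 10),
    uv: parseInt(a[2], 10),
    avg_time: a[3]
  };
}
```

This remains a shortcut rather than real CSV parsing: a quoted page name containing a comma would still break the split.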

Upvotes: 0

Scott Hunter

Reputation: 49803

I agree with the arguments against using regex for such problems, in general, and it would probably be easier to use a proper parser; however, in this case, I think a regex will work:

^([^,]+),(("[^"]+")|([^,]+)),(("[^"]+")|([^,]+)),([^,]+),

That is:

  • the first field is everything up to the first comma
  • if the next field starts with a ", get everything up to the next "; otherwise, get everything up to the next comma
  • ditto for the field after that
  • the fourth field (the average time) is everything up to the next comma
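
As a rough sketch of how the pattern above could be applied in JavaScript: the version below uses the same alternation but drops the inner capture groups, so each field lands in exactly one group. The names FIELD_RE, toNumber and parseRow are hypothetical and not part of the answer.

```javascript
// Same idea as the pattern above, with one capture group per field.
var FIELD_RE = /^([^,]+),("[^"]+"|[^,]+),("[^"]+"|[^,]+),([^,]+),/;

// Hypothetical helper: remove the surrounding quotes and the
// thousands separators, then convert to an integer.
function toNumber(field) {
  return parseInt(field.replace(/[",]/g, ''), 10);
}

// Returns the first four fields as an object, or null if the line
// does not match the expected shape.
function parseRow(line) {
  var m = FIELD_RE.exec(line);
  if (!m) {
    return null;
  }
  return {
    page: m[1],
    pv: toNumber(m[2]),
    uv: toNumber(m[3]),
    avg_time: m[4]
  };
}
```

Because the pattern ends with a comma, it expects at least one more field after the average time, which holds for all the sample rows in the question.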

Upvotes: 1

Aleksey Shein

Reputation: 7482

Try a CSV parser, such as Papa Parse.

Upvotes: 0

Darth Egregious

Reputation: 20096

Use a CSV parser, not regex. Try something like this: https://www.npmjs.com/package/csv

Regex is not a suitable tool for parsing CSV.
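
Since the question's update asks how a CSV parser works inside, here is a minimal sketch of the usual idea: walk the line one character at a time and track whether you are inside a quoted field. This is an illustration only, not how any particular library is implemented. The function name splitCsvLine is hypothetical, and it handles a single line, so quoted fields containing line breaks are out of scope.

```javascript
// Hypothetical single-line CSV splitter using a small state machine.
function splitCsvLine(line) {
  var fields = [];
  var current = '';
  var inQuotes = false;

  for (var i = 0; i < line.length; i++) {
    var ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') {
          // A doubled quote inside a quoted field is an escaped quote.
          current += '"';
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        current += ch; // commas are ordinary characters here
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current); // an unquoted comma ends the field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current); // the last field has no trailing comma
  return fields;
}
```

With the fields separated this way, "795,852" arrives as the single string 795,852, so building the object is a matter of removing the commas and calling parseInt on fields 1 and 2.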

Upvotes: 2
