Reputation: 549
I'm trying to find a way to parse chat text, and I'm having some issues. My goal is to split each line into its fields (Date, Time, Name, Text) and then get statistics by Name and by Date, including the total number of words and letters for each person.
Here is a sample:
10/06/2019, 23:17 - Nasu Alex Taranu Gilmeanu: De iesit iesim cu siguranta. Dar tre sa cadem de acord la o varianta
10/06/2019, 23:17 - Dura Stefanel: Serios acum
10/06/2019, 23:18 - Dura Stefanel: E cea mai frumoasa cazare de pe site: din câte am văzut pana acum
11/06/2019, 00:04 - Nasu Alex Taranu Gilmeanu: http://www.booking.com/Share-CJY
11/06/2019, 18:31 - Danutz: Sa îl mănânci - cu Botu :)
The code I'm using is below, but I can't figure out what the regex should be in order for it to split each row into those fields.
I load the stringData variable from a text file using Ajax; str is just a small sample of that data that I added here:
var stringData = $.ajax({
  url: "http://localhost/_FunStuff/_ChatCounter/2021.01.04_textFile.txt",
  async: false
}).responseText;
const str = `10/06/2019, 23:17 - Nasu Alex Taranu Gilmeanu: De iesit iesim cu siguranta. Dar tre sa cadem de acord la o varianta
10/06/2019, 23:17 - Dura Stefanel: Serios acum
10/06/2019, 23:18 - Dura Stefanel: E cea mai frumoasa cazare de pe site: din câte am văzut pana acum
11/06/2019, 00:04 - Nasu Alex Taranu Gilmeanu: http://www.booking.com/Share-CJY
11/06/2019, 18:31 - Danutz: Sa îl mănânci - cu Botu :)`;
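function splitImportedData(stringData) {
  // Declare the variables so they don't leak out as implicit globals
  const $arrRows = stringData.split("\n");
  for (let rowi = 0; rowi < $arrRows.length; rowi++) {
    const $strRow = $arrRows[rowi];
    // [...] is a character class: this splits on EVERY single comma, colon,
    // whitespace, '+' or '-' character, so the fields fall apart
    const $arrRow = $strRow.split(/[,:\s+\-]/);
    if (rowi == 1) {
      console.log($arrRow);
      //alert($arrRow);
    }
  }
}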
splitImportedData(str);
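To see why the split falls apart, here is a minimal sketch that runs the question's split regex on one sample row (row and pieces are just illustrative names):

```javascript
// Same regex as in the question: [...] is a character class, so split()
// cuts at every single comma, colon, whitespace, '+' or '-' character.
const row = "10/06/2019, 23:17 - Dura Stefanel: Serios acum";
const pieces = row.split(/[,:\s+\-]/);

console.log(pieces);
// → ["10/06/2019", "", "23", "17", "", "", "Dura", "Stefanel", "", "Serios", "acum"]
```

The time is cut in two, the name is split into separate words, and adjacent separators such as ", " or " - " leave empty strings behind, so the position of each field shifts with the number of words in the name.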
Upvotes: 2
Views: 203
Reputation: 4592
You don't need to split the row at all; a single match() with capture groups returns the date, time, name and text in one go.
const str = `- 10/06/2019, 23:17 - Nasu Alex Taranu Gilmeanu: De iesit iesim cu siguranta. Dar tre sa cadem de acord la o varianta
- 10/06/2019, 23:17 - Dura Stefanel: Serios acum
- 10/06/2019, 23:18 - Dura Stefanel: E cea mai frumoasa cazare de pe site: din câte am văzut pana acum
- 11/06/2019, 00:04 - Nasu Alex Taranu Gilmeanu: http://www.booking.com/Share-CJY
- 11/06/2019, 18:31 - Danutz: Sa îl mănânci - cu Botu :)`;
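const splitImportedData = (stringData) => {
  return stringData.split("\n")
    // The leading "- " is optional, so rows with or without it both match
    .map(row => row.match(/^\s*(?:- )?(.+?), (.+?) - (.+?): (.+)/))
    // Drop rows that don't match (e.g. empty lines) instead of crashing on m[1]
    .filter(m => m !== null)
    .map(m => ({
      date: m[1],
      time: m[2],
      name: m[3],
      text: m[4],
    }));
}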
console.log(splitImportedData(str));
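Parsing only gives you the fields; the statistics from the question (per name and per date, with word and letter totals) still need an aggregation step. A minimal sketch, assuming that a "word" is any whitespace-separated token and a "letter" is any Unicode letter (\p{L}, so ă, â, î count); parseChat, countWords, countLetters and statsBy are hypothetical helper names:

```javascript
// Hypothetical parser: same idea as the match() above, leading "- " optional.
const ROW_RE = /^\s*(?:- )?(.+?), (.+?) - (.+?): (.+)/;

function parseChat(text) {
  return text
    .split("\n")
    .map(row => row.match(ROW_RE))
    .filter(m => m !== null) // skip rows that are not chat messages
    .map(m => ({ date: m[1], time: m[2], name: m[3], text: m[4] }));
}

// Assumption: a word is any whitespace-separated token (so ":)" counts).
function countWords(s) {
  return s.split(/\s+/).filter(w => w !== "").length;
}

// Assumption: a letter is any Unicode letter; the u flag enables \p{L}.
function countLetters(s) {
  return (s.match(/\p{L}/gu) || []).length;
}

// Sum messages, words and letters under the value of row[key] ("name" or "date").
function statsBy(rows, key) {
  const stats = {};
  for (const row of rows) {
    const k = row[key];
    if (!stats[k]) stats[k] = { messages: 0, words: 0, letters: 0 };
    stats[k].messages += 1;
    stats[k].words += countWords(row.text);
    stats[k].letters += countLetters(row.text);
  }
  return stats;
}

const sample = `10/06/2019, 23:17 - Dura Stefanel: Serios acum
10/06/2019, 23:18 - Dura Stefanel: E cea mai frumoasa cazare de pe site: din câte am văzut pana acum
11/06/2019, 18:31 - Danutz: Sa îl mănânci - cu Botu :)`;

const rows = parseChat(sample);
console.log(statsBy(rows, "name"));
console.log(statsBy(rows, "date"));
```

Keying the same function by "name" or "date" keeps one aggregation loop for both views; if you need both at once (e.g. words per person per day), you could build a combined key such as name + "|" + date.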
Upvotes: 4
Reputation: 14175
Might I encourage you to use the .match() method instead of splitting and further parsing? As you'll see, you get the result you are seeking directly:
const str = `- 10/06/2019, 23:17 - Nasu Alex Taranu Gilmeanu: De iesit iesim cu siguranta. Dar tre sa cadem de acord la o varianta
- 10/06/2019, 23:17 - Dura Stefanel: Serios acum
- 10/06/2019, 23:18 - Dura Stefanel: E cea mai frumoasa cazare de pe site: din câte am văzut pana acum
- 11/06/2019, 00:04 - Nasu Alex Taranu Gilmeanu: http://www.booking.com/Share-CJY
- 11/06/2019, 18:31 - Danutz: Sa îl mănânci - cu Botu :)`;
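function splitImportedData(stringData) {
  const result = [];
  // Lazy (?<author>.+?) stops at the FIRST ": ", so a colon inside the
  // message text (e.g. "site: din ...") stays in the comment
  const regex = /(?<date>\d{2}\/\d{2}\/\d{4}), (?<time>\d{2}:\d{2}) - (?<author>.+?):\s(?<comment>.*)/;
  stringData.split("\n").filter(s => s !== '').forEach(r => {
    const m = r.match(regex);
    // Skip rows that don't look like a chat message instead of throwing
    if (m) result.push(m.groups);
  });
  return result;
}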
let res = splitImportedData(str);
console.log(res);
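The choice of author pattern matters on rows whose message itself contains ": ". A minimal sketch comparing a greedy author group, .*(?=:), with a lazy one, .+?, on one such sample row (the variable names are illustrative):

```javascript
// Sample row whose message text also contains ": ".
const row = "10/06/2019, 23:18 - Dura Stefanel: E cea mai frumoasa cazare de pe site: din câte am văzut pana acum";

// Greedy: .* runs to the end of the line, then backtracks to the LAST ": ".
const greedy = /(?<date>\d{2}\/\d{2}\/\d{4}), (?<time>\d{2}:\d{2}) - (?<author>.*(?=:)):\s(?<comment>.*)/;
// Lazy: .+? stops at the FIRST ": ".
const lazy = /(?<date>\d{2}\/\d{2}\/\d{4}), (?<time>\d{2}:\d{2}) - (?<author>.+?):\s(?<comment>.*)/;

const g = row.match(greedy).groups;
const l = row.match(lazy).groups;

console.log(g.author); // → "Dura Stefanel: E cea mai frumoasa cazare de pe site"
console.log(l.author); // → "Dura Stefanel"
```

A negated class such as (?<author>[^:]+) would also stop at the first colon; either way, the name ends at the first ": " on the row.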
Upvotes: 2
Reputation: 891
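You can use non-capturing groups in your split regex, e.g. /(?:, )|(?:- )|(?:: )/. Note that it also splits inside the message text wherever one of those separators appears, and splitting on "- " leaves a trailing space on the time, so you can obviously make something smarter, but this could help as a basic example.
For reference, see: Regular_Expressions/Groups_and_Ranges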
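If you'd rather stay with split, one option is to split only the header (everything before the first ": ") with a non-capturing-group regex, so separators inside the message text are left alone. A minimal sketch; splitRow is a hypothetical helper name, and stripping a leading "- " is an assumption to cover the samples that use it:

```javascript
// Hypothetical helper: split one chat row into { date, time, name, text }.
function splitRow(row) {
  // Tolerate the "- " prefix that some of the samples above use.
  const line = row.replace(/^\s*- /, "");
  // Only the FIRST ": " separates the header from the message text
  // ("18:31" is not matched because its colon is not followed by a space).
  const sep = line.indexOf(": ");
  if (sep === -1) return null; // not a chat message row
  const header = line.slice(0, sep);
  const text = line.slice(sep + 2);
  // Non-capturing groups, so the separators themselves are not returned.
  const [date, time, name] = header.split(/(?:, )|(?: - )/);
  return { date, time, name, text };
}

console.log(splitRow("11/06/2019, 18:31 - Danutz: Sa îl mănânci - cu Botu :)"));
```

Because the message text never goes through split(), a " - " or ": " inside it (as in "mănânci - cu" or ":)") stays intact.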
Upvotes: 1