Reputation: 1161
I need to remove all words before the dash at the beginning of each sentence. Some sentences have no words before a dash, and dashes that appear later in a long sentence need to stay. Here is an example:
How do I change these strings:
PARIS — President Nicolas Sarkozy, running from behind for reelection...
GAZA CITY —Cross-border fighting between Gaza and Israel...
CARURU, Colombia — Quite suddenly, the endless green of Amazonian forest...
A year after an earthquake and tsunami devastated Japan's northeastern coast...
Into these strings:
President Nicolas Sarkozy, running from behind for reelection...
Cross-border fighting between Gaza and Israel...
Quite suddenly, the endless green of Amazonian forest...
A year after an earthquake and tsunami devastated Japan's northeastern coast...
How can I accomplish this with JavaScript (or PHP if JavaScript doesn't allow it)?
Upvotes: 0
Views: 291
Reputation: 55678
This is a pretty straightforward regex problem, but geez, it's not as straightforward as all the other answers assume. A few points:
Regex is the right choice. The split and substr answers won't deal with the leading space, and they can't distinguish between a dash that ends a dateline at the beginning of a sentence and a dash in the middle of your text content. Any option you use ought to be able to deal with content like "President Nicolas Sarkozy — running from behind for reelection — came to Paris today..." as well as the examples you give.
It's tricky to automatically recognize that my test sentence above doesn't have a dateline. Almost all the answers so far use a single description: any number of arbitrary characters, followed by a dash. That's insufficient for a test sentence like the one above.
You'll get better results by adding a few more rules, like: fewer than X characters, located at the beginning of the string, followed by a dash, optionally followed by an arbitrary number of spaces, followed by a capital letter. Even this won't work correctly with "President Sarkozy — Carla Bruni's husband...", but you're going to have to assume that this edge case is sufficiently rare to ignore.
All of which gives you a function like this:
function removeDateline(str) {
    return str.replace(/^[^—]{3,75}—\s*(?=[A-Z])/, "");
}
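Breaking it down:
^ - must occur at the beginning of the string
[^—]{3,75} - between 3 and 75 characters other than a dash
— - the em dash itself
\s* - optional spaces
(?=[A-Z]) - a lookahead for a capital letter, which is checked but not included in the match, so it is not removed
Usage: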
var s = "PARIS — President Nicolas Sarkozy, running from behind for reelection...";
removeDateline(s); // "President Nicolas Sarkozy, running from behind for reelection..."
s = "PARIS — President Nicolas Sarkozy — running from behind for reelection...";
removeDateline(s); // "President Nicolas Sarkozy — running from behind for reelection..."
s = "CARURU, Colombia — Quite suddenly, the endless green of Amazonian forest...";
removeDateline(s); // "Quite suddenly, the endless green of Amazonian forest..."
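As a quick check, here is a minimal sketch that runs the same pattern over the four strings from the question plus the no-dateline test sentence from earlier in this answer. The samples array and the cleaned variable are names made up for this illustration, and the function is repeated only so the snippet runs on its own.

```javascript
// Same pattern as removeDateline above, repeated so this snippet is self-contained.
function removeDateline(str) {
    return str.replace(/^[^—]{3,75}—\s*(?=[A-Z])/, "");
}

// Illustrative inputs: the question's four strings and a sentence
// that contains dashes but no dateline.
var samples = [
    "PARIS — President Nicolas Sarkozy, running from behind for reelection...",
    "GAZA CITY —Cross-border fighting between Gaza and Israel...",
    "CARURU, Colombia — Quite suddenly, the endless green of Amazonian forest...",
    "A year after an earthquake and tsunami devastated Japan's northeastern coast...",
    "President Nicolas Sarkozy — running from behind for reelection — came to Paris today..."
];

// Wrap the call so only the string is passed to removeDateline.
var cleaned = samples.map(function (line) {
    return removeDateline(line);
});

cleaned.forEach(function (line) {
    console.log(line);
});
```

The first three lines lose their dateline. The last two come back unchanged: one has no dash at all, and in the other the first dash is followed by a lowercase letter, so the lookahead fails.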
Upvotes: 5
Reputation: 4077
If each sentence can be separated from the others, you can use a regular expression, like this:
var s = "PARIS — President Nicolas Sarkozy, running from behind for reelection...";
function removeWord(str)
{
    // Remove everything up to and including the first em dash, plus any spaces after it.
    return str.replace(/^[^—]+—\s*/, "");
}
alert(removeWord(s));
Upvotes: 0
Reputation: 207901
In the most basic example:
var str = "PARIS - President Nicolas Sarkozy, running from behind for reelection.";
alert(str.split('-')[1]); // alerts " President Nicolas Sarkozy, running from behind for reelection." (note the leading space)
Depending on your actual document structure, there may be ways to loop through the content to speed up this type of operation.
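One possible way to loop through the content is sketched below. It assumes the text arrives as a single string with one story per line, which the question does not actually state. stripLeadIns is a hypothetical helper name, and the sketch cuts at the em dash used in the question's samples instead of a hyphen. It uses indexOf and slice in place of split(...)[1] so that later dashes on the same line are kept.

```javascript
// Hypothetical helper: for each line, drop everything up to and
// including the first em dash, then trim surrounding whitespace.
function stripLeadIns(text) {
    return text.split("\n").map(function (line) {
        var i = line.indexOf("—");
        // No dash on this line: leave it untouched.
        if (i === -1) {
            return line;
        }
        // Keep everything after the first dash, including any later dashes.
        return line.slice(i + 1).trim();
    }).join("\n");
}

// Assumed input format: one story per line.
var text = "PARIS — President Nicolas Sarkozy, running from behind for reelection...\n" +
    "A year after an earthquake and tsunami devastated Japan's northeastern coast...";

console.log(stripLeadIns(text));
```

Like any approach that cuts at the first dash, this cannot tell a real dateline from a dash in the middle of a sentence, so it is only safe when every line containing a dash is known to start with a dateline.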
Upvotes: 0
Reputation: 350
PHP
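$x = "PARIS — President Nicolas Sarkozy, running from behind for reelection...";
// Start after the dash, not at it; "—" is multi-byte in UTF-8, so skip strlen("—") bytes. The leading space remains.
$var = substr($x, strpos($x, "—") + strlen("—"));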
Upvotes: 0