Reputation: 2424
I want to find all occurrences of freeWifi = "Y" and state = "NY" in a JSON string, but if there are 2 consecutive occurrences, the regex treats them as one match instead of 2.
The pattern I used is '"freeWifi": "Y",(\s+\S+)+state": "NY"'.
When I use '"freeWifi": "Y",(\s+\S+){5}state": "NY"' it gives me the desired result, but it is not general enough in case new lines are added to the JSON file.
Part of the data:
"freeWifi": "Y",
"storeNumber": "14372",
"phone": "(305)672-7055",
"state": "NY",
"storeUrl": "http://www.mcflorida.com/14372",
"playplace": "N",
"address": "1601 ALTON RD",
"storeType": "FREESTANDING",
"archCard": "Y",
"driveThru": "Y"
}
"type": "Feature",
"properties": {
"city": "MIAMI",
"zip": "33135",
"freeWifi": "Y",
"storeNumber": "7408",
"phone": "(305)285-0974",
"state": "NY",
"storeUrl": "http://www.mcflorida.com/7408",
"playplace": "Y",
"address": "1400 SW 8TH ST",
"storeType": "FREESTANDING",
"archCard": "Y",
"driveThru": "Y"
}
},
{
Part II
After implementing Steven's solution, I tried it on the data file with many entries; the program ran forever and did not give an answer.
The new regex is: '"freeWifi": "Y",(\s+?\S+?)+?state": "NY"'.
To see why the program hangs, I ran it against part of the data, increasing the size by 100,000 bytes each time. The results show a significant slowdown as the size increases, which suggests a problem with the regex, as explained in "Program run forever when matching regex".
Sorry for the lousy display of the table, but I could not make it nicer (I removed tabs and padded with spaces but it ignores them)
Time_Passed   Size_Checked   File_Size   Matches
7.3e-05       100000         8345167     30
0.008906      200000         8345167     30
0.466485      300000         8345167     31
0.500054      400000         8345167     75
0.523969      500000         8345167     142
0.553361      600000         8345167     201
0.586032      700000         8345167     201
1.072181      800000         8345167     338
1.114541      900000         8345167     482
1.157304      1000000        8345167     630
1.203889      1100000        8345167     630
1.625656      1200000        8345167     630
3.126974      1300000        8345167     630
6.501044      1400000        8345167     630
12.476704     1500000        8345167     630
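One possible reading of these timings, offered as a guess rather than a diagnosis: the match count stays at 630 from 1,000,000 bytes onward while the time keeps growing, which is what you would expect if each later "freeWifi": "Y" makes the lazy group scan far ahead for a state": "NY" that is not nearby. Below is a sketch of a pattern that cannot scan past the current object. It assumes the properties object contains no nested braces and that freeWifi always comes before state, which holds for the excerpt above but may not hold for the whole file. The names `sample`, `lazy` and `bounded` are made up for this illustration.

```python
import re

# Hypothetical sample: four small records shaped like the excerpt above.
sample = '''{
"properties": {
"freeWifi": "Y",
"storeNumber": "14372",
"state": "NY"
}
},
{
"properties": {
"freeWifi": "Y",
"storeNumber": "9876",
"state": "PA"
}
},
{
"properties": {
"freeWifi": "N",
"storeNumber": "1234",
"state": "NY"
}
},
{
"properties": {
"freeWifi": "Y",
"storeNumber": "7408",
"state": "NY"
}
}
'''

# Lazy pattern from Part II: the group is free to run across "}" into later records.
lazy = r'"freeWifi": "Y",(\s+?\S+?)+?state": "NY"'

# Bounded sketch: [^{}] cannot cross a brace, so a failed attempt
# gives up at the end of the current object instead of scanning onward.
bounded = r'"freeWifi": "Y",[^{}]*?"state": "NY"'

lazy_count = sum(1 for _ in re.finditer(lazy, sample))
bounded_count = sum(1 for _ in re.finditer(bounded, sample))

# The lazy pattern also pairs store 9876 (PA) with the state of store 1234.
print(lazy_count, bounded_count)  # 3 2
```

On this sample the bounded pattern counts only the two records that really have both values. Whether it removes the slowdown on the full 8 MB file is something to measure, not something this sketch proves.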
Upvotes: 0
Views: 231
Reputation: 1376
The lazy operator is ?. Your expression with the lazy operator would be '"freeWifi": "Y",(\s+?\S+?)+state": "NY"'.
See example in regexr.
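To make the difference concrete, here is a small sketch that runs the original greedy pattern and the lazy one above against a trimmed copy of the data from the question. It assumes Python's `re` module (the question does not say which language is used), and `sample`, `greedy` and `lazy` are names chosen for this example.

```python
import re

# Two consecutive records, trimmed from the data in the question.
sample = '''"freeWifi": "Y",
"storeNumber": "14372",
"phone": "(305)672-7055",
"state": "NY",
"driveThru": "Y"
}
"type": "Feature",
"properties": {
"freeWifi": "Y",
"storeNumber": "7408",
"phone": "(305)285-0974",
"state": "NY",
"driveThru": "Y"
}
'''

# Greedy: the group swallows as much as it can, so the match runs
# from the first freeWifi to the LAST state": "NY" in the text.
greedy = r'"freeWifi": "Y",(\s+\S+)+state": "NY"'

# Lazy: \S+? tries the shortest run first, so the match stops at the
# first state": "NY" it can reach.
lazy = r'"freeWifi": "Y",(\s+?\S+?)+state": "NY"'

greedy_count = sum(1 for _ in re.finditer(greedy, sample))
lazy_count = sum(1 for _ in re.finditer(lazy, sample))

print(greedy_count, lazy_count)  # 1 2
```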
As @anubhava has pointed out, this is not going to work on generic input. For example, I imagine that you don't want this match:
"type": "Feature",
"properties": {
"freeWifi": "Y",
"storeNumber": "9876",
"state": "PA"
}
},
"type": "Feature",
"properties": {
"freeWifi": "N",
"storeNumber": "1234",
"state": "NY",
}
},
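If parsing the file is an option, a different technique (a JSON parser, not a regex) avoids this kind of cross-record match, because each record is checked on its own. The sketch below assumes the file is a single JSON document with a top-level "features" list, which is a guess based on the "type": "Feature" entries and is not confirmed by the question. `doc` and `wifi_stores_in_state` are hypothetical names.

```python
import json

# Hypothetical document; the top-level "features" key is an assumption.
doc = '''{"features": [
  {"type": "Feature",
   "properties": {"freeWifi": "Y", "storeNumber": "9876", "state": "PA"}},
  {"type": "Feature",
   "properties": {"freeWifi": "N", "storeNumber": "1234", "state": "NY"}},
  {"type": "Feature",
   "properties": {"freeWifi": "Y", "storeNumber": "7408", "state": "NY"}}
]}'''


def wifi_stores_in_state(text, state):
    """Return store numbers whose own properties have free wifi and the given state."""
    matches = []
    for feature in json.loads(text)["features"]:
        props = feature.get("properties", {})
        # Both conditions are tested on the same record, so values from
        # neighbouring records can never be combined.
        if props.get("freeWifi") == "Y" and props.get("state") == state:
            matches.append(props["storeNumber"])
    return matches


print(wifi_stores_in_state(doc, "NY"))  # ['7408']
```

Here the PA store with wifi and the NY store without wifi are both skipped, which is the case the regex gets wrong.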
Upvotes: 1