Reputation: 404
I have a DataFrame with a column named message. I would like to extract just the request part of each row and write it to a new text file, one request per line.
There are multiple rows like these:
time stamp message
0 May 17, 2021 @ 03:29:58.585 2021-05-17 12:29:57,725 JST [INFO] (api.py:335) ref: IVUUPK, endpoint: RSL, request data: {"items":[{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805883","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883","qty":2}],"order_cd":"5034868310","seller_cd":"31056"}
1 May 17, 2021 @ 03:29:58.384 2021-05-17 12:29:57,568 JST [INFO] (api.py:335) ref: B30FP7, endpoint: RSL, request data: {"items":[{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805883","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883","qty":1},{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805890","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S BLUE R4562166805890","qty":1}],"order_cd":"5034868317","seller_cd":"31056"}
2 May 17, 2021 @ 03:29:58.304 2021-05-17 12:29:57,407 JST [INFO] (api.py:335) ref: B2QR1V, endpoint: RSL, request data: {"items":[{"depth":27.4,"height":3.1,"width":21.5,"item_cd":"2400000071440","item_name":"子ども用 カットソー 【OMNES】キッズ ジャガードストレッチ長袖Tシャツ トップス カジュアル こども用 80cm 90cm 100cm 110cm 120cm 130cm 140cm HAPTIC ハプティック 母の日 1519-5047-K110-061-MGR","qty":1},{"depth":34.0,"height":2.0,"width":13.0,"item_cd":"2400000070948","item_name":"子ども用 パンツ 【OMNES】キッズ ジャガードストレッチパンツ ボトムス カジュアル こども用 80cm 90cm 100cm 110cm 120cm 130cm 140cm HAPTIC ハプティック 母の日 1519-2012-K120-061-MGR","qty":1},{"depth":26.0,"height":3.5,"width":21.3,"item_cd":"2400000071563","item_name":"子ども用 カットソー 【OMNES】キッズ ジャガードストレッチ長袖Tシャツ トップス カジュアル こども用 80cm 90cm 100cm 110cm 120cm 130cm 140cm HAPTIC ハプティック 母の日 1519-5047-K130-001-WT","qty":1},{"depth":33.1,"height":3.5,"width":28.5,"item_cd":"2400000073468","item_name":"【OMNES Another Edition】ベアワッフルヘンリーネックトップス レディース カットソー フリーサイズ 半袖 カジュアル トップス ボタン HAPTIC ハプティック 母の日 7120-5055-F-031-GBG","qty":1}],"order_cd":"5034867576","seller_cd":"32424"}
3 May 17, 2021 @ 03:29:58.295 2021-05-17 12:29:57,417 JST [INFO] (api.py:335) ref: B2FYHU, endpoint: RSL, request data: {"items":[{"depth":12.4,"height":9.5,"width":23.8,"item_cd":"4580294612012","item_name":"競技用けん玉「大空」単色:青 R4580294612012","qty":20}],"order_cd":"5034868288","seller_cd":"31056"}
Expected result:
{"items":[{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805883","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883","qty":2}],"order_cd":"5034868310","seller_cd":"31056"}
{"items":[{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805883","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883","qty":1},{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805890","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S BLUE R4562166805890","qty":1}],"order_cd":"5034868317","seller_cd":"31056"}
{"items":[{"depth":12.4,"height":9.5,"width":23.8,"item_cd":"4580294612012","item_name":"競技用けん玉「大空」単色:青 R4580294612012","qty":20}],"order_cd":"5034868288","seller_cd":"31056"}
Is there an efficient way to get the expected result, using regex or some other approach? I would appreciate your advice. Thanks.
Upvotes: 0
Views: 46
Reputation: 404
Another way to do it is with a shell one-liner:
cat *csv | perl -pe 's/.*request data: (\{.+\}).*/$1/'
Upvotes: 0
Reputation: 106
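import json
import re

string = "<your input string>"
# Match from the first "{" to the last "}" on each line
groups = re.findall(r"\{.*\}", string)
output = []
for group in groups:
    # Parse each matched JSON text into a Python dict
    output.append(json.loads(group))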
Output:
{'items': [{'depth': 26.5, 'height': 10.0, 'width': 20.0, 'item_cd': '4562166805883', 'item_name': 'やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883', 'qty': 2}], 'order_cd': '5034868310', 'seller_cd': '31056'}
.
.
.
{'items': [{'depth': 12.4, 'height': 9.5, 'width': 23.8, 'item_cd': '4580294612012', 'item_name': '競技用けん玉「大空」単色:青 R4580294612012', 'qty': 20}], 'order_cd': '5034868288', 'seller_cd': '31056'}
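The regex approach above yields Python dicts, while the question asks for one request per line in a text file. A minimal sketch of that last step follows, assuming every message carries the marker text "request data: " seen in the sample rows. The names REQUEST_RE, extract_requests, and write_requests are hypothetical, chosen only for illustration.

```python
import json
import re

# Assumption: the payload always follows the literal text "request data: ".
REQUEST_RE = re.compile(r"request data: (\{.*\})")


def extract_requests(messages):
    """Yield each request payload as a compact, single-line JSON string."""
    for message in messages:
        match = REQUEST_RE.search(message)
        if match is None:
            continue  # no request payload in this message
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # braces matched, but the text is not valid JSON
        # ensure_ascii=False keeps the Japanese item names readable
        yield json.dumps(payload, ensure_ascii=False, separators=(",", ":"))


def write_requests(messages, path):
    """Write one request per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in extract_requests(messages):
            fh.write(line + "\n")
```

With the DataFrame from the question, something like write_requests(df["message"], "requests.txt") should work, since iterating over a column yields its values. Parsing with json.loads first means malformed payloads are skipped rather than written to the file.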
Upvotes: 1
Reputation: 56
If the message is a string, you can use the find method to cut off everything before the request, like this:
string = '2021-05-17 12:29:57,725 JST [INFO] (api.py:335) ref: IVUUPK, endpoint: RSL, request data: {"items":[{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805883","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883","qty":2}],"order_cd":"5034868310","seller_cd":"31056"}'
print(string[string.find("data:")+6:])
out:
{"items":[{"depth":26.5,"height":10.0,"width":20.0,"item_cd":"4562166805883","item_name":"やわらかエコ湯たんぽプレミアム CY-N10S PINK R4562166805883","qty":2}],"order_cd":"5034868310","seller_cd":"31056"}
You can apply that to every row and then create the text file you want.
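Applying this to every row could look like the sketch below. It uses str.partition instead of find, because find returns -1 when the marker is missing and the slice would then silently return the wrong text. The marker "request data: " is taken from the sample rows; request_part and write_request_lines are hypothetical names.

```python
MARKER = "request data: "  # assumption: taken from the sample log lines


def request_part(message):
    """Return the text after the marker, or None if the marker is absent."""
    _, found, rest = message.partition(MARKER)
    return rest.strip() if found else None


def write_request_lines(messages, path):
    """Write one request per line; return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for message in messages:
            request = request_part(message)
            if request is not None:
                fh.write(request + "\n")
                count += 1
    return count
```

For the DataFrame in the question, the call would be along the lines of write_request_lines(df["message"], "requests.txt"). This assumes nothing follows the JSON payload on the line; if trailing text can appear, a regex or a JSON parser is the safer choice.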
Upvotes: 0