Reputation: 4575
I have a pandas dataframe "df". the dataframe has a field in it "json_field" that contains json data. I think it's actually yaml format. I'm trying to parse the json values in to their own columns. I have example data below. when I run the code below I'm getting an error when it hits the 'name' field. the code and the error are below. it seems like maybe it's having trouble because the value associated with the name field isn't quoted, or has spaces. does anyone see what the issue might be or suggest how to fix?
example:
print(df['json_field'][0])
output:
- start:"test"
name: Price Unit
field: priceunit
type: number
operator: between
value:
- '40'
- '60'
code:
import yaml
pd.io.json.json_normalize(yaml.load(df['json_field'][0]), 'name','field','type','operator','value').head()
error:
An error was encountered:
mapping values are not allowed here
in "<unicode string>", line 3, column 7:
name: Price Unit
^
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 94, in safe_load
return load(stream, SafeLoader)
File "/usr/local/lib64/python3.6/site-packages/yaml/__init__.py", line 72, in load
return loader.get_single_data()
File "/usr/local/lib64/python3.6/site-packages/yaml/constructor.py", line 35, in get_single_data
node = self.get_single_node()
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 82, in compose_node
node = self.compose_sequence_node(anchor)
File "/usr/local/lib64/python3.6/site-packages/yaml/composer.py", line 110, in compose_sequence_node
while not self.check_event(SequenceEndEvent):
File "/usr/local/lib64/python3.6/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/local/lib64/python3.6/site-packages/yaml/parser.py", line 382, in parse_block_sequence_entry
if self.check_token(BlockEntryToken):
File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
return self.fetch_value()
File "/usr/local/lib64/python3.6/site-packages/yaml/scanner.py", line 580, in fetch_value
self.get_mark())
yaml.scanner.ScannerError: mapping values are not allowed here
in "<unicode string>", line 3, column 7:
name: Price Unit
desired output:
name field type operator value
Price Unit priceunit number between - '40' - '60'
Update:
I tried the suggestion below
import yaml
df['json_field']=df['json_field'].str.replace('"test"', "")
pd.io.json.json_normalize(yaml.safe_load(df['json_field'][0].lstrip()), 'name','field','type','operator','value').head()
and got the output below:
operator0 typefield
0 P priceunit
1 r priceunit
2 i priceunit
3 c priceunit
4 e priceunit
Upvotes: 0
Views: 268
Reputation: 13407
Seems to me that it's having an issue because your yaml is improperly formatted. That "test" should not be there at all if you want a dictionary named "start" with 5 mappings inside of it.
import yaml
a = """
- start:
name: Price Unit
field: priceunit
type: number
operator: between
value:
- '40'
- '60'
""".lstrip()
# Single dictionary named start, with 5 entries
yaml.safe_load(a)[0]
{'start':
{'name': 'Price Unit',
'field': 'priceunit',
'type': 'number',
'operator': 'between',
'value': ['40', '60']}
}
To do this with your data, try:
data = yaml.load(df['json_field'][0].replace('"test"', ""))
df = pd.json_normalize(yaml.safe_load(a)[0]["start"])
print(df)
name field type operator value
0 Price Unit priceunit number between [40, 60]
Upvotes: 1
Reputation: 30971
It seems that your real problem in one line earlier.
Note that your YAML fragment starts with start:"test"
.
One of key principles of YAML is that a key is separated from its value by a colon and a space, whereas your sample does not contain this (mandatory) space.
Maybe you should manually edit your input, i.e. add this missing space.
Other solution is to write a "specialized pre-parser", which adds such missing spaces and its result is fed into the YAML parser. This can be an option if the number of such cases is big.
Upvotes: 0