Reputation: 2953
0 {'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]}
1 {'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}
I'm just learning pandas and somehow parsed some complex data into the form above. How can I create a new pandas DataFrame from the array values under the key needed, ignoring the first (empty string) value and putting the other two values into two new columns named name and value?
Expected output (two columns with a numbered index):
0 {'name': 'PPP', 'value': 8.414448}
1 {'name': 'FFF', 'value': 7.414448}
Upvotes: 0
Views: 384
Reputation: 13387
Assuming your Series has a regular schema, i.e. all rows have the same dict keys and the same structure at the level of nesting you're touching:
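ds1 = ds.str["needed"].str[1:]  # take the "needed" list from each dict, drop its first item
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])  # one column per remaining item
ds3 = pd.Series(ds2.to_dict("records"))  # one dict per row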
For the input in pd.Series format:
import pandas as pd
ds = pd.Series([{'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]},
{'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}])
Now to explain the steps:
ds1 - the way to interact with a list or dict stored in a pandas row is by invoking .str[key], where key can be either a dict key or a list index (or slice).
ds2 - breaks ds1 into columns with predefined names.
ds3 - to_dict("records") converts your data frame into a list, where each row is represented by a single entry of the format {column1_name: column1_value_rowN, column2_name: column2_value_rowN, ...}
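The three steps can be put together and checked end to end. This is a minimal sketch on the two sample rows from the question, assuming every row has a needed list of exactly three items; the variable names follow the ones used above.

```python
import pandas as pd

# Sample input from the question: one dict per row.
ds = pd.Series([
    {'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]},
    {'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]},
])

# Step 1: pull the "needed" list out of each dict, then slice off the first item.
ds1 = ds.str["needed"].str[1:]

# Step 2: spread the remaining two items of each list into named columns.
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])

# Step 3 (only if a Series of dicts is wanted): one dict per row.
ds3 = pd.Series(ds2.to_dict("records"))

print(ds2)
print(ds3.to_list())
```

If the goal is simply a DataFrame with name and value columns, ds2 already has that shape and step 3 can probably be skipped.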
Upvotes: 2