creating a new dataframe from array of values

Question

0    {'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]}
1    {'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}

I'm just learning pandas and ended up with parsed data like the Series above. How can I create a new pandas DataFrame from the list stored under the key needed, ignoring the first (empty string) value and putting the other two values into two new columns named name and value?

Expected output (two columns with a numbered index):

0    {'name': 'PPP', 'value': 8.414448}
1    {'name': 'FFF', 'value': 7.414448}

Georgina Skibinski · Accepted Answer

Assuming your Series has a regular schema, i.e. all rows have the same dict keys and the same level of nesting in the parts you touch:

ds1 = ds.str["needed"].str[1:]
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])
ds3 = pd.Series(ds2.to_dict("records"))

For the input in pd.Series format:

import pandas as pd

ds = pd.Series([{'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]},
{'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]}])

Now to explain steps:

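ds1 - the way to interact with a list or dict stored in each row of a pandas Series is .str[key], where key can be a dict key, a list index, or a slice. Here .str["needed"] picks the list out of each dict and .str[1:] drops its first element.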
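As a small illustration of that accessor, here is a sketch on a toy Series. The names s, inner, first and tail and the key "a" are made up for this example and are not part of the question's data:

```python
import pandas as pd

# Toy data (hypothetical): each row is a dict holding a list.
s = pd.Series([{"a": [10, 20, 30]}, {"a": [40, 50, 60]}])

inner = s.str["a"]     # dict lookup applied to every row
first = inner.str[0]   # list index applied to every row
tail = inner.str[1:]   # list slice applied to every row

print(first.to_list())  # [10, 40]
print(tail.to_list())   # [[20, 30], [50, 60]]
```

Despite its name, .str here is just doing element-wise indexing, which is why it works on dicts and lists as well as strings.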

ds2 - breaks ds1 into columns with predefined names, one column per list element.

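ds3 - to_dict("records") converts the DataFrame into a list in which each row is a single dict of the form {column1_name: column1_value_rowN, column2_name: column2_value_rowN, ...}; wrapping that list in pd.Series gives the expected output. If you only need the two-column DataFrame, you can stop at ds2.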
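Putting the three steps together with the sample input, a runnable sketch looks like this. It assumes a pandas version that accepts the full orient name "records"; the comments are my own annotations:

```python
import pandas as pd

ds = pd.Series([
    {'not_needed': 'not_needed', 'needed': ['', 'PPP', 8.414448]},
    {'not_needed': 'not_needed', 'needed': ['', 'FFF', 7.414448]},
])

# Step 1: take the 'needed' list from each dict and drop its first element.
ds1 = ds.str["needed"].str[1:]

# Step 2: turn the remaining two-element lists into named columns.
ds2 = pd.DataFrame(ds1.to_list(), columns=["name", "value"])

# Step 3 (optional): back to a Series holding one dict per row.
ds3 = pd.Series(ds2.to_dict("records"))

print(ds2)
print(ds3)
```

ds2 is the DataFrame with the name and value columns, and ds3 is the Series-of-dicts form shown in the expected output.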
