Walter U.

Reputation: 341

pandas - string to list to dictionary

I'm trying to convert the strings in a pandas DataFrame column into lists so that I can later structure them as dictionaries. Please see below:

My data comes in as a string in a pandas df, e.g.

df['users'].iloc[0] = "str1|str2, str3|str4"

... and so on for the rest of the series.

From here I split the strings like this:

df['users'] = df['users'].map(lambda x: re.split("[',|']", x)), which returns a flat list: ['str1', 'str2', ' str3', 'str4'] (the pattern does not consume the space after the comma, so it stays attached to ' str3').

So far so good. The challenge I haven't been able to resolve is taking that list and structuring it as a list of dictionaries, so that I would produce the following:

[{field1: str1, field2: str2, field3: , field4: }, {field1: str3, field2: str4, field3: , field4: }]

where the empty fields could be filled out later (optional).

Is there a better way to structure the data in order to make this goal easier? For example, having the list as [[str1, str2], [str3, str4]]?

How would I go about 'zipping' the values from these lists with the names of the fields (field1, field2, ...)?

In essence, the final output should contain the list of dictionaries above in each cell of the df where the original string used to reside.

Can anyone offer insights? Thanks.

Upvotes: 1

Views: 211

Answers (2)

piRSquared

Reputation: 294488

List of lists

df.users.map(
    lambda s: [x.split('|') for x in s.split(', ')]
)

0    [[str1, str2], [str3, str4]]
Name: users, dtype: object

Dictionaries using f-strings

df.users.map(
    lambda s: [
        {f'field{i}': v for i, v in enumerate(x.split('|'), 1)}
        for x in s.split(', ')
    ]
)

0    [{'field1': 'str1', 'field2': 'str2'}, {'field...
Name: users, dtype: object

With predetermined fields

fields = 'field1 field2 field3 field4'.split()

df.users.map(
    lambda s: [dict(zip(fields, x.split('|'))) for x in s.split(', ')]
)

0    [{'field1': 'str1', 'field2': 'str2'}, {'field...
Name: users, dtype: object

If you want all fields (missing ones filled with None)

from itertools import zip_longest

fields = 'field1 field2 field3 field4'.split()

df.users.map(
    lambda s: [dict(zip_longest(fields, x.split('|'))) for x in s.split(', ')]
)
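
To make the last variant concrete, here is a minimal end-to-end sketch that builds a one-row frame from the sample string in the question and writes the result back into the same column. The sample DataFrame is an assumption for illustration; the mapping itself is the same expression as above.

```python
from itertools import zip_longest
from pprint import pprint

import pandas as pd

# Assumed sample frame, built from the string shown in the question.
df = pd.DataFrame({'users': ["str1|str2, str3|str4"]})

fields = 'field1 field2 field3 field4'.split()

# Replace each string with a list of dicts; fields without a value get None.
df['users'] = df['users'].map(
    lambda s: [dict(zip_longest(fields, x.split('|'))) for x in s.split(', ')]
)

pprint(df['users'].iloc[0])
# [{'field1': 'str1', 'field2': 'str2', 'field3': None, 'field4': None},
#  {'field1': 'str3', 'field2': 'str4', 'field3': None, 'field4': None}]
```

If a placeholder other than None is preferred, zip_longest accepts a fillvalue argument, e.g. zip_longest(fields, x.split('|'), fillvalue='').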

Upvotes: 2

user9645477

Reputation:

Something like this might help (assuming that there are never more than four fields per record):

import itertools
import pprint

FIELDS = [
    'field1',
    'field2',
    'field3',
    'field4',
]

test_str = "str1|str2, str3|str4"
# Split on ', ' so the second record does not start with a space.
items = test_str.split(', ')
results = [
    # Pads non-existent fields with `None`.
    dict(itertools.zip_longest(FIELDS, item.split('|')))
    for item in items
]

pprint.pprint(results)
# Prints:
# [{'field1': 'str1', 'field2': 'str2', 'field3': None, 'field4': None},
#  {'field1': 'str3', 'field2': 'str4', 'field3': None, 'field4': None}]
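
If the input is not perfectly regular, the same idea can be wrapped in a small helper that tolerates stray whitespace around the separators and fails loudly when a record has more values than fields. This is only a sketch: parse_users is a hypothetical name, and raising an error on extra values is an assumed design choice, not something the question asks for.

```python
from itertools import zip_longest

FIELDS = ['field1', 'field2', 'field3', 'field4']


def parse_users(raw, fields=FIELDS):
    """Hypothetical helper: turn 'a|b, c|d' into a list of dicts keyed by `fields`."""
    records = []
    for item in raw.split(','):
        # Strip whitespace so values like ' str3' come out as 'str3'.
        values = [v.strip() for v in item.split('|')]
        if len(values) > len(fields):
            # zip_longest would otherwise pair the extra values with a None key.
            raise ValueError(f'too many values in {item!r}')
        # Missing trailing fields are padded with None.
        records.append(dict(zip_longest(fields, values)))
    return records


print(parse_users("str1|str2, str3|str4"))
```

On a DataFrame this could then be applied with df['users'].map(parse_users).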

Upvotes: 2
