Reputation: 341
I'm trying to convert the strings in a pandas DataFrame column into lists so that I can later structure them as dicts. Details below.
My data comes in as strings in a pandas DataFrame column, e.g.
df['users'].iloc[0] = "str1|str2, str3|str4"
and so on for the rest of the Series.
From here I split the strings like this:
df['users'] = df['users'].map(lambda x: re.split("[',|']",x))
which returns a list [str1, str2, str3, str4].
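For reference, a quick check of what that pattern does on the sample string. This is only a sketch: `sample` and `flat` are names made up here, and the value is the example from above. The character class matches a single quote, a comma, or a pipe, so the space after the comma stays attached to the next value and the grouping of the two records is lost.

```python
import re

sample = "str1|str2, str3|str4"

# The character class [',|'] matches a single quote, a comma, or a pipe.
flat = re.split("[',|']", sample)
print(flat)  # ['str1', 'str2', ' str3', 'str4']  (note the leading space)

# Stripping each piece removes the stray whitespace, but the list is still flat.
cleaned = [piece.strip() for piece in flat]
print(cleaned)  # ['str1', 'str2', 'str3', 'str4']
```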
So far so good. The challenge I haven't been able to resolve is taking that list and structuring it as a list of dictionaries, so that I would produce the following:
[{
field1: str1
field2: str2
field3:
field4:
},{
field1: str3
field2: str4
field3:
field4:
}]
where the empty fields could be filled out later (optional).
Is there a better way to structure the data to make this goal easier, e.g. having the list as [[str1, str2], [str3, str4]]?
How would I go about 'zipping' the values from these lists with the names of the fields (field1, field2, ...)?
In essence, the final output should contain the dictionary above in each cell of the df where the original string used to reside.
Can anyone offer insights? Thanks.
Upvotes: 1
Views: 211
Reputation: 294488
df.users.map(
    lambda s: [x.split('|') for x in s.split(', ')]
)
0 [[str1, str2], [str3, str4]]
Name: users, dtype: object
df.users.map(
    lambda s: [
        {f'field{i}': v for i, v in enumerate(x.split('|'), 1)}
        for x in s.split(', ')
    ]
)
0 [{'field1': 'str1', 'field2': 'str2'}, {'field...
Name: users, dtype: object
fields = 'field1 field2 field3 field4'.split()
df.users.map(
    lambda s: [dict(zip(fields, x.split('|'))) for x in s.split(', ')]
)
0 [{'field1': 'str1', 'field2': 'str2'}, {'field...
Name: users, dtype: object
from itertools import zip_longest
fields = 'field1 field2 field3 field4'.split()
df.users.map(
    lambda s: [dict(zip_longest(fields, x.split('|'))) for x in s.split(', ')]
)
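The last variant is shown without output. Here is a minimal sketch of the same idea as a plain function, so it can be checked without a DataFrame. `parse_users` and `FIELDS` are names made up for this illustration, and it assumes records are separated by ', ' and values by '|', as in the question.

```python
from itertools import zip_longest

# Field names taken from the question; adjust to the real schema.
FIELDS = ['field1', 'field2', 'field3', 'field4']


def parse_users(s, fields=FIELDS):
    """Turn 'a|b, c|d' into a list of dicts, padding missing fields with None."""
    return [
        dict(zip_longest(fields, record.split('|')))
        for record in s.split(', ')
    ]


parsed = parse_users("str1|str2, str3|str4")
print(parsed)
# [{'field1': 'str1', 'field2': 'str2', 'field3': None, 'field4': None},
#  {'field1': 'str3', 'field2': 'str4', 'field3': None, 'field4': None}]
```

With pandas, something like df['users'] = df['users'].map(parse_users) should then put that list of dicts in each cell where the original string was.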
Upvotes: 2
Reputation:
Something like this might help (assuming that there are always exactly four fields):
import itertools
import pprint
FIELDS = [
    'field1',
    'field2',
    'field3',
    'field4',
]
test_str = "str1|str2, str3|str4"
# Split on ", " (comma plus space) so the second record has no leading space.
items = test_str.split(', ')
results = [
    # Pads non-existent fields with `None`.
    dict(itertools.zip_longest(FIELDS, item.split('|')))
    for item in items
]
pprint.pprint(results)
# Prints:
# [{'field1': 'str1', 'field2': 'str2', 'field3': None, 'field4': None},
#  {'field1': 'str3', 'field2': 'str4', 'field3': None, 'field4': None}]
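One caveat with `zip_longest`: it pads whichever side is shorter, so if a record ever has more values than there are fields, the surplus value ends up under a `None` key. A hedged sketch of a guard against that follows. `to_record` is a hypothetical helper, not part of the code above, and it also strips stray whitespace around each value.

```python
import itertools

FIELDS = ['field1', 'field2', 'field3', 'field4']


def to_record(item, fields=FIELDS):
    """Build one dict from 'a|b|...', refusing input with too many values."""
    values = [v.strip() for v in item.split('|')]
    if len(values) > len(fields):
        raise ValueError(
            f"expected at most {len(fields)} values, got {len(values)}"
        )
    return dict(itertools.zip_longest(fields, values))


# Without the guard, the fifth value lands under a None key:
unguarded = dict(itertools.zip_longest(FIELDS, 'a|b|c|d|e'.split('|')))
print(unguarded)
# {'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': 'd', None: 'e'}

print(to_record(' str3|str4'))
# {'field1': 'str3', 'field2': 'str4', 'field3': None, 'field4': None}
```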
Upvotes: 2