lummers
lummers

Reputation: 779

Dataframe: Cell Level: Convert Comma Separated String to List

I have a CSV file that contains information about journeys taken by car.

enter image description here

I want to wrangle this data so that I have a single list for each journey (every row). That list should contain the journey_code as the first item in the list and then all the subsequent MGRS units as separate items. Lastly, I would want to have all those journey lists grouped into a parent list.

If I was doing this manually, it would look like this:

journeyCodeA = ['journeyCodeA', 'mgrs1', 'mgrs2', 'mgrs3']
journeyCodeB = ['journeyCodeB', 'mgrs2', 'mgrs4', 'mgrs7']
combinedList = [journeyCodeA, journeyCodeB]

Here is what I have so far to create a list per row and to combine the desired columns.

comparison_journey_mgrs = pd.read_csv(r"journey-mgrs.csv", delimiter = ',')
comparison_journey_mgrs['mgrs_grids'] = comparison_journey_mgrs['mgrs_grids'].str.replace(" ","")
comparison_journey_list = []

for index, rows in comparison_route_mgrs.iterrows():
        holding_list = [rows.journey_code, rows.mgrs_grids]
        comparison_journey_list.append(holding_list)

The problem with this is that it is treating the mgrs_grids column as a single string.

My list looks like this:

[['7211863-140','18TWL927129,18TWL888113,18TWL888113,...,18TWL903128']]

But I want it to looks like this:

[['7211863-140','18TWL927129', '18TWL888113', '18TWL888113',..., '18TWL903128']]

I am struggling to find a way to iterate over each row of the dataframe, reference the mgrs_grids column and then convert the comma-separated string to a list in-place.

Thanks for any help!


{'driver_code': {0: 7211863, 1: 7211863, 2: 7211863, 3: 7211863},
 'journey_code': {0: '7211863-140',
  1: '7211863-105',
  2: '7211863-50',
  3: '7211863-109'},
 'mgrs_grids': {0: '18TWL927129,18TWL888113,18TWL888113,18TWL887113,18TWL888113,18TWL887113,18TWL887113,18TWL887113,18TWL903128',
  1: '18TWL927129,18TWL939112,18TWL939112,18TWL939113,18TWL939113,18TWL939113,18TWL939113,18TWL939113,18TWL939113,18TWL960111,18TWL960112',
  2: '18TWL927129,18TWL889085,18TWL889085,18TWL888085,18TWL888085,18TWL888085,18TWL888085,18TWL888085,18TWL890085',
  3: '18TWL927129,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952106,18TWL952105,18TWL951103'}}

Upvotes: 2

Views: 14167

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62393

# use str split on the column
df.mgrs_grids = df.mgrs_grids.str.split(',')

# display(df)
   driver_code journey_code                                                                                                                                       mgrs_grids
0      7211863  7211863-140                            [18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1      7211863  7211863-105  [18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2      7211863   7211863-50                            [18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3      7211863  7211863-109               [18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

print(type(df.loc[0, 'mgrs_grids']))
[out]:
list

separate row per value

# get a separate row for each value
df = df.explode('mgrs_grids').reset_index(drop=True)

# display(df.hea())
   driver_code journey_code   mgrs_grids
0      7211863  7211863-140  18TWL927129
1      7211863  7211863-140  18TWL888113
2      7211863  7211863-140  18TWL888113
3      7211863  7211863-140  18TWL887113
4      7211863  7211863-140  18TWL888113

Update

  • Here is another option, which combines the 'journey_code' to the front of 'mgrs_grids', and then splits the string into a list.
    • This list is assigned back to 'mgrs_grids', but can also be assigned to a new column.
# add the journey code to mgrs_grids and then split
df.mgrs_grids = (df.journey_code + ',' + df.mgrs_grids).str.split(',')

# display(df.head())
   driver_code journey_code                                                                                                                                                    mgrs_grids
0      7211863  7211863-140                            [7211863-140, 18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1      7211863  7211863-105  [7211863-105, 18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2      7211863   7211863-50                             [7211863-50, 18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3      7211863  7211863-109               [7211863-109, 18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

# output to nested list
df.mgrs_grids.tolist()

[out]:
[['7211863-140', '18TWL927129', '18TWL888113', '18TWL888113', '18TWL887113', '18TWL888113', '18TWL887113', '18TWL887113', '18TWL887113', '18TWL903128'],
 ['7211863-105', '18TWL927129', '18TWL939112', '18TWL939112', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL960111', '18TWL960112'],
 ['7211863-50', '18TWL927129', '18TWL889085', '18TWL889085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL890085'],
 ['7211863-109', '18TWL927129', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952105', '18TWL951103']]

Upvotes: 15

Umar.H
Umar.H

Reputation: 23099

You could also split and explode your dataframe into a tabular format.

df1 = df.join(df['mgrs_grids'].str.split(',',expand=True).stack().reset_index(1),how='outer')\
        .drop(['level_1','mgrs_grids'],1).rename(columns={0 : 'mgrs_grids'})


print(df1)

   driver_code journey_code   mgrs_grids
0      7211863  7211863-140  18TWL927129
0      7211863  7211863-140  18TWL888113
0      7211863  7211863-140  18TWL888113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL888113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL887113
0      7211863  7211863-140  18TWL903128
1      7211863  7211863-105  18TWL927129
1      7211863  7211863-105  18TWL939112
1      7211863  7211863-105  18TWL939112
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL939113
1      7211863  7211863-105  18TWL960111
1      7211863  7211863-105  18TWL960112
2      7211863   7211863-50  18TWL927129
2      7211863   7211863-50  18TWL889085
2      7211863   7211863-50  18TWL889085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL888085
2      7211863   7211863-50  18TWL890085
3      7211863  7211863-109  18TWL927129
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952106
3      7211863  7211863-109  18TWL952105
3      7211863  7211863-109  18TWL951103

Upvotes: 3

Related Questions