Prince Hermit
Prince Hermit

Reputation: 37

Getting list of address by flattening row in Dataframe pandas with a delimiter

I have created a dataframe from an excel file. The table is a contact list with one row contain name of location, next row street address, next city and repeat for each location. Now each location is seperate by a specific phrase 'Site code'. Below is an example of what I see.

Name and address Don't Care
Location Name x
Street addresss x
City x
State x
Site code x
Location Name x
Street addresss x
City x
State x
Site code x

I want to be able to flatten everything inbetween the site code into one row so it would be

[location name, street address, city, state]

I was thinking of making a function that would look at the table and create a dictionary that key would be the location name and it appends everything until it reaches to site code then skips and moves on the next dictionary entry. But it also feels like I am overthinking this. What do you think?

Upvotes: 1

Views: 75

Answers (2)

mozway
mozway

Reputation: 261850

If you always have the same number of fields, you can use numpy to reshape the data:

pd.DataFrame(df['Name and address'].values.reshape((-1,5))[:, :4]).apply(list, axis=1)

How this works:

(pd.DataFrame(df['Name and address']  # get relevant column
                .values               # access the underlying numpy array
                .reshape((-1,5))      # reshape: new row every 5 elements
                [:, :4]               # drop the 5th column ("Site code")
             ).apply(list, axis=1)    # make as list again
)

output:

0    [Location Name , Street addresss , City , State ]
1    [Location Name , Street addresss , City , State ]
dtype: object

Upvotes: 0

not_speshal
not_speshal

Reputation: 23156

You could use numpy.array_split to split the DataFrame on rows that contain "Site code", and then parse the resultant chunks as needed:

import numpy as np
chunks = np.array_split(df["Name and address"], df[df["Name and address"]=="Site code"].index)
output = [chunk[chunk!="Site code"].tolist() for chunk in chunks if chunk.tolist()!=["Site code"]]

>>> output
[['Location Name', 'Street addresss', 'City', 'State'],
 ['Location Name', 'Street addresss', 'City', 'State']]

Upvotes: 2

Related Questions