joelime
joelime

Reputation: 35

Is there a way to split row values into separate columns with Pandas?

I currently have a pandas data frame that reads like this:

0   (dev_id='A', accon_time='B', start_time='C',end_time='D')
1   (dev_id='E', accon_time='F', start_time='G',end_time='H')
2   (dev_id='I', accon_time='J', start_time='K',end_time='L')

The current shape of this data frame is (574,1) when I actually want it to be (574,4) where each of the 4 comma separated values in each row are actually split between 4 separate columns.

Is there any way to do this?

I've tried converting my query into a pandas series first and then using Series.str.split however the results are the same as the original data frame.

ser = pd.Series(qry)
ser.str.rsplit(pat=",", n=4, expand=True)
print(ser)
df = pd.DataFrame(data=ser)
print(df)

This is what i'm using to query my data:

class Trip(Base):
    __tablename__ = 'trip'
    dev_id = Column(String(50), primary_key=True)
    accon_time = Column(Integer)
    start_time = Column(Integer)
    end_time = Column(Integer)

    def __repr__(self):
        return "(dev_id='%s', accon_time='%s', start_time='%s',end_time='%s')" 
          % (self.dev_id, self.accon_time, self.start_time, self.end_time)

qry = session.query(Trip).\
        filter(Trip.accon_time.between(20190620000000, 20190621000000)).\
        filter(Trip.start_time <= 20190620145813).\
        filter(Trip.end_time <= 20190620151400).\
        filter(Trip.end_time >= 20190620145600)

This returns a list like this:

(dev_id='A', accon_time='B', start_time='C',end_time='D'),(dev_id='E', accon_time='F', start_time='G',end_time='H'),(dev_id='I', accon_time='J', start_time='K',end_time='L')

Converting my query results into a pandas data frame

df = pd.DataFrame(data=qry)
print(df)

Upvotes: 0

Views: 208

Answers (1)

kkawabat
kkawabat

Reputation: 1677

In your parsing example ser.str.rsplit(pat=",", n=4, expand=True) returns the output of ser, you need to catch the output or it wont do anything

Try this for the parsing:

qry =   ["(dev_id='A', accon_time='B', start_time='C',end_time='D')",
"(dev_id='E', accon_time='F', start_time='G',end_time='H')",
"(dev_id='I', accon_time='J', start_time='K',end_time='L')"]
ser = pd.Series(qry)
df = ser.apply(lambda x: pd.Series([val.split('=')[1] for val in x[1:-1].split(',')]))
df.columns = ['dev_id', 'accon_time', 'start_time', 'end_time']

for each row of ser .appy() I take the string and remove the paranteses x[1:-1] then split on the commas .split(',') this will give me a list of key value literals (i.e. ["dev_id='A'", " accon_time='B'", " start_time='C'", "end_time='D'"]). Then for each literal I split it on '=' and return the second element which is the actual value .split('=')[1].

if you don't want the "'" in your elements strip it at the end with .strip('\'')

   ser = ser.apply(lambda x:[val.split('=')[1].strip('\'') for val in x[1:-1].split(',')])

output:

  dev_id accon_time start_time end_time
0    'A'        'B'        'C'      'D'
1    'E'        'F'        'G'      'H'
2    'I'        'J'        'K'      'L'

Upvotes: 1

Related Questions