M Arroyo
M Arroyo

Reputation: 445

Splitting lists into columns

I have a pandas df:

  name    time
1  a      1 year 2 months
2  b      4 years 1 month
3  c      3 years 1 month

I want to end with:

  name    years   months
1  a      1       2
2  b      4       1
3  c      3       1

I can get as far as:

  name    time
1  a      [1, 2]
2  b      [4, 1]
3  c      [3, 1]

but I can't figure out how to split the lists into columns.

Upvotes: 2

Views: 1116

Answers (2)

Alexander
Alexander

Reputation: 109526

df = pd.DataFrame({'name': ['a', 'b', 'c'], 
                   'time': ['1 year 2 months', '4 years 1 month', '3 years 1 month']})

# Split the time column and take the first and third elements to extract the values.
df[['years', 'months']] = df.time.str.split(expand=True).iloc[:, [0, 2]].astype(int)

>>> df
   name             time  years months
0     a  1 year 2 months      1      2
1     b  4 years 1 month      4      1
2     c  3 years 1 month      3      1

You can use del df['time'] when you're ready to drop that column.

Upvotes: 4

Anton Protopopov
Anton Protopopov

Reputation: 31662

You could use str.findall to find digits in your time columns and then with str.join and str.split you could get your result:

In [240]: df.time.str.findall('\d').str.join('_').str.split('_', expand=True)
Out[240]:
   0  1
0  1  2
1  4  1
2  3  1

df[['years', 'months']] = df.time.str.findall('\d').str.join('_').str.split('_', expand=True)

In [245]: df
Out[245]:
  name             time years months
0    a  1 year 2 months     1      2
1    b  4 years 1 month     4      1
2    c  3 years 1 month     3      1

It's a bit faster then @Alexander's solution, and I think more general. From timing:

In [6]: %timeit df.time.str.split(expand=True).iloc[:, [0, 2]]
1000 loops, best of 3: 1.6 ms per loop

In [8]: %timeit df.time.str.findall('\d').str.join('_').str.split('_', expand=True)
1000 loops, best of 3: 1.43 ms per loop

Upvotes: 2

Related Questions