Mohammad Alamri
Mohammad Alamri

Reputation: 27

How to split a column in two columns

I have an issue with a table called "movies". I found the date and the movie title are both in the title column. As shown in the picture:

Sample

I don't know how to deal with this kind of issues. So, I tried to play with this code to make it similar to MySQL codes but I didn't work anyways.

DataFrame(row.str.split(' ',-1).tolist(),columns = ['title','date'])

How do I split it in two columns (title, date)?

Upvotes: 1

Views: 153

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

I would simply do:

select left(title, length(title) - 7) as title,
       replace(right(title, 5) ,')', '') as year

Regular expressions seem like overkill for this logic.

In Hive, you need to use substr() for this:

select substr(title, 1, length(title) - 7) as title,
       substr(title, length(title) - 5, 4) as year

Upvotes: 1

Mohammad Alamri
Mohammad Alamri

Reputation: 27

After struggling and searching I was able to build this command which works perfectly.

select
   translate(substr(title,0,length(title) -6) ,'', '') as title,
   translate(substr(title, -5) ,')', '') as date
from movies;

Thanks for the people who answered too!

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521249

If you are using MySQL 8+, then we can try using REGEXP_REPLACE:

SELECT
    REGEXP_REPLACE(title, '^(.*)\\s\\(.*$', '$1') AS title,
    REGEXP_REPLACE(title, '^.*\\s\\((\\d+)\\)$', '$1') AS date
FROM yourTable;

Demo

Here is a general regex pattern which can match your title strings:

^.*\s\((\d+)\)$

Explanation:

^            from the start of the string
    (.*)\s   match and capture anything, up to the last space
    \(       match a literal opening parenthesis
    (\d+)    match and capture the year (any number of digits)
    \)       match a literal closing parenthesis
$            end of string

Upvotes: 3

Related Questions