Reputation: 27
I have an issue with a table called "movies". I found the date and the movie title are both in the title column. As shown in the picture:
I don't know how to deal with this kind of issues. So, I tried to play with this code to make it similar to MySQL codes but I didn't work anyways.
DataFrame(row.str.split(' ',-1).tolist(),columns = ['title','date'])
How do I split it in two columns (title, date)?
Upvotes: 1
Views: 153
Reputation: 1269773
I would simply do:
select left(title, length(title) - 7) as title,
replace(right(title, 5) ,')', '') as year
Regular expressions seem like overkill for this logic.
In Hive, you need to use substr()
for this:
select substr(title, 1, length(title) - 7) as title,
substr(title, length(title) - 5, 4) as year
Upvotes: 1
Reputation: 27
After struggling and searching I was able to build this command which works perfectly.
select
translate(substr(title,0,length(title) -6) ,'', '') as title,
translate(substr(title, -5) ,')', '') as date
from movies;
Thanks for the people who answered too!
Upvotes: 0
Reputation: 521249
If you are using MySQL 8+, then we can try using REGEXP_REPLACE
:
SELECT
REGEXP_REPLACE(title, '^(.*)\\s\\(.*$', '$1') AS title,
REGEXP_REPLACE(title, '^.*\\s\\((\\d+)\\)$', '$1') AS date
FROM yourTable;
Here is a general regex pattern which can match your title strings:
^.*\s\((\d+)\)$
Explanation:
^ from the start of the string
(.*)\s match and capture anything, up to the last space
\( match a literal opening parenthesis
(\d+) match and capture the year (any number of digits)
\) match a literal closing parenthesis
$ end of string
Upvotes: 3