AdrianC

Reputation: 393

Pandas Split Function in Reverse

I have a Pandas Dataframe with a column that looks like this:

    Car_Make
0   2017 Abarth 124 Spider ManualConvertible
1   2017 Abarth 124 Spider AutoConvertible
2   2017 Abarth 124 Spider ManualConvertible
3   2017 Abarth 124 Spider AutoConvertible
4   2017 Abarth 595 ManualHatch
5   2017 Abarth 595 AutoHatch

Three Questions:

1 How to save split data in panda in reverse order? - This solves my problem, but I don't know how or why it works. Can someone please explain it to me? I hate copy-pasting without understanding why it works.

df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1])) 

2 I have tried to replicate it using a user-defined function (that I can use again) with the following, but it doesn't appear to work. Can anyone help me understand why, and show the correct way to turn the lambda function into a user-defined function?

def f(x):
    df[x] = pd.Series(x.split()[::-1])
    return df

3 Is there a better way to split this column by space in reverse?

I have tried using regex, which works, but not on all rows: as you can see, rows 4 and 5 are slightly different from the rows above.

Any help would be greatly appreciated.

Thanks, Adrian

Upvotes: 2

Views: 5764

Answers (3)

r.ook

Reputation: 13888

The code you're asking here:

df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1]))

There are several things going on here:

1.) First, lambdas are basically impromptu functions. In this case, it's an unnamed function that takes the argument x and returns pd.Series(x.split()[::-1]). More on x later.

2.) pd.Series(...) as you know creates a pandas Series object much like your original data.

3.) x.split() splits the string x into a list of substrings, using whitespace as the separator by default.

4.) The [::-1] bit is a slice. Much like range(), it takes 3 params, [start:end:step]. In this case, it's saying to take the list from start to end, but use -1 as the step, i.e. in reverse. Note that, unlike range() where the end is mandatory, all three slice params are optional, which is why [::-1] can leave start and end empty.

5.) The main function here is apply() on your df['Car_Make'] series, which is essentially a list of strings. apply() takes a function (much like map()) and applies it to every element of the df['Car_Make'] series. In this case, it's applying the lambda, which receives each string in your series as the argument x.

6.) Putting everything back together. The statement is:

  • passing each string in df['Car_Make'] as x to the lambda
  • the lambda then calls x.split() to split the string into a list.
  • The list is then reversed (not sorted) by the slice [::-1].
  • pd.Series() now converts the list into a Series object.
  • The Series object is then returned by the lambda to your apply() function.
  • The apply() function then combines the Series returned for each row into a DataFrame, which conveniently holds the reversed tokens you wanted, one per column.
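
The split and slice steps can be tried on their own in plain Python, without pandas. This is only a minimal sketch: the sample string is row 4 from the question, and the variable names are illustrative.

```python
# One value from the Car_Make column (row 4 in the question).
row = "2017 Abarth 595 ManualHatch"

# split() with no arguments splits on whitespace and returns a list.
tokens = row.split()
print(tokens)           # ['2017', 'Abarth', '595', 'ManualHatch']

# [::-1] leaves start and end empty and walks the list with step -1.
reversed_tokens = tokens[::-1]
print(reversed_tokens)  # ['ManualHatch', '595', 'Abarth', '2017']

# Slicing builds a new list; the original order is untouched.
print(tokens[0])        # 2017
```

Note that the slice reverses the order of the tokens; it does not sort them.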

If all you care about is the very last split though, you really don't need to do the reverse split and all that. You could easily have done the following and it would have returned the very last item in the split right away:

df['Car_Make'].apply(lambda x: pd.Series({'Car_Make': x.split()[-1]}))

            Car_Make
0  ManualConvertible
1    AutoConvertible
2  ManualConvertible
3    AutoConvertible
4        ManualHatch
5          AutoHatch
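
As a possible alternative, the last token can also be taken with the vectorized .str accessor instead of apply(). This is a hedged sketch, not part of the original answer: the small DataFrame below is rebuilt from the question's sample rows, and last_token is a name chosen here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 124 Spider AutoConvertible",
        "2017 Abarth 595 ManualHatch",
        "2017 Abarth 595 AutoHatch",
    ]
})

# .str.split() turns each string into a list of tokens;
# .str[-1] then picks the last element of each list.
last_token = df["Car_Make"].str.split().str[-1]
print(last_token.tolist())
```

Unlike the apply() version, this returns a Series rather than a one-column DataFrame.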

Thank you for asking this question; I learned a few things about pandas while writing this answer as well.

Upvotes: 1

jack6e

Reputation: 1522

Here's a shot at your three questions:

1) Why does df['Car_Make'].apply(lambda x:pd.Series(x.split()[::-1])) work?

Break it down:

  1. df['Car_Make'] - the column with the data you want to operate on
  2. .apply() - a pandas DataFrame and Series method that will apply a function to either every column, or every row, in a DataFrame, or to every row in a Series.
  3. lambda x: - the function that will be applied by the .apply() method to every row of the Series. x represents the record object, which in your case is the string containing the Car_Make entries.
  4. pd.Series() - this will convert the value inside it into a pandas Series.
  5. x.split() - As mentioned in point 3, x is your string object, and split() is a string method that, when called with no parameters, splits the string on whitespace and returns the pieces as a list.
  6. [::-1] - A handy slice that reverses a list, such as the one returned by x.split(). The syntax for slicing is [start_index:end_index:step]. Using a -1 step walks through the list backwards.

Put that all together, and that code is iterating through every record in df['Car_Make'], splitting them, reversing the order of the split items, and returning the reversed list as a pandas Series object.
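
One detail worth seeing in practice: because the function returns a Series for each row, apply() assembles those Series into a DataFrame, and rows with fewer tokens are padded with NaN. The sketch below assumes a two-row sample taken from the question; result is just an illustrative name.

```python
import pandas as pd

# Two rows from the question with different token counts (5 and 4).
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 595 ManualHatch",
    ]
})

# Each row becomes a Series of reversed tokens; apply() stacks them
# into a DataFrame with one column per token position.
result = df["Car_Make"].apply(lambda x: pd.Series(x.split()[::-1]))

print(result.shape)               # (2, 5)
print(result.iloc[:, 0].tolist()) # ['ManualConvertible', 'ManualHatch']

# The shorter row has no fifth token, so that cell is NaN.
print(pd.isna(result.iloc[1, 4])) # True
```

Column 0 lines up across rows because the reversal puts the last token first, which is what makes this approach handle rows 4 and 5 in the question.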

2) Replicating that with a defined function.

You are really close. The function needs to take a single row/record as its argument and return the result, and it needs to be passed to the .apply() method. In other words, you replace the lambda itself, not the way it is applied.

Using what you have so far:

def f(x):
    return pd.Series(x.split()[::-1])

df['Car_Make'].apply(f)
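
To check that the named function really is a drop-in replacement for the lambda, a small comparison can help. This is a sketch under assumptions: the sample frame is rebuilt from the question, and split_reversed is a hypothetical name for the function called f above.

```python
import pandas as pd

def split_reversed(text):
    """Split one string on whitespace and return the tokens reversed."""
    return pd.Series(text.split()[::-1])

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 595 ManualHatch",
    ]
})

# The function receives one string at a time, exactly like the lambda did.
from_function = df["Car_Make"].apply(split_reversed)
from_lambda = df["Car_Make"].apply(lambda x: pd.Series(x.split()[::-1]))

# DataFrame.equals treats NaN cells in matching positions as equal.
same = from_function.equals(from_lambda)
print(same)
```

The version in the question did not work because df[x] = ... uses the row's string as a column label and assigns into the DataFrame, instead of returning the per-row Series that apply() expects.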

3) Is there a better way?

If you want to split a string and then reverse the order of the items, no, this is a great way. If you only want to split a certain part of a string starting from the right, then rsplit() is a good method.
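
For reference, here is how rsplit() behaves on a single string in plain Python. It is a minimal sketch using row 0 from the question; the second argument is the maximum number of splits, counted from the right.

```python
row = "2017 Abarth 124 Spider ManualConvertible"

# Split at most once, starting from the right-hand side.
head, tail = row.rsplit(" ", 1)
print(head)  # 2017 Abarth 124 Spider
print(tail)  # ManualConvertible

# Allowing two splits peels off the last two tokens instead.
parts = row.rsplit(" ", 2)
print(parts)  # ['2017 Abarth 124', 'Spider', 'ManualConvertible']
```

The tokens stay in their original order; only the direction from which the splits are counted changes.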

Upvotes: 7

James

Reputation: 36721

Is this what you are looking for:

df['Car_Make'].str.rsplit(' ', n=1, expand=True)
# returns:
                        0                  1
0  2017 Abarth 124 Spider  ManualConvertible
1  2017 Abarth 124 Spider    AutoConvertible
2  2017 Abarth 124 Spider  ManualConvertible
3  2017 Abarth 124 Spider    AutoConvertible
4         2017 Abarth 595        ManualHatch
5         2017 Abarth 595          AutoHatch
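
If the two pieces should end up as named columns in the same DataFrame, the result can be assigned directly. This is a sketch only: the sample frame is rebuilt from the question, and the column names Model and Transmission_Body are hypothetical, chosen here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({
    "Car_Make": [
        "2017 Abarth 124 Spider ManualConvertible",
        "2017 Abarth 595 AutoHatch",
    ]
})

# n=1 splits once from the right; expand=True returns a two-column DataFrame,
# which is assigned to two new (hypothetically named) columns.
df[["Model", "Transmission_Body"]] = df["Car_Make"].str.rsplit(" ", n=1, expand=True)

print(df["Model"].tolist())
print(df["Transmission_Body"].tolist())
```

Passing n as a keyword keeps the call working on recent pandas versions, where the arguments after the separator are keyword-only.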

Upvotes: 4
