Reputation: 1391
I have some data that I'm trying to clean up. That involves modifying some columns, combining other cols into new ones, etc. I am wondering if there is a way to do this in a succinct way in pandas or if each operation needs to be a separate line of code. Here is an example:
import pandas as pd
ex_df = pd.DataFrame(data = {"a": [1,2,3,4], "b": ["a-b", "c-d", "e-f", "g-h"]})
Say I want to create a new column called c, which will be the first letter in each row of b; I want to transform b by removing the "-"; and I want to create another column called d, which will be the first letter of b concatenated with the entry in a in that same row. Right now I would have to do something like this:
ex_df["b"] = ex_df["b"].map(lambda x: "".join(x.split(sep="-")))
ex_df["c"] = ex_df["b"].map(lambda x: x[0])
ex_df["d"] = ex_df.apply(func=lambda s: s["c"] + str(s["a"]), axis=1)
ex_df
#    a   b  c   d
# 0  1  ab  a  a1
# 1  2  cd  c  c2
# 2  3  ef  e  e3
# 3  4  gh  g  g4
Coming from an R data.table background (which would combine all these operations into a single statement), I'm wondering how things are done in pandas.
Upvotes: 0
Views: 53
Reputation: 82795
This is one approach.
Demo:
import pandas as pd
ex_df = pd.DataFrame(data = {"a": [1,2,3,4], "b": ["a-b", "c-d", "e-f", "g-h"]})
ex_df["c"] = ex_df["b"].str[0]
ex_df["b"] = ex_df["b"].str.replace("-", "")
ex_df["d"] = ex_df.apply(lambda s: s["c"] + str(s["a"]), axis=1)
print(ex_df)
Output:
   a   b  c   d
0  1  ab  a  a1
1  2  cd  c  c2
2  3  ef  e  e3
3  4  gh  g  g4
You can use the built-in str accessor methods to produce the required output.
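The same result can probably be reached without any row-wise apply, since string columns can be concatenated element-wise once a is cast to str. The following is only a sketch on the example frame from the question; passing regex=False is my own addition so that the hyphen is treated as a literal character.

```python
import pandas as pd

ex_df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["a-b", "c-d", "e-f", "g-h"]})

# First character of each string in b, taken through the .str accessor.
ex_df["c"] = ex_df["b"].str[0]
# Remove the hyphen; regex=False treats "-" as a literal string.
ex_df["b"] = ex_df["b"].str.replace("-", "", regex=False)
# Element-wise string concatenation; a is converted to str first.
ex_df["d"] = ex_df["c"] + ex_df["a"].astype(str)

print(ex_df)
```

On larger frames this vectorized form is usually faster than apply with axis=1, which calls a Python function once per row.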
Upvotes: 0
Reputation: 16434
You can use:
In [12]: ex_df.assign(
    ...:     b=ex_df.b.str.replace('-', ''),
    ...:     c=ex_df.b.str[0],
    ...:     d=ex_df.b.str[0] + ex_df.a.astype(str)
    ...: )
Out[12]:
   a   b  c   d
0  1  ab  a  a1
1  2  cd  c  c2
2  3  ef  e  e3
3  4  gh  g  g4
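If later columns should build on earlier ones within the same statement (closer to the data.table style mentioned in the question), assign also accepts callables. Each callable receives the frame as it stands after the preceding keyword assignments, so d can refer to the new c. This is a sketch on the question's example frame; the name result and regex=False are my own choices.

```python
import pandas as pd

ex_df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["a-b", "c-d", "e-f", "g-h"]})

# Keyword arguments are applied in order, so each lambda sees the
# columns created or replaced by the ones before it.
result = ex_df.assign(
    b=lambda df: df["b"].str.replace("-", "", regex=False),
    c=lambda df: df["b"].str[0],                 # uses the cleaned b
    d=lambda df: df["c"] + df["a"].astype(str),  # uses the new c
)

print(result)
```

assign returns a new DataFrame and leaves ex_df unchanged, so the call can be chained with further methods or assigned back to ex_df.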
Upvotes: 1