user122244

Reputation: 119

Extract numbers by position in Pandas?

I have a df:

                  col1
0       01139290201001
1       01139290101001
2       01139290201002
3       01139290101002
4       01139290201003
5       01139290101003
6       01139290201004
7       01139310101001
8       01139290201005
9       01139290301001
            ...      
5908      01139ÅÊ21020
5909      01139ÅÊ21013
5910      01139ÅÊ11008
5911      01139ÅÊ21011
5912      01139ÅÊ03003

I need to extract into a new column the first 7 digits when the value contains only digits, and the first 5 digits plus the 8th and 9th characters when the value also contains non-digit characters.

I tried the code below on a made-up dataframe and it worked, but on the actual dataset it did not behave as expected, mainly because my actual df holds integers and the code did calculations on them.

df['col2'] = df['col1'][0:5] + df['col1'][8]


0       0113929020100101139290201005
1       0113929010100101139290201005
2       0113929020100201139290201005
3       0113929010100201139290201005
4       0113929020100301139290201005
5                                NaN
6                                NaN
7                                NaN
8                                NaN
9                                NaN

Also, why does it produce NaN values?

I want it to look like this:

 01139290201001 becomes 0113929 for integer-only rows, and for the others
 01139ÅÊ03003 becomes 0113903

Upvotes: 1

Views: 226

Answers (1)

Rakesh

Reputation: 82785

You can use .apply with a conditional slice on each string:

import pandas as pd
df = pd.DataFrame({"col1": ["01139290201001", "01139290101001", "01139290201002", "01139ÅÊ21020", "01139ÅÊ21013", "01139ÅÊ11008"]})
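# All-digit values keep the first 7 characters; otherwise take the first 5 plus characters 8 and 9 (indices 7 and 8)
df["col2"] = df["col1"].apply(lambda x: x[:7] if x.isdigit() else x[:5] + x[7:9])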
print(df)

Output:

             col1     col2
0  01139290201001  0113929
1  01139290101001  0113929
2  01139290201002  0113929
3    01139ÅÊ21020  0113921
4    01139ÅÊ21013  0113921
5    01139ÅÊ11008  0113911
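
As for the NaN values: `df['col1'][0:5]` slices rows, not characters, so the result only has index labels 0 to 4, and assigning it back aligns on the index and leaves every other row as NaN. Below is a minimal sketch of that behaviour together with a vectorized alternative using the `.str` accessor. It assumes the values are strings (or can be cast with `astype(str)`) and that the two non-digit characters sit at positions 5 and 6, as in the sample; the column name `attempt` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["01139290201001", "01139290101001",
                            "01139ÅÊ21020", "01139ÅÊ03003"]})

# [0:2] selects the first two ROWS, and [3] is the single value in row 3,
# so this concatenates whole strings and yields a Series with labels 0 and 1.
partial = df["col1"][0:2] + df["col1"][3]

# Assignment aligns on the index: rows 2 and 3 have no match and become NaN.
df["attempt"] = partial
print(df["attempt"].isna().tolist())

# Character slicing goes through .str; astype(str) guards against
# non-string values (an assumption about the real data).
s = df["col1"].astype(str)
is_digits = s.str.isdigit()

# Keep the first 7 characters where the value is all digits,
# otherwise use the first 5 plus the characters at indices 7 and 8.
df["col2"] = s.str[:7].where(is_digits, s.str[:5] + s.str[7:9])
print(df[["col1", "col2"]])
```

If the column was read from a file as integers, the leading zero is already lost by the time `astype(str)` runs, so it may be safer to read it as text in the first place (for example with `dtype=str` in `pd.read_csv`).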

Upvotes: 3
