Praveen Kumar-M
Praveen Kumar-M

Reputation: 243

Best practices on importing data posted in Stackoverflow into working environment in Python

Suppose a question is asked in Stackoverflow and example data is given, what is the best way to import the example data

In order to explain this question which I am asking, let me take to a real StackOverflow post.

This post has given the following data to solve the query.


    Access Type     host/IP address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10

Now what will be the best way to import the data into Python

The methodology should be in such a way that the import is done into the python environment in a single step.

I am aware that, I can copy-paste the data into a spreadsheet -> then save it as CSV file and then import it into Python. But I need a more elegant way.

Other solution as suggested by one developer,

import pandas as pd ;

df = pd.read_clipboard()

pd.read_clipboard(), is not a universal solution across all operating system. My system (Ubuntu 19.10), I did a clipboard copying and then the result came like this. The column names were wrongly imported.

enter image description here

This post deals with R solution, but I am interested in a solution for Python

Upvotes: 1

Views: 300

Answers (3)

mechanical_meat
mechanical_meat

Reputation: 169284

Edited answer: this is slightly more work, but it's another viable option in my opinion.

from io import StringIO
import pandas as pd

s = StringIO("""    Access Type     host/IP address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10""")

df = pd.read_fwf(s,index_col=0,infer_nrows=10)

Result:

In [175]: df
Out[175]: 
    Access Type  host/IP address  Count
0    Authorized   206.196.21.129     23
1    Authorized     207.30.238.8     46
2    Authorized     208.62.55.75     23
3    Authorized   216.12.111.241     23
4    Authorized    63.197.98.106     23
5    Authorized     67.95.49.172     23
6  Unauthorized  207.243.167.114     23
7  Unauthorized  209.152.168.249     10
8  Unauthorized    65.166.159.14     10
9  Unauthorized    68.143.156.89     10

Upvotes: 1

Hossein
Hossein

Reputation: 25924

As others have pointed out, The best solution seems to be pd.read_clipboard() In order not to get messed up columns, just replace the spaces in the column names with _ and then to get the actual columns simply do:

df.columns = [name.replace('_',' ') for name in df.columns.values]

So copy the input when you replaced spaces with _ : input:

    Access_Type     host/IP_address Count
0   Authorized      206.196.21.129  23
1   Authorized      207.30.238.8    46
2   Authorized      208.62.55.75    23
3   Authorized      216.12.111.241  23
4   Authorized      63.197.98.106   23
5   Authorized      67.95.49.172    23
6   Unauthorized    207.243.167.114 23
7   Unauthorized    209.152.168.249 10
8   Unauthorized    65.166.159.14   10
9   Unauthorized    68.143.156.89   10

run the code:

import pandas as pd 
df = pd.read_clipboard()
df.columns = [name.replace('_',' ') for name in df.columns.values]

and you'll get:

Output:

    Access Type  host/IP address  Count
0    Authorized   206.196.21.129     23
1    Authorized     207.30.238.8     46
2    Authorized     208.62.55.75     23
3    Authorized   216.12.111.241     23
4    Authorized    63.197.98.106     23
5    Authorized     67.95.49.172     23
6  Unauthorized  207.243.167.114     23
7  Unauthorized  209.152.168.249     10
8  Unauthorized    65.166.159.14     10
9  Unauthorized    68.143.156.89     10

Upvotes: 1

Yanirmr
Yanirmr

Reputation: 1032

Try to use pandas function read_clipboard()

This method is as simple as you want: it reads copy-pasted tabular data and parses it into a Data Frame.

By the documentation this method read text from clipboard and pass to read_csv.

You can find more information in this blog post

EDIT: I think that your problem didn't cause by your OS, it could be because of the white spaces in the first line of the data.

Upvotes: 0

Related Questions