Use datafiles in Python

1. Reading the datafile

By executing the following script, we can read all data in the JSON file.

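# Import libraries
import json
import pandas as pd

# Load a JSON file and store it in a dictionary
# ("datafile.json" is a placeholder; use the path of your datafile)
with open("datafile.json", encoding="utf-8") as f:
    data = json.load(f)

# Create pandas tables (DataFrames) from the dictionary, assuming each table is stored
# as a list of records under its own key ("prop" avoids shadowing the builtin property)
rawdata, prop, paper, figure, sample = (
    pd.DataFrame(data[key]) for key in ["rawdata", "property", "paper", "figure", "sample"])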

Let’s look at the contents of each table:

1.1. rawdata

The rawdata table contains every data point. For each data point (x, y), the original paper, figure, and sample, as well as the physical properties on the x and y axes, are identified by IDs. The values are expressed in the default units defined in the property table.

[Screenshot: contents of the rawdata table]
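Because each rawdata row stores IDs rather than names, making a data point readable means looking those IDs up in the other tables. The sketch below does this for the axes with Series.map. It is built on toy values, and the property-table column names propertyname and unit, as well as the value columns x and y, are assumptions to check against your datafile.

```python
import pandas as pd

# Toy tables; "propertyname", "unit", "x" and "y" are hypothetical column names
prop = pd.DataFrame({
    "propertyid": [1, 2],
    "propertyname": ["Temperature", "Seebeck coefficient"],
    "unit": ["K", "V/K"],
})
rawdata = pd.DataFrame({
    "sampleid": [10, 10],
    "figureid": [5, 5],
    "propertyid_x": [1, 1],
    "propertyid_y": [2, 2],
    "x": [300.0, 400.0],
    "y": [1.5e-4, 1.8e-4],
})

# Lookup Series indexed by propertyid
name_of = prop.set_index("propertyid")["propertyname"]
unit_of = prop.set_index("propertyid")["unit"]

# Translate the axis IDs of each data point into names and units
labeled = rawdata.assign(
    xname=rawdata["propertyid_x"].map(name_of),
    xunit=rawdata["propertyid_x"].map(unit_of),
    yname=rawdata["propertyid_y"].map(name_of),
    yunit=rawdata["propertyid_y"].map(unit_of),
)
print(labeled[["x", "xname", "xunit", "y", "yname", "yunit"]])
```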

1.2. property

The property table lists the physical properties that appear in this datafile, together with their default units (SI units without prefixes).

[Screenshot: contents of the property table]

1.3. paper

The paper table contains bibliographic information for each paper. The paperid is identical to the Starrydata ID (SID) used in the Starrydata web system.

[Screenshot: contents of the paper table]

1.4. figure

The figure table contains information about the original figure from which each dataset was taken. The paperid indicates the original paper, and the figurename gives the name of the figure in that paper.

[Screenshot: contents of the figure table]

Each figure is meant to have only one set of axes (a pair of propertyid_x and propertyid_y). When one figure in a paper contains multiple sets of axes, each set is given its own figureid, and all of them share the same figurename.
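To find figures that were split this way, one option is to count the distinct figureids for each combination of paperid and figurename. A minimal sketch on a made-up figure table; only the column names paperid, figureid and figurename come from the text above.

```python
import pandas as pd

# Toy figure table: Fig4 of paper 100 has two sets of axes, Fig5 has one
figure = pd.DataFrame({
    "figureid": [1, 2, 3],
    "paperid": [100, 100, 100],
    "figurename": ["Fig4", "Fig4", "Fig5"],
})

# Number of distinct figureids sharing each (paperid, figurename)
n_axes = figure.groupby(["paperid", "figurename"])["figureid"].nunique()

# Original figures that were split into several axis sets
multi = n_axes[n_axes > 1]
print(multi)
```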

1.5. sample

The sample table contains information on the material samples introduced in each paper. A sample usually appears in several figures. The samplename shows the name of the sample in the paper. The composition shows the probable chemical composition of the sample, as inferred by the data collector from the text. When both the starting composition and the analytical composition are available, the starting composition is used. Compositions are given as molar ratios.

[Screenshot: contents of the sample table]

2. Creating custom tables

Using pandas, you can build your own tables from the tables above, much as you would in a relational database.

2.1. Extract columns

You may feel that our paper table is too big to handle. In such a case, you can select only the columns you need when creating the DataFrame. The same technique also lets you reorder the columns.

[Screenshot: creating the paper table with selected columns]
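A minimal sketch of column selection, using toy records in place of the real paper table. Every column name except paperid is hypothetical. When the data is a list of dicts, the columns argument of the DataFrame constructor picks and orders the columns; indexing an existing DataFrame with a list of names gives the same result.

```python
import pandas as pd

# Toy records standing in for the paper table; columns other than paperid are made up
paper_records = [
    {"paperid": 101, "title": "Paper A", "journal": "J. Example", "year": 2017},
    {"paperid": 102, "title": "Paper B", "journal": "J. Example", "year": 2018},
]

# Select (and reorder) columns while creating the DataFrame
paper_small = pd.DataFrame(paper_records, columns=["paperid", "year", "title"])

# Equivalent selection on an already-created DataFrame
paper_full = pd.DataFrame(paper_records)
paper_small2 = paper_full[["paperid", "year", "title"]]

print(paper_small)
```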

2.2. Merge tables

You can bring information from one table into another by merging them. For example, you can add the paper information to the sample table by merging the two tables with paperid as the key. For each row in the sample table, pandas looks up the row of the paper table with the same paperid and appends its columns.

[Screenshot: merging the sample and paper tables on paperid]
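A minimal sketch of the merge with pd.merge, on toy tables whose column names (apart from paperid and sampleid) are hypothetical. The how="left" choice is ours: it keeps every sample row even if its paperid has no match in the paper table, whereas the default inner join would drop such rows.

```python
import pandas as pd

# Toy tables; only paperid and sampleid come from the text
paper = pd.DataFrame({
    "paperid": [101, 102],
    "title": ["Paper A", "Paper B"],
    "year": [2017, 2018],
})
sample = pd.DataFrame({
    "sampleid": [1, 2, 3],
    "paperid": [101, 101, 102],
    "samplename": ["S1", "S2", "x=0.1"],
})

# For each sample row, attach the columns of the paper with the same paperid
sample_paper = pd.merge(sample, paper, on="paperid", how="left")
print(sample_paper)
```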

If you don’t want every column in the merged table, you can define a new DataFrame from the merged table and pick only the necessary columns, as follows.

[Screenshot: picking columns from the merged table]

3. Extracting datafiles from the rawdata table

The rawdata table collects the datasets from all papers in the list. Most chart-plotting software, however, expects a separate text file for each dataset.

3.1. Extract a datafile

You can select the rows of rawdata that have a specific figureid and sampleid, as follows.

[Screenshot: selecting rawdata by figureid and sampleid]
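A minimal sketch of this selection with a boolean mask, on a toy rawdata table. The value column names x and y are an assumption; figureid and sampleid come from the text. Both conditions are combined with &, and each comparison needs its own parentheses.

```python
import pandas as pd

# Toy rawdata; "x" and "y" as value column names are an assumption
rawdata = pd.DataFrame({
    "figureid": [5, 5, 5, 6],
    "sampleid": [10, 10, 11, 10],
    "x": [300.0, 400.0, 300.0, 300.0],
    "y": [1.5e-4, 1.8e-4, 1.2e-4, 2.0e3],
})

figureid, sampleid = 5, 10  # the dataset we want

# Keep rows whose figureid AND sampleid both match, then keep only x and y
mask = (rawdata["figureid"] == figureid) & (rawdata["sampleid"] == sampleid)
dataset = rawdata.loc[mask, ["x", "y"]]
print(dataset)
```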

To save these x and y values in a text file, write them out as follows.

[Screenshot: writing x and y to a text file]

This produces a text file like the one below.

[Screenshot: the resulting text file]
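One way to write such a file is DataFrame.to_csv with a tab separator and without the index. A minimal sketch on a toy dataset; the filename dataset.txt and the choice of tabs plus a header line are assumptions, not necessarily the format shown above.

```python
import pandas as pd

# Toy (x, y) dataset; in practice this comes from filtering rawdata
dataset = pd.DataFrame({"x": [300.0, 400.0], "y": [1.5e-4, 1.8e-4]})

# Write tab-separated x and y columns with a header line and no index column
dataset.to_csv("dataset.txt", sep="\t", index=False)

# Show what was written
with open("dataset.txt") as f:
    print(f.read())
```

Passing header=False instead would drop the "x y" line if your plotting software expects bare numbers.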

3.2. Extract all datafiles

Here is an example script that outputs a CSV file for every dataset in the datafile on the temperature dependence of the Seebeck coefficient. Each output file is named from the paperid, figurename, and sample composition (e.g. 18560__Fig4__Ba8Al16Ga2Si26P2__S-T.csv) and stored in the directory data/clathrate.

[Screenshot: script that exports all Seebeck coefficient vs. temperature datasets]
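The same steps can be sketched as one function: filter rawdata by the pair of axis properties, attach figurename and composition by merging, then write one CSV per (figureid, sampleid) group. This is not the script in the screenshot. The function name export_datasets, the column names x, y and composition, and the property IDs you pass in are assumptions; look up the actual IDs for temperature and Seebeck coefficient in your property table.

```python
import os

import pandas as pd


def export_datasets(rawdata, figure, sample, propid_x, propid_y, outdir, suffix):
    """Hypothetical helper: write one CSV per (figureid, sampleid) dataset whose
    axes are propid_x / propid_y, and return the list of written paths."""
    os.makedirs(outdir, exist_ok=True)
    # Keep only data points on the requested pair of axes
    sel = rawdata[(rawdata["propertyid_x"] == propid_x)
                  & (rawdata["propertyid_y"] == propid_y)]
    # Attach the figure name and the sample composition used in the file name
    sel = sel.merge(figure[["figureid", "figurename"]], on="figureid")
    sel = sel.merge(sample[["sampleid", "composition"]], on="sampleid")
    paths = []
    keys = ["paperid", "figurename", "composition", "figureid", "sampleid"]
    for (paperid, figname, comp, _figid, _sampid), grp in sel.groupby(keys):
        # e.g. 18560__Fig4__Ba8Al16Ga2Si26P2__S-T.csv
        path = os.path.join(outdir, f"{paperid}__{figname}__{comp}__{suffix}.csv")
        grp[["x", "y"]].to_csv(path, index=False)
        paths.append(path)
    return paths
```

A call could look like export_datasets(rawdata, figure, sample, T_ID, S_ID, "data/clathrate", "S-T"), where T_ID and S_ID stand for the real property IDs. Note that two samples with the same composition in the same figure would map to the same file name and overwrite each other, and groupby skips rows whose composition is missing; appending the sampleid to the name is one way around the first issue.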


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s