Starrydata (Starrydata2) is an open web system that helps materials researchers collect and share digital data extracted from plot images in published papers. It includes not only the best-performing data but also data from many reference samples with poorer properties.

Every datum is a star.

Not only the bright stars but also the dwarf stars are important. We want a database that holds as many data as a starry sky. A group of data, like the Milky Way, tells us many things about the universe.

Data from plot images

Most past scientific data are not accessible because they are buried in plot images. We recover these data by plot digitization. We hope that, in the future, all data in published papers will become accessible in digital form.

Manual data collection

Collecting data correctly requires the knowledge of specialists who understand why authors presented their results the way they did. Therefore, we let researchers collect the data themselves. By sharing each researcher’s effort, we can build a large database.

A reference manager with data

To make Starrydata a part of researchers’ daily lives, we designed the system like a reference manager, in which users can make their own lists of favorite papers. Users can download all the digitized data associated with the papers in a list. The amount of data grows when users find that data of interest are missing: they can fill in the data themselves and share them with other researchers.

Interactive data viewer

Starrydata helps researchers visualize the data. It can draw line plots, heat maps, and single and multiple scatter plots. Hovering the mouse over a data point shows information about that point.

Open (free) database

Starrydata2 is free. Both academic and commercial users can use our data free of charge. This is because the electronic journals do not allow commercial use of their papers, which simply means that we do not earn money by selling data obtained from papers. Instead of payment, we ask users to cite our paper on Starrydata2, which has been submitted and is awaiting acceptance.

We are financially supported by various research projects that use large-scale experimental data. Their research budgets can be used to employ research assistants, who intensively collect data for the research project.

Access Starrydata2


Research themes

We started Starrydata as a database of experimental thermoelectric properties, so many of the initial datasets are on thermoelectric materials. However, we would like Starrydata to be used in many more research fields. Although we designed it to be useful for materials scientists (especially those working on inorganic materials), it should be applicable to any research field in which plot data in papers play an important role.


Different research fields draw on different sets of papers, so we separated the entrance to the database by research field. Each database has its own list of papers. Users can create separate paper lists for each research field. (Paper lists are private and are not shared with other users.) If a user has one or more paper lists in a research field, that database is highlighted on the top page. If a paper belongs to multiple databases, the data associated with the paper are shared across those databases. The ‘General’ database contains all papers across all research fields. If you don’t have a specific research field, this may be the best database in which to make your own paper list.

We welcome researchers from other fields who want to start collecting data with our system. Materials scientists (magnetic materials, strongly correlated electron systems, superconductors, ferroelectrics, catalysts, steels, light-emitting materials, solar cells, polymers, etc.) are welcome, as are other scientists (physics, chemistry, geology, biology, etc.). We are ready to start a database for a new research field once you reach us through the contact page.

List of databases (research fields)

1. ThermoelectricMaterials

Thermoelectric materials convert between temperature differences and electricity. They can be used for cooling by passing an electric current through them, and for small-scale power generation from a temperature difference.
Most thermoelectric materials are inorganic compounds, although some organic thermoelectric materials also exist. They are characterized as heavily doped semiconductors with low thermal conductivity.
Thermoelectric properties depend strongly on the sample. They change drastically with carrier density, and good thermoelectric properties often emerge in dirty samples rather than in clean ones. These factors make it very difficult to predict a good parent compound.



Project leader / System design:

Yukari Katsura

The University of Tokyo / National Institute for Materials Science, Japan

Main programmer / System design:

Masaya Kumagai

RIKEN / Sakura Internet, Inc.



Use datafiles in Python

1. Reading datafile

By executing the following script, we can read all data in the JSON file.

    # Import libraries
    import json
    import pandas as pd

    # Load a JSON file and store it in a dictionary
    with open('JSON_RDB_test.json', 'r') as f:
        dict_all = json.load(f)

    # Create pandas tables (DataFrames) from the dictionary
    df_rawdata = pd.DataFrame(dict_all["rawdata"])
    df_paper = pd.DataFrame(dict_all["paper"])
    df_figure = pd.DataFrame(dict_all["figure"])
    df_sample = pd.DataFrame(dict_all["sample"])
    df_property = pd.DataFrame(dict_all["property"])

Let’s see the contents …
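As a rough sketch of a first check after loading, the snippet below summarizes the size and columns of each table. It runs on a small in-memory stand-in rather than the real file. The five top-level keys come from the script above, while every field name and value inside them, and the helper `load_tables`, are hypothetical and only serve the illustration.

```python
import json
import pandas as pd

# Hypothetical stand-in for the contents of a Starrydata JSON file.
# The five top-level keys come from the script above; every field name
# and value inside them is invented for illustration.
sample_json = json.dumps({
    "paper": [{"paper_id": 1, "title": "Example paper"}],
    "figure": [{"figure_id": 10, "paper_id": 1}],
    "sample": [{"sample_id": 100, "paper_id": 1, "composition": "Bi2Te3"}],
    "property": [{"property_id": 2, "name": "Seebeck coefficient"}],
    "rawdata": [
        {"sample_id": 100, "figure_id": 10, "property_id": 2, "x": 300.0, "y": -200.0},
        {"sample_id": 100, "figure_id": 10, "property_id": 2, "x": 400.0, "y": -220.0},
    ],
})


def load_tables(text):
    """Parse JSON text and return one DataFrame per top-level key (hypothetical helper)."""
    data = json.loads(text)
    return {name: pd.DataFrame(rows) for name, rows in data.items()}


tables = load_tables(sample_json)

# Summarize each table: number of rows and column names.
summary = {name: (len(df), list(df.columns)) for name, df in tables.items()}
for name, (n_rows, columns) in summary.items():
    print(f"{name}: {n_rows} rows, columns = {columns}")
```

With a real file, the same summary can be produced by passing the file's text to `load_tables`; the actual column names will differ from the invented ones here.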

Starrydata API

We have implemented APIs for the Starrydata web system. You can get all data associated with a specific paper, figure, or sample through the following URIs.

JSON data for a specific paperid (= ‘sid’), figureid, or sampleid
A specific element of an entry
A list of paperids, figureids, or sampleids that contain specific atoms (e.g. Te) in the sample compositions (for …