How to Sync Files From My Laptop to Remote Server

Context

I received a phone call from a person who said, in a frightened voice: Dermatologist! I wanted to explain kindly that he had probably dialled a wrong number, because I am not a dermatologist and cannot give any medical advice about skin issues, except to suggest visiting one. But the caller said: I am a dermatologist. I have a very big problem. I have to transfer a huge number of files from one computer to another. I heard that you can help me. I have more than fifteen thousand images and PDF files, and I have to transfer them to some server. I have been told that I will receive SSH details. What is that?

We met, and in our conversation I realized that a huge repository of medical images, scientific texts and program files had to be migrated to a new server that scientists would use to share research. The dermatologist has a laptop with GNU/Linux and no previous experience with the procedures and software needed to accomplish his task.

It is doable, I said. But you have to be precise in typing commands and respect the order of the steps. Focus, concentration and accuracy are the keywords, I emphasized. I hope I will be able to do that, he said.

First step – SSH

SSH is a secure communication protocol used to connect to a remote GNU/Linux server. SSH stands for Secure SHell. Many people who are not used to working with servers expect that they will see some complicated graphical interface. There is none: there is a textual interface without buttons, dropdown menus, or radio buttons and other boxes. Is that even more complicated? No. If you are a dermatologist or another professional not familiar with IT, you can ask someone to help you, or simply be happy to learn a very small number of commands. Just follow and remember this guide, and you will not need to learn anything complicated in order to copy a large number of files to a server that someone gave you access to.

Our dermatologist has those files copied to the laptop he will use to transfer them to the other server. Since the remote server was set up with SSH enabled, he has to learn how to connect to it.
To do so, he turned on his laptop and started a terminal application, that is, a terminal with a text interface from which he can start an SSH session.

Terminal application window usually looks like this:

Terminal application window

The terminal window will prompt you with a username and computer name. No buttons, no complicated menus, only a prompt for commands. (Users of screen readers can easily perform all required activities in a textual interface.) The most important thing is to understand the logic of what you want to do and to express it as a command. Once you have the right commands, you can literally copy them to the prompt.

Why Secure Shell? Why not some nice graphical user interface with my password? Security is not just a feature; it must be a principle. So, how does SSH work? Using passwords alone is a vulnerable method, since malicious bots and attackers with powerful hardware can use various techniques to steal passwords from you. Where security is concerned, SSH is the better method. SSH uses so-called SSH keys, a matching cryptographic pair: one private key and one public key. The public key can be shared and exposed to others (though there is no need to do so), but the private key must not be exposed to anyone. The private key, and the encryption and decryption performed while the connection is established, are essential for the security of an SSH connection.

Firstly, we have to generate a pair of keys using command:

ssh-keygen

When we issue that command at the prompt, the system will ask us to name the file in which the key will be saved, and then to enter a passphrase. Please choose a passphrase you can remember. Your screen will look similar to the image below:

Outputs of ssh-keygen command

In recent GNU/Linux distributions, ssh-keygen is usually configured by default to generate keys at a high security level. In our case, the default values gave us RSA keys with 2048 bits.
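If your distribution ships a recent OpenSSH (version 6.5 or newer), you can also ask ssh-keygen for an Ed25519 key, which is shorter than RSA and considered at least as strong. This is optional; the defaults described above work fine:

```
ssh-keygen -t ed25519
```

The prompts for a file name and a passphrase are the same as with the default command.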

After generating the keys, we should transfer the public key to the remote server by issuing the command:

ssh-copy-id username@remote_host

Usually, the administrator of that server or the hosting company that set it up gave you the username and password, while remote_host can be an address such as medicaljournal.org, or an IP number that looks like 193.243.27.183. (This IP number is unknown to me, so please do not use it; I typed it just to show people without an IT background what one can look like.)

After that you can connect to your server by issuing the command:

ssh -p 2020 username@remote_host

Please note the option telling SSH to connect on port 2020: hosting companies sometimes use that port instead of the default 22. If your hosting company uses port 22, you do not need to write “-p 2020”. If they use some other port, write “-p portnumber”. After issuing the command, the prompt will show something like:

The authenticity of host '[name or ip number of your host will be here]:numberofport ([name or number of your host]:numberofport)' can't be established.
ECDSA key fingerprint is SHA256:y9aVJtMpIZusjf3bmSEtWg/9RwjTrCbAT0Tli9pvLmM.
Are you sure you want to continue connecting (yes/no)?

When you type “yes” and press Enter, it will ask you for the password of that server. If you are scared, you can type “logout” and press Enter, and the system will log you out. So far so good. Nothing exploded, you are safe, and you have done wonderful work which you have to do only once.
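So that you do not have to retype the port and username every time, you can store them in the file ~/.ssh/config. The host alias journal and the values below are only examples; replace them with the details your administrator gave you:

```
Host journal
    HostName medicaljournal.org
    Port 2020
    User username
```

After saving the file, typing ssh journal is equivalent to the longer command above.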

Finally copying. But, I have questions!

When I showed that to the dermatologist, he felt something between happiness at a new discovery and a sort of stage fright. Will I be able to do this properly to the end? was visible on his face. But, I have some questions, he said. I have been told that I have to do this on port 2020. Secondly, I will always have new files on my computer; how can I copy them to the server? One by one? Should I keep some paper record of what I copied?

Well, I started, there is an easy answer to your questions. We can combine the commands rsync and ssh if you have many files. First, keep your files in one folder on your computer, and then issue this command:

rsync -avz -e "ssh -p $portNumber" source destination

The first time, the system will copy all files from the folder on your computer to the remote computer. On later runs it will copy only the new files. (rsync stands for remote sync.)

In the case of our dermatologist, he has a folder /research-files with many subfolders named after researchers, each containing the subfolders /articles, /statistics, /measurements, /photos and /diagrams. He wants the same structure, with the same folder names, applied on the remote server. There, in the /home folder, the administrator created a user researcher with a home folder named researcher. All files should be copied to that folder using SSH on port 2020. Sounds complicated, but it is not. On his laptop, all folders and files are under /home/dermatology.

The command will be like this:

rsync -avz -e "ssh -p 2020" /home/dermatology/research-files researcher@remote_host:/home/researcher/

Instead of remote_host, type your host name or IP number. After issuing this command, the system will ask you for your user’s password, and after you type it, the remote sync will begin. Thanks to the flag v in “avz”, which stands for “verbose”, you will see the whole process on your screen. Its duration depends on your upload speed; meanwhile you can send mail, open documents or play music on your computer. The first run can last long if you have many files, but the second and later runs transfer only the difference, that is, the files added since the previous run, and will probably be considerably shorter. That’s all. Not too hard.
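Before the first real transfer, it can be reassuring to ask rsync to only list what it would copy, without copying anything. The -n flag (short for --dry-run) does that; the command mirrors the one above, with remote_host again standing in for your real host:

```
rsync -avzn -e "ssh -p 2020" /home/dermatology/research-files researcher@remote_host:/home/researcher/
```

If the listed files look right, repeat the command without -n to perform the actual transfer.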


Malware Intrusion

We know that there is no ideally secure server. I have witnessed many times that hosting companies and their employees suffer from a lack of resources, equipment and skilled people to take care of server security. One of them even tried to convince me that folders in public_html should have permission 777. (If you are new to web applications and to setting up a system for open access publishing, please look up information on the internet about permissions on your server. Most hosting companies with shared hosting accounts set folder permissions to 755 and file permissions to 644 by default.) People who want to compromise your server usually inject code designed to exploit vulnerabilities and use your server for some, usually illegal, operations, as on the image below. When you are choosing an application, a hosting company and a person to administer the server, security should be the top priority.
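As a sketch of how those defaults can be applied in practice, the find commands below set every directory to 755 and every file to 644. Here they run in a scratch folder named demo_webroot so that nothing real is touched; on a real account you would point them at your own public_html instead:

```shell
# Create a small scratch web root so the example is self-contained.
mkdir -p demo_webroot/css
touch demo_webroot/index.php demo_webroot/css/style.css

# Directories: 755 (owner may write, everyone may list and enter).
find demo_webroot -type d -exec chmod 755 {} +

# Files: 644 (owner may write, everyone may read) -- never 777.
find demo_webroot -type f -exec chmod 644 {} +
```

The same two find commands fix permissions recursively no matter how deep the folder tree is.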

Example of intrusion code

There are various methods of doing that. The example presented here was part of a larger file found on a server used to publish scientific journals. Sometimes the server itself is safe, but the applications installed on it are very vulnerable. Strong competition and financial pressure force developers to release products as soon as they can, without proper testing. I have come across several pieces of software written for very obsolete and insecure versions of PHP, which poses additional risks for the security of a site. On the other side, additions of untested custom code can make a system insecure.

Such incidents can endanger your reputation and the trust of authors, readers, reviewers and librarians who would like to visit your site often. Above all, some drivers, firmware and operating systems are vulnerable, and as the user of one account you cannot do anything to prevent that; it is the job of the people in the hosting company and of the manufacturers of the affected hardware and software to fix the vulnerable parts. Nevertheless, this should not discourage you from publishing open access. Constructive and proactive caution is always necessary and welcome.

 

Once, I received a call from an association that publishes a scientific journal. They informed me that some strange code had appeared on their site. I used various malware testing tools, and my result looked like the image below. I soon found that the server was infected by the so-called db.php infection. Once that malware is successfully uploaded to a server, it issues GET requests and infects every JavaScript file (.js) with malicious JavaScript code.

Encoded intrusion

I decoded the strings displayed on the page and found the IP address of the infected server that was used to distribute the malware and redirect users to other sites. Since such code was all over the site, the pages were very hard to read and visitors were prevented from using the open access content.

I reported my findings to the editorial board of the journal, and we informed the hosting company and the domain registrar of the domains used to spread the malware, asking them to check the issue and take the measures necessary to stop the abuse of our site and of other sites possibly infected by that malware.

The process was rather tense, stressful and painful for the editorial board and everyone concerned. The hosting company that hosted the server with the domain used for spreading the malware informed us that they would take care of the case.

We used other tools to block the IPs detected as attackers. That day we had more than 290 attacks from computers in Panama and more than 150 attacks from computers in Ukraine. We restored our site from fresh backups and reinstalled the web applications we use. Our hosting company upgraded the PHP version, which was obsolete, unsupported and insecure at the time.
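As one illustration, a single attacking address can be blocked at the firewall with iptables. This requires root privileges on the server, and the address below comes from a reserved documentation range, not from a real attacker:

```
iptables -A INPUT -s 203.0.113.45 -j DROP
```

Hosting control panels and tools such as fail2ban can automate this kind of blocking, which matters when attacks come in the hundreds.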

 

GDPR Compliance

I have seen several times that companies purchased databases of e-mail addresses and other information about persons who might be potential customers of their commodities, and used that information for their mail campaigns. Sometimes they receive lists of clients who are retired, young, or athletes, or sorted by various other criteria. Many people asked themselves: how do they know that I retired recently, or that my kids just enrolled in secondary school? Many felt embarrassed and confused after realizing that their privacy was not protected and that their private information had been distributed to third parties without their consent.

Search engines often allowed anyone to easily find information about people registered in any online system.

Sometimes journal editors, while entering archives of previous issues, articles and author information into web applications such as OJS, face the repetitious work of entering information about the same authors again and again. Some of them asked developers for a plugin providing a drop-down list of users, so they could easily select a user and insert them into the list of authors of an article. They had no intention of making that list public, or of using the feature anywhere except in the administration panel of their application. But their benevolent intention can, in some contexts, produce unpleasant consequences for some authors. Privacy must therefore be protected by design, not just by the possibly honest intentions of the people who use data about other people.

Numerous complaints in previous years motivated legislators in the EU to pass a very strict rule to protect data about people: the EU adopted the General Data Protection Regulation.

“The EU General Data Protection Regulation (GDPR) replaces the Data Protection Directive 95/46/EC and was designed to harmonize data privacy laws across Europe, to protect and empower all EU citizens data privacy and to reshape the way organizations across the region approach data privacy.” It will have a very strong impact on entities within the EU and on those which store and use information about citizens of EU member countries. The GDPR was approved by the EU Parliament on April 14, 2016, and its enforcement date is May 25, 2018. Organizations in non-compliance can face heavy fines. It is important to read the part on extra-territorial applicability, which reads:

“Arguably the biggest change to the regulatory landscape of data privacy comes with the extended jurisdiction of the GDPR, as it applies to all companies processing the personal data of data subjects residing in the Union, regardless of the company’s location. Previously, territorial applicability of the directive was ambiguous and referred to data process ‘in context of an establishment’. This topic has arisen in a number of high profile court cases. GPDR makes its applicability very clear – it will apply to the processing of personal data by controllers and processors in the EU, regardless of whether the processing takes place in the EU or not. The GDPR will also apply to the processing of personal data of data subjects in the EU by a controller or processor not established in the EU, where the activities relate to: offering goods or services to EU citizens (irrespective of whether payment is required) and the monitoring of behaviour that takes place within the EU. Non-Eu businesses processing the data of EU citizens will also have to appoint a representative in the EU. ”

There is still time to get prepared, and in order to do so properly, please read the adopted text of the EU General Data Protection Regulation (GDPR).

 

Manuals for Open Journal Systems

I have found that many editorial boards struggle with a lack of concise instruction materials and of people who can train them with a hands-on approach. They usually find some solutions in online forums, but it is time-consuming for editorial boards to hunt for partial information. Sometimes the people who write manuals do not explain each step. Several people have contacted me and asked: “What do I have to do now? Something is missing.”

System administrators and software developers assume that what is easy for them should be easy for everyone. They plan training to be done in one evening because “it is easy”. In my experience, such practice often leads to misconfiguration of the application, underuse of its features, and mistakes in performing workflow tasks and procedures. Working with applications such as Open Journal Systems is not hard, but it is complex, and it takes some time until a user is familiar with its functionality and with the simple procedures for configuring and using it efficiently.

I have written manuals for authors, editorial boards and reviewers of scientific journals, according to their needs.

You can find here manual for authors, editorial boards, and reviewers.

I will soon publish here a manual that puts together the basic administrative and editorial functions needed to successfully configure your Open Journal Systems application for your journal.

 

A number of virtual machines on one server

I was recently invited by high-level officials of an institution to help them publish several journals online. I recommended that they use Open Journal Systems, which they gladly accepted.

They showed me their server, and one of the high officials said: “You know, we do not have anyone who knows Linux. We heard that you do. We have several installations of different web platforms on our server, but we have no idea how to fix several small issues and make everything work smoothly.” I looked at each of those GNU/Linux installations and realized that many of them had been installed as desktop machines with some additional applications such as a web server, PHP, MySQL, etc. Many of those installations lacked several dependencies, with the result that some modules of the web applications did not work. The people who had installed them were elsewhere, and there was no documentation on settings, active services, software package versions or other important information about the virtual machines and the web applications installed on them.

I suggested to them that it was necessary to standardize those installations somehow: choose a GNU/Linux distribution that is efficient and easy to administer, migrate the web applications to newly installed virtual machines, and create documentation with precise information on the operating system version, the versions of important applications, and the infrastructure requirements. Such specifications help an administrator manage backups, upgrades, maintenance and testing, instead of guessing where each application is and what it is or is not compatible with. They were scared, since they were not sure how to do that or how many work hours it would require. Well, it will save many hours otherwise spent rescuing damaged or corrupted data, misbehaving applications, or virtual machines compromised because software packages were not upgraded when needed.

Thanks to the experience and knowledge of free software developers and of users of GNU/Linux and many other free software applications, it is possible to plan, design and implement the whole infrastructure and the web applications in a way that assures users and administrators that everything will work smoothly.

It is necessary to take care of:

  • scalability
  • security
  • ease of use
  • ease of quality administration
  • price
  • maintenance
  • documentation
  • backups

We could add more criteria and discuss them all, but that is beyond the scope of this post. I want to stress that free software and open access are not just a sandbox for benevolence and good will. They are, rather, a very serious activity, and they require a lot of work to make sure that the users of the information and knowledge we publish on web applications designed for open access publishing will have a positive experience that helps them learn.

 

 

Licensing – Open Access

In my work with editorial boards of scholarly journals, I have often found that they support the idea of open access in general. But it is not always clear to them that licensing itself, from the legal point of view, may be quite complex. Heads of scientific libraries and editorial boards sometimes discuss licensing issues at great length, since their lawyers sometimes say: “That license gives you a framework for implementing open access ideas, but under our legislation it will be hard to defend in court.” It can therefore be quite useful to cooperate closely with lawyers, NGOs and other people involved in the development of legislation, to translate the Creative Commons licenses, and to take the steps necessary for them to be accepted in the legislation of your country.

In some countries people register their work in national copyright agencies, but absence of registration does not imply absence of protection and copyright.

One successful and viable licensing practice is to choose appropriate Creative Commons licenses for the article, data set, images or other article components. Scientists who publish the source code of software used or created in research may use free software licenses. Please note that the license does not cover the content of images or video in terms of privacy and other potential legal issues. For example, a video showing a woman doing a breast self-exam can be protected by Creative Commons as far as video authorship is concerned; but if the video shows the woman’s face, her privacy is violated unless she has previously given clear consent.

Editorial boards and librarians should visit the website of EIFL often. They have made a very useful Handbook on Copyright and Related Issues for Libraries. Those who would like to learn more about Creative Commons and what users can do with Creative Commons licenses should visit the page with information on the webinar related to that topic. Knowledge acquired from those resources can help you be more efficient, productive and safe in your publishing efforts. Your administrator can insert the appropriate licensing information into published content.

Open Journal Systems can insert licensing information into the article metadata automatically and save you effort and time.

 

The administrator should take care of technical matters, and I will do the science?

Information and communication technologies are very complex, and it is not easy for everyone to master all aspects of online publishing. That is true, isn’t it?

Surely, some sort of division of labor is required too. But lessons drawn from experience show that only intensive, open and productive communication and collaboration between the administrator and the editorial board will produce good results; both work on a common task. No search engine optimization or graphically appealing theme can do the work of academic rigor. And even with academic rigor, hard work and the dedication of the editorial board, the journal will not be visible if the system is often down. Beyond that polarization of success (or the lack of it), only a mutual understanding of the technology and of editorial needs, plans and ambitions creates the ground for success.

We do not have server. How we can test on line publishing?

Many journals do not have the resources to purchase web hosting plans. Before they decide to move forward, check what is good for them, and decide what to put in a project proposal to donors, it is highly recommended to try a web platform such as OJS on a local computer.

You can use one of the popular prepackaged and preconfigured platforms such as XAMPP, which will let you install OJS and see how it works. You can download OJS and install it. There are plenty of resources on using XAMPP on YouTube.

Install it and ask people on the forum to help you learn more about the system. I know people who have had positive experiences with testing a platform locally before going fully online. When I work with editorial boards, I always suggest and help them to install it locally, and then, with better insight and proper planning, to conceptualize their OA publishing policy.

In my experience, people who planned online use of ATutor and/or OJS benefited a lot from testing those applications locally. Do not be shy; move forward and ask for help. Asking is no shame. That is the core of community support!

How to start open access publishing?

laptop with headphones

Many people asked me how to start open access publishing.  My answer was often like this: Let’s go! Do it!

Indeed, I still think the same. But, I would add to that answer:  The preparation is more than 70% of work.

Huh! What does that mean?, someone may ask.

Since I have many times been involved in fixing issues caused by bad or missing preparation, I will list below important questions that you should consider before you start. That preparation will be more than a start! Time spent on preparatory activities is not lost if the preparation is done well. The points listed below will help you do it properly.

Do we have sufficient information on the web platform that we will use for publishing?

Note: Web platforms for publishing are not “sites” with static pages. If you need a site with a lot of static information, perhaps you should consider creating a separate site.

Should we use OJS, E-Prints or something else? Are they made for the same purpose?

Note: You should list the standard specifications your platform should comply with, and check the main purpose of what you want to do. It is better to ask several people than to use something you do not need and later do a huge amount of work twice.

Do we have good quality communication with one or more hosting companies or with the IT department of our institution?

Note: Take into consideration that web applications developed to store a large number of files use a lot of space outside the public_html folder, and that some hosting companies/IT departments are not willing or prepared to create an account for you that stores 20GB of data outside public_html and just 100MB within it. Prepare a list of questions for the hosting company/IT department, and carefully read their replies before making any decision on where to host your OA web platform.

What kind of control panel will the hosting company/IT department prepare for you? Are you skilled and trained to use it? Does it give you sufficient control?

Note: Some control panels are not widely known, and many people have difficulties using them. For example, I found that many admins had difficulty setting up Sentora to give sufficient permissions for storing data outside public_html safely. Or the hosting company does not give you the right to modify that part of your account. That can be troublesome, and you may need additional skills to handle it safely.

How many articles do we plan to publish online? Do we have thousands of articles, or do we just want to start with the issues we published last year? Note: Bear in mind that many systems make two or more copies of your files/articles during the review process. Consequently, the 530MB of data that you have will take much more space on the server.
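One simple way to answer the space question before talking to a hosting company is the du command. The folder name below is an invented example, created with a single 1 MB sample file just so the command has something to measure; on your own computer you would point du at the folder holding your archive:

```shell
# Create a sample archive with one 1 MB file so the example runs anywhere.
mkdir -p journal-archive/2017
head -c 1048576 /dev/zero > journal-archive/2017/issue1.pdf

# -s summarizes the whole tree, -h prints a human-readable size.
du -sh journal-archive
```

Multiply the result by the number of copies your publishing system keeps (often two or more) to get a realistic quota to request.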

Are those articles scanned, or do we have them as recently produced .pdf files or in some other format? Are the scans of good quality? Can we read mathematical, chemical or other scientific symbols? Are there articles with announcements and advertisements in the middle of the text? Do we have articles that continue on other pages? Are their pieces put together? Do we have software available to handle those files?

Do we want to enable our readers to pay subscriptions online? Do we want to let authors pay publication fees online? Do we have communication with our partner bank and clear instructions on how to do that? Are there any costs for that? Is that defined in our legislation? Can we provide an invoice to authors who need to document their expenses to a supporting agency?

Do we have people who are trained, or willing to be trained, to set up the web application properly, to upload files, and to complete all the necessary forms for metadata and for information about authors and contributors? How much time will they need?

Note: Lessons drawn from experience often show that more time is needed than we think at the beginning, since it is very important to do the job properly. If metadata, author names, affiliations etc. are not entered properly, readers can be confused and misinformed, which is the opposite of what we want.

Are we able to pay those who will enter all the data and upload the files? Are they really able to volunteer and do such a huge amount of work properly? Did we check our printed issues from the last couple of years to determine what information should be entered into the web application? What are the names of the sections? Who are the section editors? Can we contact them? Do we want to write a list of instructions for those who will enter the data? Do we have a person who will supervise and check the accuracy of the entered data? Can we expect legal action if some data is not entered properly?

Note: It is normal that journals change names, editorial policies, copyright policies, topics, supporters etc. over their history. Sometimes journals have different names for the same topic or section; I have found several times sections named Errata, Erratum and Corrigenda that actually describe the same thing, or journals whose publisher changed. Is there a decision on how to enter data in those cases? Do we have consensus on that?

Do we want to have some other applications on the server account? What kind and version of infrastructure do they require? Can we make them work easily? Do their requirements conflict with our open access publishing application?